Voximplant Platform Capabilities
Voximplant Platform is an established cloud communications platform for building programmable voice, video, and messaging applications using serverless call control, SDKs, and APIs.
Use this page as a capability map: start with the summaries below, then read only the detailed sections you need.
Capabilities at a glance
Connect real-time AI agents, speech systems, and telephony channels with code-driven orchestration.
Run inbound/outbound PSTN, SIP, WebRTC, and WhatsApp voice flows with fine-grained call control.
Use cloud IDE/debugging, multi-platform SDKs, and Management API automation.
Deliver SMS, in-app messaging, push notifications, and webhook-driven backend integrations.
Build WebRTC video experiences with recording, screen sharing, and voice/video interoperability.
Stream media over WebSockets and deploy globally on serverless infrastructure.
Voice AI vendors at a glance
Direct agent and real-time connectors
- OpenAI: Realtime and agent-style voice integrations.
- Google Gemini: Live speech interactions with Gemini APIs.
- Ultravox: WebSocket-based speech-native connector.
- Deepgram: Native voice-agent connector and examples.
- ElevenLabs: Conversational AI agent integrations.
- Cartesia: Line Agents runtime with VoxEngine orchestration.
- xAI: Grok voice-agent flow and feature support.
Speech and realtime TTS options

- Cartesia: Realtime TTS pattern for half-cascade voice pipelines. API refs: Cartesia module.
- Inworld: Realtime TTS option for half-cascade voice flows.
- ElevenLabs: Streaming/realtime TTS option for voice AI pipelines. API refs: ElevenLabs module.
Detailed capabilities
Voice AI Orchestration
Voximplant AI is a serverless runtime for Voice AI pipelines that connects real-time agent/LLM systems and speech engines to PSTN / SIP / WebRTC / mobile / WhatsApp calling, with code-driven orchestration and provider flexibility. See Voximplant AI and the docs Voice AI connectors section.
Supported vendors (direct agent / real-time LLM connectors)
Native, direct connectors are documented for:
- OpenAI (Realtime / agent-style integrations) - Docs: OpenAI
- Google Gemini (Live) - Docs: Google
- Deepgram Voice Agent - Docs: Deepgram
- ElevenLabs Agents / Conversational AI - Docs: ElevenLabs
- Ultravox (WebSocket API) - Docs: Ultravox
- Cartesia Line Agents - Docs: Cartesia Line Agents
- xAI (Grok Voice Agent) - Docs: xAI
In addition to the vendors above, Voximplant AI can connect to any other real-time AI system that exposes a WebSocket interface.
Supported vendors (speech engines: STT / TTS)
Voximplant’s platform speech layer (STT/TTS) includes built-in providers such as:
- Speech-to-Text (STT): Google Speech Cloud, Microsoft Azure STT, Amazon Transcribe, Yandex Speech Cloud
- Text-to-Speech (TTS): Google Speech Cloud, Amazon Polly, Yandex Speech Cloud, Microsoft Azure TTS, Tinkoff VoiceKit
For realtime / streaming TTS used in Voice AI scenarios, Voximplant also provides native VoxEngine modules and guides for:
- Cartesia Realtime TTS - Guide: Realtime TTS and API refs
- Inworld Realtime TTS - Guide: Realtime TTS
- ElevenLabs Streaming / realtime TTS - Guide: ElevenLabs TTS and API refs
Pipeline options (architectures you can run)
- Speech-to-speech: real-time audio in and real-time audio out (agent API handles full duplex loop)
- Speech -> LLM -> TTS: stream audio directly into a speech LLM and use a different TTS for output
- STT -> LLM -> TTS: stream audio to STT, pass text to an LLM/toolchain, synthesize response audio
- Hybrid: combine a real-time agent API for turn-taking with separate best-of-breed STT/TTS components (mix and match)
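As a rough illustration of the STT -> LLM -> TTS option above, the sketch below composes three interchangeable stages into one turn handler. It is plain JavaScript rather than VoxEngine code, and `fakeStt`, `fakeLlm`, `fakeTts`, and `makeCascade` are hypothetical names that stand in for real provider calls.

```javascript
// Hypothetical stand-ins for real providers; each returns a promise like a network call would.
const fakeStt = async (audio) => `transcript of ${audio.length} bytes`;
const fakeLlm = async (text) => `reply to: ${text}`;
const fakeTts = async (text) => new TextEncoder().encode(text);

// Build a cascade pipeline from three stages. Any stage can be replaced
// without touching the other two or the code that calls handleTurn.
function makeCascade({ stt, llm, tts }) {
  return async function handleTurn(audioIn) {
    const userText = await stt(audioIn);    // speech -> text
    const replyText = await llm(userText);  // text -> response text
    const audioOut = await tts(replyText);  // response text -> speech
    return { userText, replyText, audioOut };
  };
}

const handleTurn = makeCascade({ stt: fakeStt, llm: fakeLlm, tts: fakeTts });
```

A speech-to-speech setup would collapse the three stages into a single agent call, while the hybrid option would keep the same shape and replace only some of the stages.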
Orchestration primitives (what you control)
- Mix and match providers: swap STT/TTS/LLM vendors without changing your telephony integration
- Parallel model execution: run multiple speech/LLM components in parallel when useful (for example, intent extraction + generation)
- Failover paths: fall back to alternate speech/LLM providers when a step errors or times out
- Wideband audio: higher fidelity audio path for improved user experience and model comprehension
- Deep SIP support: SIP trunking + registration interop so agents can operate inside PBX/SBC/carrier environments
- Channel portability: reuse the same AI pipeline across PSTN numbers, SIP, WebRTC, mobile SDKs, and WhatsApp calling
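The failover primitive above can be sketched in plain JavaScript as "try each provider in order, with a per-step timeout". This is a minimal sketch, not a Voximplant API: `withTimeout`, `withFailover`, and the provider objects are hypothetical names introduced for illustration.

```javascript
// Reject if the promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each provider in order and return the first successful result.
// Each provider is { name, run }, where run(input) returns a promise.
async function withFailover(providers, input, timeoutMs) {
  const errors = [];
  for (const { name, run } of providers) {
    try {
      const result = await withTimeout(run(input), timeoutMs);
      return { provider: name, result };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // remember why this step failed
    }
  }
  throw new Error(`all providers failed (${errors.join("; ")})`);
}

// Hypothetical providers: the first always errors, the second succeeds.
const ttsProviders = [
  { name: "primary-tts", run: async () => { throw new Error("503"); } },
  { name: "backup-tts", run: async (text) => `audio(${text})` },
];
```

Calling `withFailover(ttsProviders, "hello", 1000)` resolves with the backup provider's result. Running several components in parallel follows the same pattern, with `Promise.all` in place of the sequential loop.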
Real-time media integration (streaming)
- WebSocket-based media streaming for connecting calls to real-time AI systems and custom pipelines (audio + metadata/control messages on the same channel)
- Media gateway abstraction: avoid building/operating custom streaming gateways when using native connectors/modules
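When audio and control messages share one WebSocket, the receiving side has to tell them apart. The sketch below assumes a hypothetical JSON envelope (`type: "media"` with a base64 `payload`, or `type: "control"` with an `action`). The actual message format depends on the connector or AI system in use, so treat the field names as placeholders.

```javascript
// Route one incoming text frame to the right handler.
// Assumed (hypothetical) envelope:
//   { "type": "media", "payload": "<base64 audio>" }
//   { "type": "control", "action": "..." }
function routeMessage(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return "ignored"; // not JSON: drop the frame
  }
  if (msg === null || typeof msg !== "object") return "ignored";

  if (msg.type === "media" && typeof msg.payload === "string") {
    handlers.onAudio(Buffer.from(msg.payload, "base64")); // decoded audio bytes
    return "media";
  }
  if (msg.type === "control") {
    handlers.onControl(msg); // for example, an interruption or end-of-turn signal
    return "control";
  }
  return "ignored";
}
```

Keeping the routing in one small function makes it easier to adapt when a provider uses different field names or sends binary frames for audio.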
Voice telephony
Connectivity and endpoints
- PSTN calling (inbound/outbound) via phone numbers and programmable call handling
- Phone numbers API: automated procurement in 60+ countries (availability varies by country)
- SIP calling and trunking: connect carriers / PBXs / SBCs using SIP interop (including registration-based scenarios)
- WebRTC calling via web/mobile SDKs (VoIP calling in apps and browsers)
- WhatsApp calling: inbound/outbound voice calls via WhatsApp Business API integration
Serverless call control (VoxEngine)
- JavaScript call logic (no XML) for real-time call routing and application workflows
- Per-call-leg signaling/media control - granular control over each leg independently
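To give a feel for JavaScript call logic, here is a minimal scenario sketch that forwards an incoming call to the PSTN and bridges the two legs. It uses `VoxEngine.addEventListener`, `AppEvents.CallAlerting`, `VoxEngine.callPSTN`, `VoxEngine.easyProcess`, and `VoxEngine.terminate`. The validation rule in `isPstnNumber` and the `CALLER_ID` value are assumptions for illustration; check the VoxEngine reference before relying on any detail.

```javascript
// Placeholder: replace with a number rented or verified in your account.
const CALLER_ID = "+15550100";

// Illustrative rule only: accept an optional "+" followed by 7 to 15 digits.
function isPstnNumber(destination) {
  return /^\+?[1-9]\d{6,14}$/.test(destination);
}

// Register the handler only inside the VoxEngine runtime,
// so the helper above can also be exercised outside of it.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    if (!isPstnNumber(e.destination)) {
      VoxEngine.terminate(); // nothing to route: end the session
      return;
    }
    // Create the outbound leg, then let easyProcess connect both legs
    // and end the session when either side hangs up.
    const outbound = VoxEngine.callPSTN(e.destination, CALLER_ID);
    VoxEngine.easyProcess(e.call, outbound);
  });
}
```

Because each leg is a separate object (`e.call` and `outbound`), the scenario can attach listeners or media operations to one leg without affecting the other.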
Conferencing and bridging
- Single conferencing API for voice/video; mix PSTN, SIP, WebRTC, and native mobile endpoints
- Conferences up to 50 participants
Recording, transcription, and speech processing
- Call recording via call.record() in scenarios (supports stereo and additional options)
- Call transcription via record(transcribe=true) and retrieval via GetCallHistory (transcription delivered asynchronously)
- Speaker/channel labeling in transcripts (for example, “Left”/“Right” labeling pattern described in docs)
Speech-to-Text (ASR) modes and features
- Phrase-hint mode (best for constrained dialogs / IVRs) and Freeform mode (open transcription)
- Multiple ASR engines (for example, Google, Amazon, Microsoft, Yandex, T-Bank) with selectable profiles
- Intermediate results support (provider-dependent) for faster partial recognition
- Google Speech v1p1beta1 feature passthrough (for example, word time offsets, punctuation, diarization config)
Answering machine / voicemail / beep detection
- AMD module for voicemail/answering machine detection in scenarios
- Beep detection with specified frequency lists and timeouts (scenario-level control)
- AMD event/callback model available in VoxEngine references
Automated outbound calling (call lists + dialing logic)
- Call Lists: upload a CSV call list and process it with VoxEngine scenarios (campaign-style calling)
- Management API CallLists: programmatic call-list upload/append with delimiter support
- Predictive Dialing System (PDS): uses agent/load statistics and call-list progression to place calls and connect answered calls to agents
- Predictive and progressive dialing modes with tunable parameters (for example, allowed failed call percentage)
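A call list is a delimited text file, so preparing one can be sketched in a few lines. The example below is a minimal sketch: the column names (`phone_number`, `name`), the default `;` delimiter, and the RFC 4180-style quoting are all assumptions, so check the call list documentation for the columns and escaping the platform expects.

```javascript
// Quote a field if it contains the delimiter, a double quote, or a line break.
function escapeField(value, delimiter) {
  const s = String(value);
  if (s.includes(delimiter) || s.includes('"') || /[\r\n]/.test(s)) {
    return `"${s.replace(/"/g, '""')}"`;
  }
  return s;
}

// Build the file contents: one header line, then one line per row.
function buildCallList(rows, columns, delimiter = ";") {
  const header = columns.join(delimiter);
  const lines = rows.map((row) =>
    columns.map((c) => escapeField(row[c] ?? "", delimiter)).join(delimiter)
  );
  return [header, ...lines].join("\n");
}

const csv = buildCallList(
  [
    { phone_number: "+14155550100", name: "Ann; Lee" },
    { phone_number: "+14155550101", name: "Bo" },
  ],
  ["phone_number", "name"]
);
```

Each row's columns are typically what the scenario receives as per-call data, which is why a stable header matters more than the delimiter choice.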
Video telephony
WebRTC video API (server-based + P2P)
- Video API to build server-based and P2P video experiences
- SDKs abstract core WebRTC complexities:
- STUN/TURN/ICE
- Bandwidth optimization
- Video quality control
Real-time collaboration features
- Screen sharing (share screen or window)
- Recording for calls/conferences; storage in Voximplant Cloud or S3-compatible storage
- Video streaming support (platform capability referenced in docs/features)
Voice/video interoperability
- Bridge PSTN/SIP audio into video rooms as part of a unified conferencing model
Messaging
SMS
- Send SMS via Management API and receive inbound SMS via HTTP callbacks (for SMS-capable numbers)
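As a sketch of what sending an SMS through the Management API might look like, the helper below only builds the HTTP request and does not send it. The method name `SendSmsMessage`, the base URL, and the parameter names (`account_id`, `api_key`, `source`, `destination`, `sms_body`) are assumptions recalled from the Management API reference; verify them, and the recommended authentication scheme, before use.

```javascript
// Build (but do not send) a form-encoded request for an outbound SMS.
// Endpoint and parameter names are assumptions to be checked against the API reference.
function buildSendSmsRequest({ accountId, apiKey, source, destination, text }) {
  const body = new URLSearchParams({
    account_id: String(accountId),
    api_key: apiKey,
    source,           // SMS-capable number in your account
    destination,      // recipient number
    sms_body: text,
  });
  return {
    url: "https://api.voximplant.com/platform_api/SendSmsMessage/",
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}

const smsRequest = buildSendSmsRequest({
  accountId: 1,                 // placeholder
  apiKey: "YOUR_API_KEY",       // placeholder
  source: "447443332211",
  destination: "447443332212",
  text: "Hello world",
});
```

The request can then be sent with any HTTP client, for example `fetch(smsRequest.url, smsRequest)`. Putting credentials in the POST body rather than the query string keeps them out of URL logs.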
Instant Messaging (in-app chat)
- Direct messaging between application users
- Chat rooms up to 1000 participants
- Chatbots for automated interactions
Push notifications (mobile)
- Push notifications to wake devices for incoming calls and message notifications
- Android push implementation is based on Firebase Cloud Messaging (FCM)
Webhooks / event delivery to your backend
- HTTP Callbacks for event-driven notifications without polling the Management API
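On the receiving side, a callback endpoint usually validates the body, acknowledges quickly, and defers the real work. The sketch below shows only that core step as a pure function. It assumes the callback body is JSON, and `handleCallback` and `queue` are hypothetical names; the actual payload structure is defined in the HTTP callbacks documentation.

```javascript
// Decide the HTTP status for one callback request and queue valid payloads.
// Returning quickly and processing later helps avoid delivery timeouts.
function handleCallback(rawBody, queue) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return 400; // malformed body
  }
  if (payload === null || typeof payload !== "object") {
    return 400; // valid JSON but not an object or array
  }
  queue.push(payload); // hand off to a worker, database, or job queue
  return 200;
}
```

In a real service this function would sit inside an HTTP handler that reads the request body, calls `handleCallback`, and writes the returned status code.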
Tools and Developer Experience
Cloud IDE and debugging
- Cloud IDE + debugger in the control panel:
- Code verification
- Autocompletion
- Diff highlighting
- Built-in troubleshooting workflow
SDKs and client libraries
- SDKs: iOS, Android, Web, React Native, Flutter, Unity
- API clients: curl, Node.js, Python, PHP, Go, .NET, Java
Management API (HTTP)
- Control accounts/services programmatically (examples from docs include managing phone numbers, messaging, billing, logs, records, and user access)
Real-time Media Streaming (WebSockets / Media Streams)
- Media Streams: integrate live audio streams into calls via WebSockets for real-time transcription/analysis and AI integrations
- WebSocket programming model in VoxEngine:
- Create connections via VoxEngine.createWebSocket(...)
- Stream audio using WebSocket.sendMediaTo(...)
- Recommended audio chunk duration: ~20 ms
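To make the ~20 ms chunk recommendation concrete, the sketch below computes how many bytes one chunk holds and splits a buffer accordingly. It assumes uncompressed 16-bit mono PCM. The sample rates in the comments are examples and not values taken from this page, and the helper names are hypothetical.

```javascript
// Bytes in one chunk of raw PCM audio.
// Defaults assume 20 ms chunks, 16-bit samples, and a single channel.
function pcmChunkBytes(sampleRateHz, chunkMs = 20, bytesPerSample = 2, channels = 1) {
  const samplesPerChunk = Math.round((sampleRateHz * chunkMs) / 1000);
  return samplesPerChunk * bytesPerSample * channels;
}

// Split a byte buffer into consecutive chunks; the last one may be shorter.
function splitIntoChunks(pcm, chunkBytes) {
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

const narrowband = pcmChunkBytes(8000);  // 160 samples -> 320 bytes per 20 ms
const wideband = pcmChunkBytes(16000);   // 320 samples -> 640 bytes per 20 ms
```

Compressed codecs produce different byte counts for the same duration, so the arithmetic applies only when the stream carries raw PCM.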
Network, Reliability, and Deployment
- Serverless runtime (no infrastructure to manage for call logic)
- Global footprint: datacenters in 14 distinct countries (as stated on the platform page)
- Status page for live and historical uptime of subcomponents