Features Overview
Voximplant Platform Capabilities
Voximplant Platform is a Voice AI Orchestration Platform and an established Cloud Communications Platform for building programmable voice, video, and messaging applications using serverless call control, SDKs, and APIs.
Capabilities at a glance
Connect real-time AI agents, speech systems, and telephony channels with code-driven orchestration.
Run inbound/outbound PSTN, SIP, WebRTC, and WhatsApp voice flows with fine-grained call control.
Use cloud IDE/debugging, multi-platform SDKs, and Management API automation.
Stream live call audio over WebSockets for real-time AI, transcription, and analysis pipelines.
Run globally on serverless infrastructure with multi-region coverage and uptime monitoring.
Voice AI vendors at a glance
Direct agent and real-time connectors
- OpenAI: Realtime and agent-style voice integrations.
- Google Gemini: Live speech interactions with Gemini APIs.
- Ultravox: WebSocket-based speech-native connector.
- Deepgram: Native voice-agent connector and examples.
- ElevenLabs: Conversational AI agent integrations.
- Cartesia: Line Agents runtime with VoxEngine orchestration.
- xAI: Grok voice-agent flow and feature support.
Speech and realtime TTS options

- Cartesia: Realtime TTS pattern for half-cascade voice pipelines.
- Inworld: Realtime TTS option for half-cascade voice flows.
- ElevenLabs: Streaming/realtime TTS option for voice AI pipelines.
Detailed capabilities
Voice AI Orchestration
Voximplant AI is a serverless runtime for Voice AI pipelines that connects real-time agent/LLM systems and speech engines to PSTN / SIP / WebRTC / mobile / WhatsApp calling, with code-driven orchestration and provider flexibility. See Voximplant AI and the docs Voice AI connectors section.
Supported vendors (direct agent / real-time LLM connectors)
Native, direct connectivity is available for:
- OpenAI (Realtime / agent-style integrations) - Docs: OpenAI
- Google Gemini (Live) - Docs: Google
- Deepgram Voice Agent - Docs: Deepgram
- ElevenLabs Agents / Conversational AI - Docs: ElevenLabs
- Ultravox (WebSocket API) - Docs: Ultravox
- Cartesia Line Agents - Docs: Cartesia Line Agents
- xAI (Grok Voice Agent) - Docs: xAI
Voximplant AI also supports connecting to any other WebSocket interface, so real-time AI systems beyond the vendors above can be integrated.
Supported vendors (speech engines: STT / TTS)
Voximplant’s platform speech layer (STT/TTS) includes built-in providers such as:
- Speech-to-Text (STT): Google Speech Cloud, Microsoft Azure STT, Amazon Transcribe, Yandex Speech Cloud
- Text-to-Speech (TTS): Google Speech Cloud, Amazon Polly, Yandex Speech Cloud, Microsoft Azure TTS, Tinkoff VoiceKit
For realtime / streaming TTS used in Voice AI scenarios, Voximplant also provides native VoxEngine modules and guides for:
- Cartesia Realtime TTS - Guide: Realtime TTS and API refs
- Inworld Realtime TTS - Guide: Realtime TTS
- ElevenLabs Streaming / realtime TTS - Guide: ElevenLabs TTS and API refs
Pipeline options (architectures you can run)
- Speech-to-speech: real-time audio in and real-time audio out (agent API handles full duplex loop)
- Speech -> LLM -> TTS: stream audio directly into a speech LLM and use a different TTS for output
- STT -> LLM -> TTS: stream audio to STT, pass text to an LLM/toolchain, synthesize response audio
- Hybrid: combine a real-time agent API for turn-taking with separate best-of-breed STT/TTS components (mix and match)
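The pipeline shapes above can be pictured as chains of stages. The following is a minimal sketch, not Voximplant code: `fakeStt`, `fakeLlm`, `fakeSpeechLlm`, `fakeTts`, and `buildPipeline` are hypothetical stand-ins, and a real pipeline would stream audio incrementally instead of passing whole values between stages.

```javascript
// Hypothetical stage functions: each one stands in for a real provider call.
// None of these names come from the Voximplant API.
const fakeStt = async (audioChunk) => `text(${audioChunk})`;
const fakeLlm = async (text) => `reply to ${text}`;
const fakeSpeechLlm = async (audioChunk) => `reply to speech(${audioChunk})`;
const fakeTts = async (text) => `audio(${text})`;

// Compose stages left to right; each stage receives the previous stage's output.
function buildPipeline(...stages) {
  return async (input) => {
    let value = input;
    for (const stage of stages) {
      value = await stage(value);
    }
    return value;
  };
}

// STT -> LLM -> TTS: three separate components.
const cascade = buildPipeline(fakeStt, fakeLlm, fakeTts);

// Speech -> LLM -> TTS: a speech-native LLM consumes audio directly.
const halfCascade = buildPipeline(fakeSpeechLlm, fakeTts);
```

Because every stage has the same shape (input in, promise of output out), swapping one vendor for another only changes which function is passed to `buildPipeline`.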
Orchestration primitives (what you control)
- Mix and match providers: swap STT/TTS/LLM vendors without changing your telephony integration
- Parallel model execution: run multiple speech/LLM components in parallel when useful (for example, intent extraction + generation)
- Failover paths: fall back to alternate speech/LLM providers when a step errors or times out
- Wideband audio: higher fidelity audio path for improved user experience and model comprehension
- Deep SIP support: SIP trunking + registration interop so agents can operate inside PBX/SBC/carrier environments
- Channel portability: reuse the same AI pipeline across PSTN numbers, SIP, WebRTC, mobile SDKs, and WhatsApp calling
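As an illustration of the failover primitive, here is a hedged sketch in plain JavaScript. `withTimeout` and `callWithFailover` are hypothetical helpers written for this example, not platform functions; each provider is assumed to be an object with a `name` and an async `run` function.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each provider in order and return the first successful result.
async function callWithFailover(providers, input, timeoutMs) {
  const errors = [];
  for (const { name, run } of providers) {
    try {
      const result = await withTimeout(run(input), timeoutMs);
      return { provider: name, result };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

A call such as `callWithFailover([primaryTts, backupTts], "Hello", 1500)` would return the backup's result if the primary throws or takes longer than 1.5 seconds.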
Real-time media integration (streaming)
- WebSocket-based media streaming for connecting calls to real-time AI systems and custom pipelines (audio + metadata/control messages on the same channel)
- Media gateway abstraction: avoid building/operating custom streaming gateways when using native connectors/modules
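To show what "audio plus metadata/control messages on the same channel" can look like on the receiving side, here is a small sketch of a frame dispatcher. The JSON envelope (`event`, `start`, `media.payload` as base64) is an assumption made for illustration; check the actual message format in the WebSocket documentation before relying on it.

```javascript
// Dispatch one text frame to the matching handler.
// ASSUMPTION: frames are JSON with an `event` field and base64 audio in `media.payload`.
function handleFrame(rawText, handlers) {
  const msg = JSON.parse(rawText);
  switch (msg.event) {
    case "start":
      // Control message: describes the stream (for example, encoding and sample rate).
      handlers.onStart(msg.start);
      break;
    case "media":
      // Audio message: decode the base64 payload into raw bytes.
      handlers.onAudio(Buffer.from(msg.media.payload, "base64"));
      break;
    case "stop":
      handlers.onStop();
      break;
    default:
      handlers.onUnknown(msg);
  }
}
```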
Voice telephony
Connectivity and endpoints
- PSTN calling (inbound/outbound) via phone numbers and programmable call handling
- Phone numbers API: automated procurement in 60+ countries (availability varies by country)
- SIP calling and trunking: connect carriers / PBXs / SBCs using SIP interop (including registration-based scenarios)
- WebRTC calling via web/mobile SDKs (VoIP calling in apps and browsers)
- WhatsApp calling: inbound/outbound voice calls via WhatsApp Business API integration
Serverless call control (VoxEngine)
- JavaScript call logic (no XML) for real-time call routing and application workflows
- Per-call-leg signaling/media control - granular control over each leg independently
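A minimal scenario sketch of JavaScript call logic with two independently controlled legs follows. `pickRoute` and `CALLER_ID` are hypothetical and exist only for this example. The VoxEngine calls (`addEventListener`, `AppEvents.CallAlerting`, `callPSTN`, `easyProcess`, `call.reject`) are written from the public reference as best recalled, so verify the signatures there before use.

```javascript
// Hypothetical caller ID; callPSTN needs a number your account is allowed to use.
const CALLER_ID = "+15550100";

// Pure routing decision, kept apart from VoxEngine so it can be tested anywhere.
function pickRoute(destination) {
  // Treat E.164-style numbers (optional "+", 7 to 15 digits) as PSTN; reject the rest.
  return /^\+?[1-9]\d{6,14}$/.test(destination) ? "pstn" : "reject";
}

// This hookup only runs where the VoxEngine globals exist (inside a scenario).
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    if (pickRoute(e.destination) === "pstn") {
      // Second leg: an outbound PSTN call created and controlled separately.
      const outbound = VoxEngine.callPSTN(e.destination, CALLER_ID);
      // Bridge both legs and forward call events between them.
      VoxEngine.easyProcess(e.call, outbound);
    } else {
      e.call.reject(404);
    }
  });
}
```

Keeping the routing decision in a plain function means it can be unit-tested outside the platform, while the guarded block is the only part that depends on the scenario runtime.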
Conferencing and bridging
- Single conferencing API for voice/video; mix PSTN, SIP, WebRTC, and native mobile endpoints
- Conferences up to 50 participants
Recording, transcription, and speech processing
- Call recording via `call.record()` in scenarios (supports stereo and additional options)
- Call transcription via `record(transcribe=true)` and retrieval via `GetCallHistory` (transcription delivered asynchronously)
- Speaker/channel labeling in transcripts (for example, “Left”/“Right” labeling pattern described in docs)
Speech-to-Text (ASR) modes and features
- Phrase-hint mode (best for constrained dialogs / IVRs) and Freeform mode (open transcription)
- Multiple ASR engines (for example, Google, Amazon, Microsoft, Yandex, T-Bank) with selectable profiles
- Intermediate results support (provider-dependent) for faster partial recognition
- Google Speech v1p1beta1 feature passthrough (for example, word time offsets, punctuation, diarization config)
Answering machine / voicemail / beep detection
- AMD module for voicemail/answering machine detection in scenarios
- Beep detection with specified frequency lists and timeouts (scenario-level control)
- AMD event/callback model available in VoxEngine references
Automated outbound calling (call lists + dialing logic)
- Call Lists: upload a CSV call list and process it with VoxEngine scenarios (campaign-style calling)
- Management API CallLists: programmatic call-list upload/append with delimiter support
- Predictive Dialing System (PDS): uses agent/load statistics and call-list progression to place calls and connect answered calls to agents
- Predictive and progressive dialing modes with tunable parameters (for example, allowed failed call percentage)
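Since call lists are uploaded as delimited text, a small helper can produce the file from application data. This is a sketch under assumptions: `toCallListCsv` is a hypothetical helper, the column names are invented, and the semicolon default and the quoting rules should be checked against the call list documentation.

```javascript
// Serialize rows into delimiter-separated text with a header line.
function toCallListCsv(rows, columns, delimiter = ";") {
  const escape = (value) => {
    const text = String(value ?? "");
    // Quote fields that contain the delimiter, quotes, or line breaks.
    const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
    return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const header = columns.map(escape).join(delimiter);
  const lines = rows.map((row) =>
    columns.map((column) => escape(row[column])).join(delimiter)
  );
  return [header, ...lines].join("\n");
}

// Hypothetical columns; a scenario would read them as custom data per call.
const csv = toCallListCsv(
  [
    { phone_number: "+15550100", first_name: "Ada" },
    { phone_number: "+15550101", first_name: "Lin; Jr." },
  ],
  ["phone_number", "first_name"]
);
```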
Tools and Developer Experience
Cloud IDE and debugging
- Cloud IDE + debugger in the control panel:
- Code verification
- Autocompletion
- Diff highlighting
- Built-in troubleshooting workflow
Local IDE and continuous integration
- CLI tool for CI/CD automation so you can use your own IDE
- Type library for local development with autocompletion and type checking
SDKs and client libraries
- SDKs: iOS, Android, Web, React Native, Flutter, Unity
- API clients: curl, Node.js, Python, PHP, Go, .NET, Java
Management API (HTTP)
- Control accounts/services programmatically (examples from docs include managing phone numbers, messaging, billing, logs, records, and user access)
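As a rough illustration of calling the Management API over HTTP, the sketch below only builds a request URL and sends nothing. The base URL, the `account_id`/`api_key` query authentication, and the `from_date`/`to_date` parameter names are assumptions to verify against the Management API reference; `buildManagementApiUrl` is a hypothetical helper.

```javascript
// ASSUMPTION: base URL of the HTTP Management API.
const API_BASE = "https://api.voximplant.com/platform_api";

// Build a request URL for one Management API method with query parameters.
function buildManagementApiUrl(method, params) {
  const url = new URL(`${API_BASE}/${method}/`);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Placeholder credentials and dates, for illustration only.
const historyUrl = buildManagementApiUrl("GetCallHistory", {
  account_id: 1,
  api_key: "YOUR_API_KEY",
  from_date: "2024-01-01 00:00:00",
  to_date: "2024-01-02 00:00:00",
});
```

Letting `URL` and `URLSearchParams` handle encoding avoids hand-escaping spaces and colons in date values.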
Real-time Media Streaming (WebSockets / Media Streams)
- Media Streams: integrate live audio streams into calls via WebSockets for real-time transcription/analysis and AI integrations
- WebSocket programming model in VoxEngine:
- Create connections via `VoxEngine.createWebSocket(...)`
- Stream audio using `WebSocket.sendMediaTo(...)`
- Recommended audio chunk duration: ~20ms
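The ~20 ms recommendation translates into a chunk size in bytes once the audio format is known. The sketch below works through that arithmetic; `chunkBytes` and `splitIntoChunks` are hypothetical helpers, and the formats shown (8 kHz with 1 byte per sample, 16 kHz with 2 bytes per sample) are examples, not a statement of which formats the platform uses.

```javascript
// Bytes of audio in one chunk for a given format.
function chunkBytes({ sampleRateHz, bytesPerSample, channels = 1, chunkMs = 20 }) {
  const samplesPerChunk = (sampleRateHz * chunkMs) / 1000;
  return samplesPerChunk * bytesPerSample * channels;
}

// 8 kHz, 1 byte per sample (for example, G.711): 160 samples, 160 bytes.
const narrowband = chunkBytes({ sampleRateHz: 8000, bytesPerSample: 1 });

// 16 kHz, 2 bytes per sample (16-bit linear PCM): 320 samples, 640 bytes.
const wideband = chunkBytes({ sampleRateHz: 16000, bytesPerSample: 2 });

// Split a byte buffer into fixed-size chunks; the last chunk may be shorter.
function splitIntoChunks(bytes, size) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += size) {
    chunks.push(bytes.subarray(offset, offset + size));
  }
  return chunks;
}
```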
Network, Reliability, and Deployment
- Serverless runtime: no infrastructure to manage for call logic
- CI/CD-friendly deployment path: CLI tool for automated pipelines
- Global footprint: datacenters in 14 countries
- Status page for live and historical uptime of subcomponents