Voximplant AI is a serverless runtime for **Voice AI pipelines** that connects real-time agent/LLM systems and speech engines to **PSTN / SIP / WebRTC / mobile / WhatsApp calling**, with code-driven orchestration and provider flexibility. See [Voximplant AI](https://voximplant.ai/) and the docs [Voice AI connectors section](/voice-ai-connectors/voice-ai-connectors-overview).

#### Supported vendors (direct agent / real-time LLM connectors)

Native/direct connectivity is positioned for:

* **OpenAI** (Realtime / agent-style integrations) - [Docs: OpenAI](/voice-ai-connectors/openai-realtime-voice-ai-connector/openai-realtime-voice-ai-connector-overview)
* **Google Gemini (Live)** - [Docs: Google](/voice-ai-connectors/gemini-live-voice-ai-connector/gemini-live-voice-ai-connector-overview)
* **Deepgram Voice Agent** - [Docs: Deepgram](/voice-ai-connectors/deepgram-voice-ai-connector/deepgram-voice-ai-connector-overview)
* **ElevenLabs Agents / Conversational AI** - [Docs: ElevenLabs](/voice-ai-connectors/elevenlabs-agents-voice-ai-connector/elevenlabs-agents-voice-ai-connector-overview)
* **Ultravox (WebSocket API)** - [Docs: Ultravox](/voice-ai-connectors/ultravox-voice-ai-connector/ultravox-voice-ai-connector-overview)
* **Cartesia Line Agents** - [Docs: Cartesia Line Agents](/voice-ai-connectors/cartesia-line-agents-voice-ai-connector/cartesia-line-agents-voice-ai-connector-overview)
* **xAI (Grok Voice Agent)** - [Docs: xAI](/voice-ai-connectors/grok-voice-ai-connector/grok-voice-ai-connector-overview)

Voximplant AI also explicitly supports connecting to **another WebSocket interface** (for other real-time AI systems) in addition to the vendors above.
#### Supported vendors (speech engines: STT / TTS)

Voximplant's platform speech layer (STT/TTS) includes built-in providers such as:

* **Speech-to-Text (STT)**: Google Speech Cloud, Microsoft Azure STT, Amazon Transcribe, Yandex Speech Cloud
* **Text-to-Speech (TTS)**: Google Speech Cloud, Amazon Polly, Yandex Speech Cloud, Microsoft Azure TTS, Tinkoff VoiceKit

For realtime / streaming TTS used in Voice AI scenarios, Voximplant also provides native VoxEngine modules and guides for:

* **Cartesia Realtime TTS** - [Guide: Realtime TTS](/voice-ai-connectors/openai-realtime-voice-ai-connector/openai-realtime-voice-ai-example-half-cascade-cartesia) and [API refs](https://voximplant.com/docs/references/voxengine/cartesia)
* **Inworld Realtime TTS** - [Guide: Realtime TTS](/voice-ai-connectors/openai-realtime-voice-ai-connector/openai-realtime-voice-ai-example-half-cascade-inworld)
* **ElevenLabs Streaming / realtime TTS** - [Guide: ElevenLabs TTS](/voice-ai-connectors/openai-realtime-voice-ai-connector/openai-realtime-voice-ai-example-half-cascade-elevenlabs) and [API refs](https://voximplant.com/docs/references/voxengine/elevenlabs)

#### Pipeline options (architectures you can run)

* **Speech-to-speech**: real-time audio in and real-time audio out (agent API handles full duplex loop)
* **Speech -> LLM -> TTS**: stream audio directly into a speech LLM and use a different TTS for output
* **STT -> LLM -> TTS**: stream audio to STT, pass text to an LLM/toolchain, synthesize response audio
* **Hybrid**: combine a real-time agent API for turn-taking with separate best-of-breed STT/TTS components (mix and match)

#### Orchestration primitives (what you control)

* **Mix and match providers**: swap STT/TTS/LLM vendors without changing your telephony integration
* **Parallel model execution**: run multiple speech/LLM components in parallel when useful (for example, intent extraction + generation)
* **Failover paths**: fall back to alternate speech/LLM providers when a step errors or times out
* **Wideband audio**: higher fidelity audio path for improved user experience and model comprehension
* **Deep SIP support**: SIP trunking + registration interop so agents can operate inside PBX/SBC/carrier environments
* **Channel portability**: reuse the same AI pipeline across PSTN numbers, SIP, WebRTC, mobile SDKs, and WhatsApp calling

#### Real-time media integration (streaming)

* **WebSocket-based media streaming** for connecting calls to real-time AI systems and custom pipelines (audio + metadata/control messages on the same channel)
* **Media gateway abstraction**: avoid building/operating custom streaming gateways when using native connectors/modules

#### Connectivity and endpoints

* **PSTN calling** (inbound/outbound) via phone numbers and programmable call handling
* **Phone numbers API**: automated procurement in **60+ countries** (availability varies by country)
* **SIP calling and trunking**: connect carriers / PBXs / SBCs using SIP interop (including registration-based scenarios)
* **WebRTC calling** via web/mobile SDKs (VoIP calling in apps and browsers)
* **WhatsApp calling**: inbound/outbound voice calls via WhatsApp Business API integration

#### Serverless call control (VoxEngine)

* **JavaScript call logic (no XML)** for real-time call routing and application workflows
* **Per-call-leg signaling/media control** - granular control over each leg independently

#### Conferencing and bridging

* **Single conferencing API** for voice/video; **mix PSTN, SIP, WebRTC, and native mobile endpoints**
* **Conferences up to 50 participants**

#### Recording, transcription, and speech processing

* **Call recording** via `call.record()` in scenarios (supports stereo and additional options)
* **Call transcription** via `record(transcribe=true)` and retrieval via `GetCallHistory` (transcription delivered asynchronously)
* **Speaker/channel labeling** in transcripts (for example, "Left"/"Right" labeling pattern described in docs)

#### Speech-to-Text (ASR) modes and features

* **Phrase-hint mode** (best for constrained dialogs / IVRs) and **Freeform mode** (open transcription)
* **Multiple ASR engines** (for example, Google, Amazon, Microsoft, Yandex, T-bank) with selectable profiles
* **Intermediate results** support (provider-dependent) for faster partial recognition
* **Google Speech v1p1beta1 feature passthrough** (for example, word time offsets, punctuation, diarization config)

#### Answering machine / voicemail / beep detection

* **AMD module** for voicemail/answering machine detection in scenarios
* **Beep detection** with specified frequency lists and timeouts (scenario-level control)
* **AMD event/callback model** available in VoxEngine references

#### Automated outbound calling (call lists + dialing logic)

* **Call Lists**: upload a **CSV call list** and process it with VoxEngine scenarios (campaign-style calling)
* **Management API CallLists**: programmatic call-list upload/append with delimiter support
* **Predictive Dialing System (PDS)**: uses agent/load statistics and call-list progression to place calls and connect answered calls to agents
* **Predictive and progressive dialing modes** with tunable parameters (for example, allowed failed call percentage)

#### WebRTC video API (server-based + P2P)

* **Video API** to build server-based and P2P video experiences
* SDKs abstract core WebRTC complexities:
  * **STUN/TURN/ICE**
  * **Bandwidth optimization**
  * **Video quality control**

#### Real-time collaboration features

* **Screen sharing** (share screen or window)
* **Recording** for calls/conferences; storage in Voximplant Cloud or S3-compatible storage
* **Video streaming** support (platform capability referenced in docs/features)

#### Voice/video interoperability

* Bridge **PSTN/SIP audio into video rooms** as part of a unified conferencing model

#### SMS

* **Send SMS via Management API** and **receive inbound SMS via HTTP callbacks** (for SMS-capable numbers)

#### Instant Messaging (in-app chat)

* **Direct messaging** between application users
* **Chat rooms up to 1000 participants**
* **Chatbots** for automated interactions

#### Push notifications (mobile)

* Push notifications to wake devices for **incoming calls** and **message notifications**
* Android push implementation is based on **Firebase Cloud Messaging (FCM)**

#### Webhooks / event delivery to your backend

* **HTTP Callbacks** for event-driven notifications without polling the Management API

#### Cloud IDE and debugging

* **Cloud IDE + debugger** in the control panel:
  * **Code verification**
  * **Autocompletion**
  * **Diff highlighting**
* Built-in troubleshooting workflow

#### SDKs and client libraries

* SDKs: **iOS, Android, Web, React Native, Flutter, Unity**
* API clients: **curl, Node.js, Python, PHP, Go, .NET, Java**

#### Management API (HTTP)

* Control accounts/services programmatically (examples from docs include managing phone numbers, messaging, billing, logs, records, and user access)
* **Media Streams**: integrate **live audio streams** into calls via WebSockets for real-time transcription/analysis and AI integrations
* WebSocket programming model in VoxEngine:
  * Create connections via `VoxEngine.createWebSocket(...)`
  * Stream audio using `WebSocket.sendMediaTo(...)`
  * Recommended audio chunk duration: **\~20 ms**
* **Serverless runtime** (no infrastructure to manage for call logic)
* **Global footprint**: datacenters in **14** distinct countries (as stated on the platform page)
* **Status page** for live and historical uptime of subcomponents
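
To make the recommended ~20 ms audio chunk duration concrete, the sketch below works out how many bytes such a chunk holds and splits a buffer accordingly. It is plain JavaScript and does not call any Voximplant API: `pcmChunkBytes` and `splitIntoChunks` are hypothetical helper names, and the audio format (uncompressed 16-bit mono PCM) is an assumption made for the arithmetic. The encoding actually used on a media stream depends on your WebSocket configuration, so check the VoxEngine references before relying on these numbers.

```javascript
// Hypothetical helpers for illustration only; not part of VoxEngine.
// Assumption: raw linear PCM, 16-bit samples (2 bytes), mono.

// Bytes occupied by one chunk of audio of the given duration.
function pcmChunkBytes(sampleRateHz, chunkMs = 20, bytesPerSample = 2, channels = 1) {
  const samplesPerChunk = Math.round((sampleRateHz * chunkMs) / 1000);
  return samplesPerChunk * bytesPerSample * channels;
}

// Split a byte buffer into fixed-size chunks; the final chunk may be shorter.
function splitIntoChunks(buffer, chunkBytes) {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkBytes) {
    // subarray creates a view over the same memory, so no audio data is copied.
    chunks.push(buffer.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// Example: narrowband (8 kHz) versus wideband (16 kHz) chunk sizes at 20 ms.
const narrowbandBytes = pcmChunkBytes(8000);
const widebandBytes = pcmChunkBytes(16000);
const chunks = splitIntoChunks(new Uint8Array(1000), narrowbandBytes);
```

Under these assumptions a 20 ms chunk is 320 bytes at 8 kHz and 640 bytes at 16 kHz, so a wideband audio path roughly doubles the per-chunk payload for the same duration.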

Detailed capabilities