---
title: Capabilities
---

# Voximplant Platform Capabilities

Voximplant Platform is a cloud communications platform for building programmable voice, video, and messaging applications using serverless call control, SDKs, and APIs.

***

## Voice AI Orchestration

Voximplant AI is a serverless runtime for **Voice AI pipelines** that connects real-time agent/LLM systems and speech engines to **PSTN / SIP / WebRTC / mobile / WhatsApp calling**, with code-driven orchestration and provider flexibility. (See: [Voximplant AI](https://voximplant.ai/) and the Voximplant docs [Voice AI section](https://voximplant.com/docs/voice-ai).)

### Supported vendors (direct agent / real-time LLM connectors)

Native (direct) connectors are offered for:

* **OpenAI** (Realtime / agent-style integrations) — [Docs: OpenAI](https://voximplant.com/docs/voice-ai/openai)
* **Google Gemini (Live)** — [Docs: Google](https://voximplant.com/docs/voice-ai/google)
* **Deepgram Voice Agent** — [Docs: Deepgram](https://voximplant.com/docs/voice-ai/deepgram)
* **ElevenLabs Agents / Conversational AI** — [Docs: ElevenLabs](https://voximplant.com/docs/voice-ai/elevenlabs)
* **Ultravox (WebSocket API)** — [Docs: Ultravox](https://voximplant.com/docs/voice-ai/ultravox)
* **Cartesia Line Agents** — [Docs: Cartesia Line Agents](https://voximplant.com/products/cartesia-agents-client)
* **xAI (Grok Voice Agent)** — [Docs: xAI](https://voximplant.com/docs/voice-ai/xai)

Beyond the vendors above, Voximplant AI can also connect to **any other WebSocket interface**, which covers real-time AI systems that have no dedicated connector.

### Supported vendors (speech engines: STT / TTS)

Voximplant’s platform speech layer (STT/TTS) includes built-in providers such as:

* **Speech-to-Text (STT)**: Google Speech Cloud, Microsoft Azure STT, Amazon Transcribe, Yandex Speech Cloud
* **Text-to-Speech (TTS)**: Google Speech Cloud, Amazon Polly, Yandex Speech Cloud, Microsoft Azure TTS, Tinkoff VoiceKit

For realtime / streaming TTS used in Voice AI scenarios, Voximplant also provides native VoxEngine modules and guides for:

* **Cartesia Realtime TTS** — [Guide: Realtime TTS](https://voximplant.com/docs/guides/speech/realtime-tts) and [API refs](https://voximplant.com/docs/references/voxengine/cartesia)
* **Inworld Realtime TTS** — [Guide: Realtime TTS](https://voximplant.com/docs/guides/speech/realtime-tts)
* **ElevenLabs Streaming / realtime TTS** — [Guide: ElevenLabs TTS](https://voximplant.com/docs/guides/speech/elevenlabs-tts) and [API refs](https://voximplant.com/docs/references/voxengine/elevenlabs)

### Pipeline options (architectures you can run)

* **Speech-to-speech**: real-time audio in ↔ real-time audio out (the agent API handles the full-duplex loop)
* **Speech → LLM → TTS**: stream audio directly into a speech LLM and use a different TTS for output
* **STT → LLM → TTS**: stream audio to STT, pass text to an LLM/toolchain, synthesize response audio
* **Hybrid**: combine a real-time agent API for turn-taking with separate best-of-breed STT/TTS components (“mix & match”)
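
These architectures differ mainly in which stages exist and who owns each one. As a rough illustration, the cascaded **STT → LLM → TTS** shape can be sketched in plain JavaScript. The `stt`, `llm`, and `tts` functions below are hypothetical stubs, not VoxEngine APIs; a real scenario would call provider modules in their place.

```javascript
// Hypothetical stage stubs; a real scenario would call provider modules instead.
const stt = async (audio) => `text(${audio})`;
const llm = async (text) => `reply(${text})`;
const tts = async (text) => `audio(${text})`;

// Compose stages left to right: each stage consumes the previous stage's output.
const pipeline = (...stages) => async (input) => {
  let value = input;
  for (const stage of stages) {
    value = await stage(value);
  }
  return value;
};

// STT -> LLM -> TTS: three separately swappable stages.
const cascaded = pipeline(stt, llm, tts);

// Speech -> LLM -> TTS: the (stubbed) LLM takes audio directly, so STT is dropped.
const speechLlm = pipeline(llm, tts);
```

Because every stage has the same "async function of one input" shape, swapping a vendor only means replacing one function in the list.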

### Orchestration primitives (what you control)

* **Mix & match providers**: swap STT/TTS/LLM vendors without changing your telephony integration
* **Parallel model execution**: run multiple speech/LLM components in parallel when useful (e.g., intent extraction + generation)
* **Failover paths**: fall back to alternate speech/LLM providers when a step errors or times out
* **Wideband audio**: higher fidelity audio path for improved user experience and model comprehension
* **Deep SIP support**: SIP trunking + registration interop so agents can operate inside PBX/SBC/carrier environments
* **Channel portability**: reuse the same AI pipeline across PSTN numbers, SIP, WebRTC, mobile SDKs, and WhatsApp calling
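
The failover primitive above is essentially "try providers in order, with a timeout on each step". The following is a minimal sketch of that idea in plain JavaScript. It is not a VoxEngine API: `withTimeout`, `withFailover`, and the `{ name, run }` provider shape are hypothetical names chosen for illustration.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each provider in order and return the first successful result.
// A provider "fails" when it throws, rejects, or exceeds the timeout.
async function withFailover(providers, input, timeoutMs = 2000) {
  const errors = [];
  for (const { name, run } of providers) {
    try {
      const result = await withTimeout(run(input), timeoutMs);
      return { provider: name, result };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

A call such as `withFailover([{ name: "primary", run: primaryTts }, { name: "backup", run: backupTts }], text)` would return the backup's result only if the primary errors or times out (`primaryTts` and `backupTts` are placeholders for your own provider wrappers).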

### Real-time media integration (streaming)

* **WebSocket-based media streaming** for connecting calls to real-time AI systems and custom pipelines (audio + metadata/control messages on the same channel)
* **Media gateway abstraction**: avoid building/operating custom streaming gateways when using native connectors/modules

## Voice telephony

### Connectivity and endpoints

* **PSTN calling** (inbound/outbound) via phone numbers and programmable call handling
* **Phone numbers API**: automated procurement in **60+ countries** (availability varies by country)
* **SIP calling and trunking**: connect carriers / PBXs / SBCs using SIP interop (including registration-based scenarios)
* **WebRTC calling** via web/mobile SDKs (VoIP calling in apps and browsers)
* **WhatsApp calling**: inbound/outbound voice calls via WhatsApp Business API integration

### Serverless call control (VoxEngine)

* **JavaScript call logic (no XML)** for real-time call routing and application workflows
* **Per-call-leg signaling/media control**: granular control over each leg independently

### Conferencing and bridging

* **Single conferencing API** for voice/video; **mix PSTN, SIP, WebRTC, and native mobile endpoints**
* **Conferences up to 50 participants**

### Recording, transcription, and speech processing

* **Call recording** via `call.record()` in scenarios (supports stereo and additional options)
* **Call transcription** via `call.record({ transcribe: true })`, with results retrieved via `GetCallHistory` (transcription is delivered asynchronously)
* **Speaker/channel labeling** in transcripts (e.g., "Left"/"Right" labeling pattern described in docs)

### Speech-to-Text (ASR) modes and features

* **Phrase-hint mode** (best for constrained dialogs / IVRs) and **Freeform mode** (open transcription)
* **Multiple ASR engines** (e.g., Google, Amazon, Microsoft, Yandex, T-Bank) with selectable profiles
* **Intermediate results** support (provider-dependent) for faster partial recognition
* **Google Speech v1p1beta1 feature passthrough** (e.g., word time offsets, punctuation, diarization config)

### Answering machine / voicemail / beep detection

* **AMD module** for voicemail/answering machine detection in scenarios
* **Beep detection** with specified frequency lists and timeouts (scenario-level control)
* **AMD event/callback model** available in VoxEngine references

### Automated outbound calling (call lists + dialing logic)

* **Call Lists**: upload a **CSV call list** and process it with VoxEngine scenarios (campaign-style calling)
* **Management API CallLists**: programmatic call-list upload/append with delimiter support
* **Predictive Dialing System (PDS)**:
  * Uses agent/load statistics and call-list progression to place calls and connect answered calls to agents
  * Supports **predictive** and **progressive** modes with tunable parameters (e.g., allowed failed call %)
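
Since call lists are uploaded as delimited text, it can help to generate them from structured records. The sketch below builds CSV text with a configurable delimiter. The column names (`phone`, `name`) and the default `;` delimiter are illustrative assumptions, and whether the platform's parser accepts quoted fields should be checked against the call list documentation before relying on it. The upload call itself is not shown.

```javascript
// Quote a field only when it contains the delimiter, a quote, or a line break.
function escapeField(value, delimiter) {
  const text = String(value ?? "");
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text: a header row followed by one row per record.
function toCallListCsv(records, columns, delimiter = ";") {
  const header = columns.map((c) => escapeField(c, delimiter)).join(delimiter);
  const rows = records.map((record) =>
    columns.map((c) => escapeField(record[c], delimiter)).join(delimiter)
  );
  return [header, ...rows].join("\n");
}

// Hypothetical records; a real list would come from your CRM or database.
const csv = toCallListCsv(
  [
    { phone: "15551230001", name: "Alice" },
    { phone: "15551230002", name: "Bob; Jr." },
  ],
  ["phone", "name"]
);
```

The resulting string is what would be passed as the file content when uploading or appending a call list.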

***

## Video telephony

### WebRTC video API (server-based + P2P)

* **Video API** to build server-based and P2P video experiences
* SDKs abstract core WebRTC complexities:
  * **STUN/TURN/ICE**
  * **Bandwidth optimization**
  * **Video quality control**

### Real-time collaboration features

* **Screen sharing** (share screen or window)
* **Recording** for calls/conferences; storage in Voximplant Cloud or S3-compatible storage
* **Video streaming** support (platform capability referenced in docs/features)

### Voice/video interoperability

* Bridge **PSTN/SIP audio into video rooms** as part of a unified conferencing model

***

## Messaging

### SMS

* **Send SMS via Management API** and **receive inbound SMS via HTTP callbacks** (for SMS-capable numbers)

### Instant Messaging (in-app chat)

* **Direct messaging** between application users
* **Chat rooms up to 1000 participants**
* **Chatbots** for automated interactions

### Push notifications (mobile)

* Push notifications to wake devices for **incoming calls** and **message notifications**
* Android push implementation is based on **Firebase Cloud Messaging (FCM)**

### Webhooks / event delivery to your backend

* **HTTP Callbacks** for event-driven notifications without polling the Management API

***

## Tools and Developer Experience

### Cloud IDE and debugging

* **Cloud IDE + debugger** in the control panel:
  * **Code verification**
  * **Autocompletion**
  * **Diff highlighting**
  * Built-in troubleshooting workflow

### SDKs and client libraries

* SDKs: **iOS, Android, Web, React Native, Flutter, Unity**
* API clients: **curl, Node.js, Python, PHP, Go, .NET, Java**

### Management API (HTTP)

* Control accounts/services programmatically (examples from docs include managing phone numbers, messaging, billing, logs, records, user access)

***

## Real-time Media Streaming (WebSockets / Media Streams)

* **Media Streams**: integrate **live audio streams** into calls via WebSockets for real-time transcription/analysis and AI integrations
* WebSocket programming model in VoxEngine:
  * Create connections via `VoxEngine.createWebSocket(...)`
  * Stream audio using `WebSocket.sendMediaTo(...)`
  * Recommended audio chunk duration: **\~20ms**
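
To make the ~20 ms recommendation concrete, the arithmetic below computes how many bytes one chunk holds and splits a buffer accordingly. It assumes uncompressed 16-bit mono PCM, which the text above does not specify; the actual encoding depends on how the stream is configured. The helper names are hypothetical and the code runs in any JavaScript runtime, independent of VoxEngine.

```javascript
// Bytes in one chunk of uncompressed PCM audio.
function chunkBytes(sampleRateHz, chunkMs = 20, bytesPerSample = 2, channels = 1) {
  const samples = Math.round((sampleRateHz * chunkMs) / 1000);
  return samples * bytesPerSample * channels;
}

// Split a byte buffer into fixed-size chunks; the last chunk may be shorter.
function splitIntoChunks(bytes, size) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += size) {
    chunks.push(bytes.subarray(offset, offset + size));
  }
  return chunks;
}

const size = chunkBytes(8000);                    // 160 samples * 2 bytes = 320 bytes
const oneSecond = new Uint8Array(8000 * 2);       // 1 s of 16-bit mono audio at 8 kHz
const chunks = splitIntoChunks(oneSecond, size);  // 50 chunks of 20 ms each
```

At 16 kHz (wideband) the same 20 ms window doubles to 640 bytes per chunk under the same assumptions.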

***

## Network, Reliability, and Deployment

* **Serverless runtime** (no infra to manage for call logic)

* **Global footprint**: "datacenters in **14** distinct countries" (as stated on the platform page)

* **Status page** for live and historical uptime of subcomponents
