Capabilities

Voximplant Platform Capabilities

Voximplant Platform is a cloud communications platform for building programmable voice, video, and messaging applications using serverless call control, SDKs, and APIs.


Voice AI Orchestration

Voximplant AI is a serverless runtime for Voice AI pipelines that connects real-time agent/LLM systems and speech engines to PSTN / SIP / WebRTC / mobile / WhatsApp calling, with code-driven orchestration and provider flexibility. (See: Voximplant AI and the Voximplant docs Voice AI section.)

Supported vendors (direct agent / real-time LLM connectors)

Native/direct connectivity is positioned for:

Voximplant AI also explicitly supports connecting to another WebSocket interface (for other real-time AI systems) in addition to the vendors above.

Supported vendors (speech engines: STT / TTS)

Voximplant’s platform speech layer (STT/TTS) includes built-in providers such as:

  • Speech-to-Text (STT): Google Speech Cloud, Microsoft Azure STT, Amazon Transcribe, Yandex Speech Cloud
  • Text-to-Speech (TTS): Google Speech Cloud, Amazon Polly, Yandex Speech Cloud, Microsoft Azure TTS, Tinkoff VoiceKit

For realtime / streaming TTS used in Voice AI scenarios, Voximplant also provides native VoxEngine modules and guides for:

Pipeline options (architectures you can run)

  • Speech-to-speech: real-time audio in ↔ real-time audio out (agent API handles full duplex loop)
  • Speech → LLM → TTS: stream audio directly into a speech-capable LLM and use a separate TTS engine for output
  • STT → LLM → TTS: stream audio to STT, pass text to an LLM/toolchain, synthesize response audio
  • Hybrid: combine a real-time agent API for turn-taking with separate best-of-breed STT/TTS components (“mix & match”)
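
As a rough illustration of the STT → LLM → TTS option, the sketch below treats each stage as a plain function and chains them for one conversational turn. Everything in it (`runTurn`, `fakeStt`, `fakeLlm`, `fakeTts`) is a hypothetical stand-in rather than a Voximplant or vendor API, and real stages would stream audio asynchronously instead of returning values directly.

```javascript
// Hypothetical stand-ins for real providers (assumptions for this sketch).
const fakeStt = (audio) => audio.transcript;           // "audio" -> user text
const fakeLlm = (text) => `You said: ${text}`;         // user text -> reply text
const fakeTts = (text) => ({ voice: "demo", text });   // reply text -> "audio"

// Run one conversational turn through the three stages in order.
function runTurn(stages, audioIn) {
  const userText = stages.stt(audioIn);
  const replyText = stages.llm(userText);
  return stages.tts(replyText);
}

const stages = { stt: fakeStt, llm: fakeLlm, tts: fakeTts };
const out = runTurn(stages, { transcript: "hello" });
console.log(out.text);

// Swapping one vendor means replacing one stage; the others are untouched.
const otherTts = (text) => ({ voice: "other", text });
const out2 = runTurn({ ...stages, tts: otherTts }, { transcript: "hello" });
console.log(out2.voice);
```

Keeping each stage behind the same small interface is what makes the "mix & match" and hybrid options practical: the telephony side only ever sees audio in and audio out.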

Orchestration primitives (what you control)

  • Mix & match providers: swap STT/TTS/LLM vendors without changing your telephony integration
  • Parallel model execution: run multiple speech/LLM components in parallel when useful (e.g., intent extraction + generation)
  • Failover paths: fall back to alternate speech/LLM providers when a step errors or times out
  • Wideband audio: higher fidelity audio path for improved user experience and model comprehension
  • Deep SIP support: SIP trunking + registration interop so agents can operate inside PBX/SBC/carrier environments
  • Channel portability: reuse the same AI pipeline across PSTN numbers, SIP, WebRTC, mobile SDKs, and WhatsApp calling
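
The failover idea above can be sketched as a small helper that tries providers in order and moves on when one throws. This is a minimal sketch, not a Voximplant API: `withFailover` and the `{ name, run }` provider shape are hypothetical, and a timeout is modeled here simply as the primary provider throwing (a real timeout would need asynchronous timers).

```javascript
// Try each provider in order and return the first successful result.
// `providers` is a list of { name, run } objects (hypothetical shape).
function withFailover(providers, input) {
  const errors = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, result: provider.run(input) };
    } catch (err) {
      // Remember why this provider failed, then fall through to the next.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}

// Stand-in providers: the primary always fails, the backup succeeds.
const primaryTts = { name: "primary", run: () => { throw new Error("timeout"); } };
const backupTts = { name: "backup", run: (text) => `audio(${text})` };

const reply = withFailover([primaryTts, backupTts], "Hello");
console.log(reply.provider, reply.result);
```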

Real-time media integration (streaming)

  • WebSocket-based media streaming for connecting calls to real-time AI systems and custom pipelines (audio + metadata/control messages on the same channel)
  • Media gateway abstraction: avoid building/operating custom streaming gateways when using native connectors/modules
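
A minimal sketch of connecting a call to an external real-time system over a WebSocket, using `VoxEngine.createWebSocket` and `sendMediaTo` in both directions. The URL is a placeholder, `bridgeCallToWebSocket` is a hypothetical helper name, and optional encoding parameters are omitted; treat the event wiring as an assumption to verify against the VoxEngine reference.

```javascript
// Wire audio in both directions between a call and an open WebSocket.
function bridgeCallToWebSocket(call, socket) {
  call.sendMediaTo(socket); // caller audio -> remote system
  socket.sendMediaTo(call); // remote audio -> caller
}

// Register the scenario only when running inside the VoxEngine runtime.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    e.call.addEventListener(CallEvents.Connected, () => {
      // Placeholder endpoint; replace with your own service.
      const socket = VoxEngine.createWebSocket("wss://example.com/media");
      socket.addEventListener(WebSocketEvents.OPEN, () =>
        bridgeCallToWebSocket(e.call, socket)
      );
    });
    e.call.answer();
  });
}
```

Keeping the bridging step in its own function means it can be exercised with simple stand-in objects outside the platform.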

Voice telephony

Connectivity and endpoints

  • PSTN calling (inbound/outbound) via phone numbers and programmable call handling
  • Phone numbers API: automated procurement in 60+ countries (availability varies by country)
  • SIP calling and trunking: connect carriers / PBXs / SBCs using SIP interop (including registration-based scenarios)
  • WebRTC calling via web/mobile SDKs (VoIP calling in apps and browsers)
  • WhatsApp calling: inbound/outbound voice calls via WhatsApp Business API integration

Serverless call control (VoxEngine)

  • JavaScript call logic (no XML) for real-time call routing and application workflows
  • Per-call-leg signaling/media control: granular control over each leg independently
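
To make the per-leg model concrete, here is a minimal scenario sketch: an inbound leg arrives, the scenario places a separate outbound PSTN leg, and the two are bridged. `handleIncomingCall` and `CALLER_ID` are hypothetical names; the caller ID is a placeholder and would normally need to be a number your account is allowed to present.

```javascript
// Placeholder caller ID (assumption): use a number your account may present.
const CALLER_ID = "15550000000";

// Handle an inbound call by creating a second, independent leg and bridging.
// `engine` is the VoxEngine global on the platform; it is a parameter here
// so the logic can also be exercised with a stand-in object.
function handleIncomingCall(e, engine) {
  const inboundLeg = e.call;
  // Outbound leg: dial the number the caller requested.
  const outboundLeg = engine.callPSTN(e.destination, CALLER_ID);
  // Connect the two legs with the default connect/hangup handling.
  engine.easyProcess(inboundLeg, outboundLeg);
  return outboundLeg;
}

// Register the handler only when running inside the VoxEngine runtime.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) =>
    handleIncomingCall(e, VoxEngine)
  );
}
```

Because each leg is its own object, the scenario could instead attach different event listeners or media handling to `inboundLeg` and `outboundLeg` rather than relying on `easyProcess`.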

Conferencing and bridging

  • Single conferencing API for voice/video; mix PSTN, SIP, WebRTC, and native mobile endpoints
  • Conferences up to 50 participants

Recording, transcription, and speech processing

  • Call recording via call.record() in scenarios (supports stereo and additional options)
  • Call transcription by passing transcribe: true to call.record(), with retrieval via the GetCallHistory Management API method (transcription is delivered asynchronously)
  • Speaker/channel labeling in transcripts (e.g., “Left”/“Right” labeling pattern described in docs)
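
A minimal sketch of starting a stereo recording with transcription from a scenario. The `stereo` and `transcribe` options are the ones named above; `startRecording` is a hypothetical helper, and any further recorder options (such as language or provider) are left out and should be taken from the VoxEngine reference.

```javascript
// Start a stereo recording with transcription on a connected call.
function startRecording(call) {
  // Only the options mentioned in the text are set in this sketch.
  call.record({ stereo: true, transcribe: true });
}

// Register the scenario only when running inside the VoxEngine runtime.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, (e) => {
    e.call.addEventListener(CallEvents.Connected, () => startRecording(e.call));
    // The recording URL is reported once recording has started.
    e.call.addEventListener(CallEvents.RecordStarted, (ev) => Logger.write(ev.url));
    e.call.answer();
  });
}
```

The transcript itself is not available inside the scenario; as noted above, it is produced asynchronously and fetched later through the call history.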

Speech-to-Text (ASR) modes and features

  • Phrase-hint mode (best for constrained dialogs / IVRs) and Freeform mode (open transcription)
  • Multiple ASR engines (e.g., Google, Amazon, Microsoft, Yandex, T-Bank) with selectable profiles
  • Intermediate results support (provider-dependent) for faster partial recognition
  • Google Speech v1p1beta1 feature passthrough (e.g., word time offsets, punctuation, diarization config)
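
The two recognition modes can be thought of as two shapes of the options passed to `VoxEngine.createASR()`. The helper below is a hedged sketch: `buildAsrOptions` is a hypothetical name, and the property names (`profile`, `phraseHints`, `interimResults`, `singleUtterance`) reflect my reading of the ASR module and should be checked against the current reference before use.

```javascript
// Build an options object for VoxEngine.createASR().
// mode: "phrase-hint" for constrained dialogs, "freeform" for open speech.
function buildAsrOptions(profile, mode, hints = []) {
  // Ask for partial results; whether they arrive depends on the provider.
  const options = { profile, interimResults: true };
  if (mode === "phrase-hint") {
    if (hints.length === 0) throw new Error("phrase-hint mode needs hints");
    options.phraseHints = hints;    // expected answers, e.g. IVR choices
    options.singleUtterance = true; // stop after one recognized phrase
  } else if (mode !== "freeform") {
    throw new Error(`unknown mode: ${mode}`);
  }
  return options;
}

// On the platform this would be used roughly as:
//   VoxEngine.createASR(
//     buildAsrOptions(ASRProfileList.Google.en_US, "phrase-hint", ["yes", "no"]));
const ivrOptions = buildAsrOptions("en-US-profile", "phrase-hint", ["yes", "no"]);
const openOptions = buildAsrOptions("en-US-profile", "freeform");
console.log(ivrOptions, openOptions);
```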

Answering machine / voicemail / beep detection

  • AMD module for voicemail/answering machine detection in scenarios
  • Beep detection with specified frequency lists and timeouts (scenario-level control)
  • AMD event/callback model available in VoxEngine references
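
Detecting a tone from a list of candidate frequencies is commonly done with the Goertzel algorithm. The sketch below shows that general technique on raw samples; it is not a description of how the Voximplant AMD module is implemented, and the function names and the 0.5 threshold are assumptions chosen for illustration. Timeout handling is left out.

```javascript
// Goertzel algorithm: signal power at a single target frequency.
function goertzelPower(samples, sampleRate, freq) {
  const n = samples.length;
  const k = Math.round((n * freq) / sampleRate); // nearest DFT bin
  const coeff = 2 * Math.cos((2 * Math.PI * k) / n);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Return the first candidate frequency that dominates the frame, or null.
// The ratio is about 1 for a pure tone on a bin frequency, near 0 otherwise.
function detectBeep(samples, sampleRate, candidates, minRatio = 0.5) {
  const energy = samples.reduce((sum, x) => sum + x * x, 0);
  if (energy === 0) return null; // silence
  for (const freq of candidates) {
    const power = goertzelPower(samples, sampleRate, freq);
    const ratio = (2 * power) / (samples.length * energy);
    if (ratio >= minRatio) return freq;
  }
  return null;
}

// 20 ms of a 1000 Hz tone sampled at 8 kHz (160 samples).
const tone = Array.from({ length: 160 }, (_, i) =>
  Math.sin((2 * Math.PI * 1000 * i) / 8000)
);
console.log(detectBeep(tone, 8000, [1400, 1000])); // prints 1000
```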

Automated outbound calling (call lists + dialing logic)

  • Call Lists: upload a CSV call list and process it with VoxEngine scenarios (campaign-style calling)
  • Management API CallLists: programmatic call-list upload/append with delimiter support
  • Predictive Dialing System (PDS):
    • Uses agent/load statistics and call-list progression to place calls and connect answered calls to agents
    • Supports predictive and progressive modes with tunable parameters (e.g., allowed failed call %)
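
For the call-list upload described above, the input is a delimited text file with a header row. The helper below sketches how such a file could be generated; `toCallListCsv`, the column names, and the `;` default delimiter are assumptions for illustration, so match them to whatever your upload call specifies.

```javascript
// Build call-list CSV text: a header row followed by one row per contact.
function toCallListCsv(rows, delimiter = ";") {
  if (rows.length === 0) throw new Error("call list is empty");
  const columns = Object.keys(rows[0]);
  const formatCell = (value) => {
    const text = String(value ?? "");
    // This sketch does no quoting, so reject values that would need it.
    if (text.includes(delimiter) || /[\r\n]/.test(text)) {
      throw new Error(`value cannot be written without quoting: ${text}`);
    }
    return text;
  };
  const lines = [columns.join(delimiter)];
  for (const row of rows) {
    lines.push(columns.map((c) => formatCell(row[c])).join(delimiter));
  }
  return lines.join("\n");
}

// Hypothetical columns; each row becomes the data for one call attempt.
const csv = toCallListCsv([
  { phone_number: "15551230001", first_name: "Ann" },
  { phone_number: "15551230002", first_name: "Bob" },
]);
console.log(csv);
```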

Video telephony

WebRTC video API (server-based + P2P)

  • Video API to build server-based and P2P video experiences
  • SDKs abstract core WebRTC complexities:
    • STUN/TURN/ICE
    • Bandwidth optimization
    • Video quality control

Real-time collaboration features

  • Screen sharing (share screen or window)
  • Recording for calls/conferences; storage in Voximplant Cloud or S3-compatible storage
  • Video streaming support (platform capability referenced in docs/features)

Voice/video interoperability

  • Bridge PSTN/SIP audio into video rooms as part of a unified conferencing model

Messaging

SMS

  • Send SMS via Management API and receive inbound SMS via HTTP callbacks (for SMS-capable numbers)

Instant Messaging (in-app chat)

  • Direct messaging between application users
  • Chat rooms up to 1000 participants
  • Chatbots for automated interactions

Push notifications (mobile)

  • Push notifications to wake devices for incoming calls and message notifications
  • Android push implementation is based on Firebase Cloud Messaging (FCM)

Webhooks / event delivery to your backend

  • HTTP Callbacks for event-driven notifications without polling the Management API

Tools and Developer Experience

Cloud IDE and debugging

  • Cloud IDE + debugger in the control panel:
    • Code verification
    • Autocompletion
    • Diff highlighting
    • Built-in troubleshooting workflow

SDKs and client libraries

  • SDKs: iOS, Android, Web, React Native, Flutter, Unity
  • API clients: curl, Node.js, Python, PHP, Go, .NET, Java

Management API (HTTP)

  • Control accounts/services programmatically (examples from docs include managing phone numbers, messaging, billing, logs, records, user access)

Real-time Media Streaming (WebSockets / Media Streams)

  • Media Streams: integrate live audio streams into calls via WebSockets for real-time transcription/analysis and AI integrations
  • WebSocket programming model in VoxEngine:
    • Create connections via VoxEngine.createWebSocket(...)
    • Stream audio using WebSocket.sendMediaTo(...)
    • Recommended audio chunk duration: ~20ms
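
The ~20 ms recommendation translates into a chunk size that depends on sample rate and sample width. The helpers below sketch that arithmetic; `chunkBytes` and `splitIntoChunks` are hypothetical names, and the two formats shown are only examples of common telephony encodings.

```javascript
// Bytes needed for one audio chunk of the given duration.
function chunkBytes(sampleRate, bytesPerSample, channels = 1, durationMs = 20) {
  const samplesPerChunk = Math.round((sampleRate * durationMs) / 1000);
  return samplesPerChunk * bytesPerSample * channels;
}

// Split a byte buffer into fixed-size chunks; the last one may be shorter.
function splitIntoChunks(bytes, size) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += size) {
    chunks.push(bytes.subarray(offset, offset + size));
  }
  return chunks;
}

const ulaw8k = chunkBytes(8000, 1);  // 8 kHz, 8-bit mu-law: 160 bytes
const pcm16k = chunkBytes(16000, 2); // 16 kHz, 16-bit linear PCM: 640 bytes
console.log(ulaw8k, pcm16k);

const oneSecond = new Uint8Array(8000); // 1 s of 8 kHz mu-law audio
console.log(splitIntoChunks(oneSecond, ulaw8k).length); // prints 50
```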

Network, Reliability, and Deployment

  • Serverless runtime (no infra to manage for call logic)
  • Global footprint: “datacenters in 14 distinct countries” (as stated on the platform page)
  • Status page for live and historical uptime of subcomponents

Voice AI Integrations (optional, adjacent capability)

If you need speech-to-speech agents and multi-provider orchestration, Voximplant positions Voximplant AI as a serverless runtime for Voice AI pipelines with telephony and real-time connectivity (phone numbers, SIP, mobile, WebRTC, WhatsApp).