FAQ

Common questions about Voximplant Voice AI, telephony, connectors, and pricing

Getting started

Voximplant is a communications platform for developers and business teams. Voximplant AI is the Voice AI-focused part of that platform: it helps you connect modern LLMs, AI agent platforms, and speech systems to Voximplant telephony features such as phone numbers, outbound dialing, SIP, and WhatsApp.

Start with voximplant.ai or the Getting started guide.

Typical use cases include sales and support voice agents, appointment scheduling, post-call surveys, intelligent IVRs, order status lines, concierge services, and real-time copilots for human agents.

Voximplant is flexible enough to support far more than those patterns. You can deploy on phone numbers, SIP-based telephony systems, in-app voice, WebRTC, or WhatsApp Business Calling.

Start with the Getting started guide and the connector examples under Voice AI Connectors.

AI and LLM interaction

Voximplant provides built-in Voice AI connectors for several realtime LLM and AI agent platforms. See Voice AI Connectors for the current list.

For custom integrations, use the WebSocket media streams API.

Voximplant also supports NLU-style systems such as Dialogflow ES, Dialogflow CX, and Avatar.

VoxEngine is a serverless orchestration environment, so it supports multiple Voice AI pipeline designs.

Speech-to-speech

  • Caller audio is sent directly to the realtime LLM.
  • The realtime system returns speech, which is played back to the caller.
  • This is usually the lowest-latency option.
  • Voice choice is limited to what the realtime provider supports.

Speech-to-text-to-speech

  • Caller audio is sent to a realtime LLM that returns text.
  • VoxEngine synthesizes that text through Voximplant TTS.
  • This gives you broader speech-provider choice for output.
  • For lower latency, use streaming playback such as createRealtimeTTSPlayer with providers like ElevenLabs, Cartesia, or Inworld. See Realtime TTS.

Speech-to-text-to-text-to-speech

  • Voximplant transcribes the call using one of its ASR integrations.
  • Your VoxEngine code sends text to the LLM.
  • The LLM responds with text.
  • VoxEngine synthesizes the result and plays it back to the caller.
  • See the ChatGPT integration guide for a complete example of this style.
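The third design is easiest to reason about as a turn loop: each final transcript becomes a text request, and the text reply is synthesized back to the caller. The sketch below shows that loop in plain JavaScript with the stages injected as functions, so it runs outside VoxEngine. `createTurnLoop`, `askLLM`, and `speak` are hypothetical names: in a real scenario `askLLM` would wrap an HTTP call to your LLM and `speak` would wrap Voximplant TTS playback such as `call.say`.

```javascript
// Minimal sketch of a speech-to-text-to-text-to-speech turn loop.
// askLLM(messages) -> Promise<string> and speak(text) -> Promise<void> are
// hypothetical stand-ins for your LLM client and Voximplant TTS playback.
function createTurnLoop({ systemPrompt, askLLM, speak }) {
  // Conversation history in the role/content shape most chat LLM APIs accept.
  const history = [{ role: "system", content: systemPrompt }];

  // Call this with each final transcript produced by the ASR stage.
  return async function onFinalTranscript(transcript) {
    const text = transcript.trim();
    if (text === "") return null; // ignore empty recognition results

    history.push({ role: "user", content: text });
    const reply = await askLLM(history.slice()); // send a snapshot of the history
    history.push({ role: "assistant", content: reply });

    await speak(reply); // synthesize the reply and play it to the caller
    return reply;
  };
}
```

Passing a copy of the history keeps each LLM request a stable snapshot, even though the loop keeps appending turns for multi-turn context.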

You can switch LLM providers, or support more than one provider in the same codebase; the VoxEngine APIs make this straightforward.

In practice, that often starts with swapping the client creation call:

const voiceAIClient = await Gemini.createLiveAPIClient(geminiLiveAPIClientParameters);

to

const voiceAIClient = await Ultravox.createWebSocketAPIClient(webSocketAPIClientParameters);

Voximplant preserves each provider’s underlying methods and events instead of flattening everything into a lowest-common-denominator API. That keeps provider-specific functionality available, although you should still expect some provider-specific adaptation when you switch.
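To support more than one provider in the same codebase, one option is to isolate the client-creation call behind a small lookup table. The sketch below is plain JavaScript with the factories injected, so it runs outside VoxEngine. `createVoiceAIClient`, the `factories` table, and the provider keys are hypothetical names; inside a scenario, each entry would wrap a real call such as `Gemini.createLiveAPIClient(params)` or `Ultravox.createWebSocketAPIClient(params)`.

```javascript
// Hypothetical helper: pick the Voice AI client factory by provider name.
// Each factory wraps one connector's creation call and returns its client
// (a Promise in VoxEngine, so callers should `await` the result).
function createVoiceAIClient(provider, factories, params) {
  // Only accept the table's own keys, not inherited ones like "toString".
  const create = Object.prototype.hasOwnProperty.call(factories, provider)
    ? factories[provider]
    : undefined;
  if (typeof create !== "function") {
    throw new Error(`Unsupported Voice AI provider: ${provider}`);
  }
  return create(params);
}

// How the table might look inside a VoxEngine scenario (not executed here):
// const factories = {
//   gemini: (p) => Gemini.createLiveAPIClient(p),
//   ultravox: (p) => Ultravox.createWebSocketAPIClient(p),
// };
// const voiceAIClient = await createVoiceAIClient("gemini", factories, params);
```

This only centralizes client creation. Because each connector keeps its own methods and events, event handlers still need per-provider code.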

See the connector guides for implementation details.

You are not limited to the built-in connectors. You can use VoxEngine’s HTTP request APIs and WebSocket APIs to connect to external LLM or speech systems that expose web-based interfaces.

The built-in integrations are optimized for realtime conversational use cases. If you build your own connector, you need to account for latency, media formats, and streaming behavior yourself.

For a basic custom pattern, see the ChatGPT integration guide, which uses Voximplant speech recognition, OpenAI’s Chat Completions API over HTTP, and Voximplant speech synthesis.
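In that pattern, the HTTP step mostly comes down to building a request body from the transcript and reading the reply text out of the response. The sketch below does only that, in plain JavaScript, assuming OpenAI’s Chat Completions JSON shape (`model`, `messages`, `choices[0].message.content`). The helper names and the default model are placeholders; actually sending the request with VoxEngine’s HTTP request API, your API key, and a JSON content type is left out.

```javascript
// Sketch assuming OpenAI's Chat Completions request/response JSON shape.
// buildChatRequest and readChatReply are hypothetical helper names.
const CHAT_COMPLETIONS_URL = "https://api.openai.com/v1/chat/completions";

// Append the caller's transcript to the history and serialize the body.
function buildChatRequest(history, transcript, model = "gpt-4o-mini") {
  const messages = [...history, { role: "user", content: transcript }];
  return {
    url: CHAT_COMPLETIONS_URL,
    body: JSON.stringify({ model, messages }),
    messages, // returned so the caller can keep the updated history
  };
}

// Extract the assistant's text from a raw JSON response body.
function readChatReply(responseText) {
  const data = JSON.parse(responseText);
  const choice = data.choices && data.choices[0];
  if (!choice || !choice.message || typeof choice.message.content !== "string") {
    throw new Error("Unexpected Chat Completions response");
  }
  return choice.message.content;
}
```

The returned text is what you would pass to Voximplant speech synthesis for playback.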

Speech system support

Voximplant supports many speech synthesis providers and hundreds of voice options. Current providers include Amazon, Cartesia, ElevenLabs, Google, IBM, Inworld, Microsoft, and OpenAI, among others.

See our TTS and realtime speech synthesis guides to get started.

The realtime LLM integrations do not need a separate speech-to-text step. They consume audio directly and return transcriptions as text events over WebSockets.

Although the realtime LLM integrations do not require it, many Voice AI customers also use Voximplant’s transcription for improved accuracy, interoperability with existing systems, debugging, and compliance. Voximplant supports transcription from many speech providers, including Amazon, Deepgram, Google, and Microsoft, across hundreds of languages.

See our Speech Recognition guide to get started.

Voximplant supports several synthesis patterns:

  1. Pass through speech generated by a realtime LLM or conversational AI system using one of the built-in connectors.
  2. Pass through speech generated by your own external integration over the WebSocket media streams API (see the WebSocket guide).
  3. Use call.say to speak directly into an established call.
  4. Use createTTSPlayer for more advanced text-to-speech playback and control.
  5. Play an audio file generated elsewhere with createURLPlayer.
  6. Stream speech from a realtime TTS integration, such as createRealtimeTTSPlayer, for low-latency audio generation. See the Realtime speech synthesis guide for details.

Most Voice AI applications either use the speech produced by the realtime AI system itself or use realtime streaming TTS for lower latency. The other mechanisms are still useful for static prompts, menus, and hybrid IVR flows.

Telephony and calling support

Voximplant supports bidirectional audio over:

  • PSTN phone numbers
  • SIP trunks
  • SIP clients registered to a PBX
  • WebRTC in browsers
  • Native iOS and Android apps
  • WhatsApp Business Calling

See the channel guides for details.

Voximplant supports outbound calling to more than 230 countries and territories and offers phone numbers from more than 100 countries.

For the current country lists and rates, download the rate lists from the pricing page.

Yes. Voximplant supports video over SIP, WebRTC, and the native iOS and Android SDKs.

Learn more here: Video telephony.

Voximplant has a mature SIP stack built for interoperability with production carriers, PBXs, and custom SIP environments.

Supported patterns include:

  • SIP trunking to and from external carriers or PBX systems with allow-listing
  • SIP registration when Voximplant acts as an endpoint or softphone against an existing SIP system
  • UDP, TCP, or TLS transport
  • Encryption with SDES or DTLS keying, plus SIP over TLS for signaling and IPSec VPN on request
  • DTMF via in-band, RFC 2833, or SIP INFO
  • Custom SIP headers (X-headers)
  • Adjustable parameters such as authUser, callerId, and outbound proxy
  • Audio codec support including G.711 (mu-law/A-law), G.722, Opus, and iLBC
  • Video codec support including H.264 and VP8
  • SIP REFER call transfer so VoxEngine can be removed from the media path

See the SIP guide and the VoxEngine reference.

Yes. Voximplant supports WhatsApp Business Calling for inbound and outbound calls from an existing WhatsApp Business account. You can run Voice AI on those calls and optionally hand off to a live person or another system.

After the initial number setup, inbound WhatsApp calls are handled like other incoming calls. Outbound WhatsApp calls are started with VoxEngine.callWhatsappUser.

See the WhatsApp guide.

Pricing

Pricing, provider support, and regional telephony availability can change. Use this page for orientation, then check the Voximplant pricing page and the relevant connector guide for the latest details.

Voximplant charges separately for:

  1. Realtime LLM connectivity through the WebSocket gateway and Voice AI connectors
  2. Telephony connectivity such as phone, SIP, WhatsApp, and WebRTC
  3. Optional text-to-speech connectivity when TTS is different from the realtime AI system
  4. Optional speech-to-text transcription outside the realtime AI system itself

You are also responsible for charges from external providers whose API keys you supply.

Voximplant offers additional communications features such as recording, storage, and conferencing that may add separate charges. See the pricing page for the current full schedule.

Realtime LLM media connectivity through Voximplant connectors or the WebSocket API is currently billed at $0.004 per minute for a bidirectional audio stream, in 15-second increments.
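As a worked example of 15-second increments, the sketch below rounds a stream’s duration up to the next 15 seconds and charges each increment a quarter of the per-minute rate. Rounding each call up independently is an assumption about how increments apply, and the function names are hypothetical; check the pricing page for the exact rules.

```javascript
// Sketch: realtime LLM stream cost, assuming each call's duration is rounded
// up to the next 15-second increment independently.
const AI_STREAM_MICRODOLLARS_PER_MINUTE = 4000; // $0.004/min, per this FAQ
const BILLING_INCREMENT_SECONDS = 15;
// 4000 * 15 / 60 = 1000 micro-dollars ($0.001) per 15-second increment
const MICRODOLLARS_PER_INCREMENT =
  (AI_STREAM_MICRODOLLARS_PER_MINUTE * BILLING_INCREMENT_SECONDS) / 60;

function billedIncrements(durationSeconds) {
  if (durationSeconds <= 0) return 0;
  return Math.ceil(durationSeconds / BILLING_INCREMENT_SECONDS);
}

function aiStreamCostUSD(durationSeconds) {
  // Work in integer micro-dollars and convert to dollars only at the end.
  return (billedIncrements(durationSeconds) * MICRODOLLARS_PER_INCREMENT) / 1e6;
}
```

For example, a 100-second call is billed as seven 15-second increments (105 seconds), or $0.007.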

There is no separate Voximplant charge for passing text to and from the LLM. You still pay the LLM provider directly for its own API usage.

Check the pricing page for the latest pricing.

Pricing varies by region and channel:

  • Phone numbers start at roughly $1 per number per month in North America and many European regions.
  • PSTN calling rates also vary by region, with entry pricing currently at $0.01/min outbound and $0.005/min inbound in North America and many European regions.
  • SIP, in-app calling, and WhatsApp bidirectional audio connectivity are currently listed at $0.004/min.
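Because each channel and the AI stream are billed separately, a rough monthly estimate adds them together. The sketch below assumes North American entry prices from the list above, a realtime LLM stream at $0.004/min on every inbound PSTN minute, and minutes that are already whole billed minutes (PSTN billing increments are not stated here). The function and rate names are hypothetical, and the rates are inputs so you can substitute figures from the pricing page.

```javascript
// Sketch: rough monthly cost of an inbound Voice AI phone line.
// Rates are entry prices quoted on this page, in integer micro-dollars
// ($1 = 1,000,000) to avoid floating-point drift; replace with current pricing.
const ENTRY_RATES = {
  numberPerMonth: 1000000, // roughly $1 per number per month
  pstnInboundPerMinute: 5000, // $0.005/min inbound PSTN
  aiStreamPerMinute: 4000, // $0.004/min realtime LLM stream
};

function monthlyInboundEstimateUSD(numbers, billedMinutes, rates = ENTRY_RATES) {
  const perMinute = rates.pstnInboundPerMinute + rates.aiStreamPerMinute;
  const micro = numbers * rates.numberPerMonth + billedMinutes * perMinute;
  return micro / 1e6; // excludes charges from your own LLM or speech providers
}
```

For example, one number handling 10,000 inbound minutes comes to about $91 per month in Voximplant charges, before whatever the LLM provider bills you directly.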

Always confirm the current figures on the pricing page.

Voximplant supports two TTS billing models:

Voximplant billing

  • Voximplant maintains the speech-provider contract.
  • Usage is billed to your Voximplant account at a fixed rate.
  • Pricing varies by provider and model.
  • Billing is typically measured in 10-character increments.

Bring your own API key / passthrough billing

  • You provide the API key for supported speech partners.
  • The speech provider bills you directly under its own plan.
  • Voximplant adds a gateway streaming charge, currently $2 per 1 million characters, billed in 10-character increments.
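Both models count characters in 10-character increments. The sketch below rounds each synthesis request up to the next 10 characters and applies a per-million-character rate, defaulting to the $2 passthrough gateway rate; for Voximplant billing, pass the provider’s rate from the pricing page instead. Applying increments per request and counting Unicode code points as characters are both assumptions, and the function names are hypothetical.

```javascript
// Sketch: TTS character billing in 10-character increments.
// Assumes increments apply per synthesis request and that a "character" is a
// Unicode code point; neither is a documented billing rule.
const CHAR_INCREMENT = 10;
const PASSTHROUGH_MICRODOLLARS_PER_MILLION = 2000000; // $2 per 1M characters

function billedCharacters(text) {
  const chars = [...text].length; // counts code points, not UTF-16 units
  return Math.ceil(chars / CHAR_INCREMENT) * CHAR_INCREMENT;
}

function ttsCharacterCostUSD(texts, microPerMillion = PASSTHROUGH_MICRODOLLARS_PER_MILLION) {
  const billed = texts.reduce((sum, t) => sum + billedCharacters(t), 0);
  // billed chars * (micro-dollars per million chars) / 1e6 = micro-dollars
  return (billed * microPerMillion) / 1e6 / 1e6;
}
```

For instance, a 45-character prompt is billed as 50 characters, and one million billed characters cost $2 in gateway fees on top of the speech provider’s own charges.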

See the pricing page and the individual speech-provider guides for current details.