Overview
OpenAI and OpenAI-compatible APIs in VoxEngine
For the complete documentation index, see llms.txt.
Benefits
The native OpenAI module gives VoxEngine direct access to OpenAI’s Realtime, Responses, and Chat Completions APIs. It also works with OpenAI-compatible APIs for text-based pipelines, so you can keep the same VoxEngine integration pattern while swapping the LLM backend.
Capability and feature highlights:
- Bridge PSTN, SIP, WebRTC, or WhatsApp calls into OpenAI with a single VoxEngine scenario.
- Use the API surface that fits your architecture: Realtime, Responses API, or Chat Completions.
- Run direct realtime speech-to-speech or build half-cascade and full-cascade pipelines.
- Use OpenAI-compatible APIs for text-first pipelines through the same OpenAI module surface.
- Add barge‑in, VAD, and turn detection for natural turn-taking in cascade pipelines.
Demo video
Video: OpenAI Realtime API demo (general).
Architecture
Prerequisites
- An OpenAI account with API access at platform.openai.com.
- An OpenAI API key, created on the OpenAI API keys page.
- Access to the model family you want to use in these guides (for example gpt-realtime-1.5, gpt-4o-mini, or another compatible text model).
Supported API surfaces
The VoxEngine OpenAI module currently supports three API shapes:
- Realtime API for direct speech-to-speech sessions with native input audio, output audio, server-side VAD, and low-latency streaming.
- Responses API client for text-first and cascade pipelines, including OpenAI-compatible backends exposed through a custom baseUrl.
- Chat Completions API client for simpler request/response or streaming text workflows when you do not need the newer Responses API surface.
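To make the difference between the two text surfaces concrete, here is a minimal sketch of the request bodies they expect. The payload shapes follow OpenAI's public API formats; the builder functions themselves are illustrative helpers, not part of the VoxEngine module:

```javascript
// Illustrative payload builders for the two text API shapes.
// These mirror OpenAI's public request formats; they are not VoxEngine APIs.
function chatCompletionsPayload(model, userText) {
  // Chat Completions: a messages array of role/content pairs.
  return {
    model,
    messages: [
      { role: "system", content: "You are a voice assistant." },
      { role: "user", content: userText },
    ],
  };
}

function responsesPayload(model, userText) {
  // Responses API: a single input field; instructions replace the system message.
  return {
    model,
    instructions: "You are a voice assistant.",
    input: userText,
  };
}

const chat = chatCompletionsPayload("gpt-4o-mini", "Hello");
const resp = responsesPayload("gpt-4o-mini", "Hello");
```

The Responses shape is what the ResponsesAPIClient speaks; the messages-array shape is what the ChatCompletionsClient speaks.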
Pipeline options
Direct realtime
Use the Realtime API when you want the model to handle speech input and speech output directly. This is the lowest-friction path for native OpenAI voice sessions.
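A direct realtime scenario might be structured like the following sketch. The factory and sessionUpdate calls are the ones named in the module notes below; the specific session option names (instructions, voice) are assumptions to illustrate the shape, so check the module reference before relying on them:

```javascript
// Session options builder for a direct realtime session.
// The exact option names here (instructions, voice) are assumptions.
function realtimeSessionConfig(instructions) {
  return {
    instructions,   // system prompt for the model (assumed option name)
    voice: "alloy", // an OpenAI realtime voice (assumed option name)
  };
}

// Inside a VoxEngine scenario (runs only on the Voximplant platform):
// require(Modules.OpenAI);
// VoxEngine.addEventListener(AppEvents.CallAlerting, async (e) => {
//   e.call.answer();
//   const client = await OpenAI.createRealtimeAPIClient({
//     apiKey: "YOUR_OPENAI_API_KEY",
//     model: "gpt-realtime-1.5",
//   });
//   client.sessionUpdate(realtimeSessionConfig("Greet the caller."));
//   // Bridge the call's audio to the realtime session here.
// });
```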
Full cascade
Use a full-cascade pipeline when you want to choose separate STT, LLM, and TTS providers.
This is where VAD, turn detection, and helper logic such as VoxTurnTaking matter most.
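The data flow of a full-cascade turn, including the turn-detection gate, can be sketched independently of any provider SDK. Every stage below is an injected stand-in; none of these function names are VoxEngine or provider APIs:

```javascript
// Full-cascade turn sketch: STT -> LLM -> TTS, gated by turn detection.
// All stages are injected stand-ins; none of these names are VoxEngine APIs.
function fullCascadeTurn(audioChunks, { transcribe, isTurnComplete, generateReply, speak }) {
  const text = transcribe(audioChunks);   // STT provider of your choice
  if (!isTurnComplete(text)) return null; // turn detection: wait for the user to finish
  const reply = generateReply(text);      // LLM step (OpenAI or a baseUrl-compatible backend)
  speak(reply);                           // TTS provider of your choice
  return reply;
}

// Mock wiring: treat a trailing "?" or "." as an end-of-turn signal.
const spoken = [];
const stages = {
  transcribe: (chunks) => chunks.join(" "),
  isTurnComplete: (text) => /[.?]$/.test(text),
  generateReply: (text) => "Echo: " + text,
  speak: (text) => spoken.push(text),
};
const pending = fullCascadeTurn(["How", "are"], stages);      // mid-turn, no reply yet
const done = fullCascadeTurn(["How", "are", "you?"], stages); // complete turn
```

In a real scenario the stages are asynchronous streams, and the isTurnComplete role is played by VAD, Turn Detection, or the VoxTurnTaking helper rather than punctuation.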
Half cascade
Use a half-cascade pipeline when OpenAI still does the reasoning but another provider handles speech output. This is useful when you want a different voice, broader language coverage, or a different TTS pricing model.
Development notes
- Realtime API: create OpenAI.RealtimeAPIClient with OpenAI.createRealtimeAPIClient({ apiKey, model }) and configure the session with sessionUpdate().
- Responses API client: create OpenAI.ResponsesAPIClient with OpenAI.createResponsesAPIClient({ apiKey, baseUrl?, storeContext }) for full-cascade or text-first agent flows.
- Chat Completions API client: create OpenAI.ChatCompletionsClient with OpenAI.createChatCompletionsClient({ apiKey, baseUrl?, storeContext }) for simpler text workflows.
- OpenAI-compatible APIs: Responses and Chat Completions clients can target compatible endpoints via baseUrl, but compatibility is vendor-specific, and some providers do not support the full stored-context feature set.
- Barge-in and turn control: the Realtime API has built-in speech events. Full-cascade flows usually combine STT with Voice Activity Detection, Turn Detection, and the Turn Taking Helper Library.
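For instance, switching a text client between the OpenAI-hosted endpoint and an OpenAI-compatible backend only changes the options object passed to the factory. The option names (apiKey, baseUrl, storeContext) come from the notes above; the example URL and the shared options helper are illustrative:

```javascript
// Options for the OpenAI-hosted endpoint vs. an OpenAI-compatible one.
// Only baseUrl differs; storeContext is listed in the module notes above,
// and its exact semantics are documented in the OpenAI module API reference.
function clientOptions(apiKey, baseUrl) {
  const options = { apiKey, storeContext: true };
  if (baseUrl) options.baseUrl = baseUrl; // omit baseUrl to target api.openai.com
  return options;
}

const native = clientOptions("YOUR_OPENAI_API_KEY");
const compatible = clientOptions("OTHER_VENDOR_KEY", "https://example-llm.test/v1"); // illustrative URL

// In a scenario:
// OpenAI.createResponsesAPIClient(native);
// OpenAI.createChatCompletionsClient(compatible);
```

Before shipping, verify the compatible vendor supports the features you rely on; as noted above, stored context in particular is not universally implemented.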
See the OpenAI API references in the Links section below for full details.
Examples
- Example: Answering an incoming call
- Example: Placing an outbound call
- Example: Function calling
- Example: Full-cascade incl. Groq
- Example: Half-cascade with ElevenLabs
- Example: Half-cascade with Inworld
- Example: Half-cascade with Cartesia
Links
Voximplant
- OpenAI Voice AI connector: https://voximplant.com/docs/voice-ai/openai
- OpenAI module API reference: https://voximplant.com/docs/references/voxengine/openai
- OpenAI product page: https://voximplant.com/products/openai-client
- Voice AI product overview: https://voximplant.ai/
OpenAI
- Realtime API reference: https://platform.openai.com/docs/api-reference/realtime
- Responses API reference: https://platform.openai.com/docs/api-reference/responses
- Chat Completions API reference: https://platform.openai.com/docs/api-reference/chat
- Realtime events (client/server): https://platform.openai.com/docs/api-reference/realtime-client-events
- Realtime guide: https://platform.openai.com/docs/guides/realtime