Overview
OpenAI and OpenAI-compatible APIs in VoxEngine
For the complete documentation index, see llms.txt.
Benefits
The native OpenAI module gives VoxEngine direct access to OpenAI’s Realtime, Responses, and Chat Completions APIs. It also works with OpenAI-compatible APIs for text-based pipelines, so you can keep the same VoxEngine integration pattern while swapping the LLM backend.
Capability and feature highlights:
- Bridge PSTN, SIP, WebRTC, or WhatsApp calls into OpenAI with a single VoxEngine scenario.
- Use the API surface that fits your architecture: Realtime, Responses API, or Chat Completions.
- Run direct realtime speech-to-speech or build half-cascade and full-cascade pipelines.
- Use OpenAI-compatible APIs for text-first pipelines through the same OpenAI module surface.
- Add barge‑in, VAD, and turn detection for natural turn-taking in cascade pipelines.
Demo video
Video: OpenAI Realtime API demo (general).
Architecture
Prerequisites
- An OpenAI account with API access at platform.openai.com.
- An OpenAI API key, created on the OpenAI API keys page.
- Access to the model family you want to use in these guides (for example gpt-realtime-1.5, gpt-4o-mini, or another compatible text model).
Supported API surfaces
The VoxEngine OpenAI module currently supports three API shapes:
- Realtime API for direct speech-to-speech sessions with native input audio, output audio, server-side VAD, and low-latency streaming.
- Responses API client for text-first and cascade pipelines, including OpenAI-compatible backends exposed through a custom baseUrl.
- Chat Completions API client for simpler request/response or streaming text workflows when you do not need the newer Responses API surface.
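To make the difference between the two text surfaces concrete, here is a minimal sketch of the request bodies they expect. The payload shapes follow OpenAI's public API formats; the builder functions themselves are illustrative helpers, not part of the VoxEngine module:

```javascript
// Illustrative payload builders for the two text API shapes.
// These mirror OpenAI's public request formats; they are not VoxEngine APIs.
function chatCompletionsPayload(model, userText) {
  // Chat Completions: a messages array of role/content pairs.
  return {
    model,
    messages: [
      { role: "system", content: "You are a voice assistant." },
      { role: "user", content: userText },
    ],
  };
}

function responsesPayload(model, userText) {
  // Responses API: a single input field; instructions replace the system message.
  return {
    model,
    instructions: "You are a voice assistant.",
    input: userText,
  };
}

const chat = chatCompletionsPayload("gpt-4o-mini", "Hello");
const resp = responsesPayload("gpt-4o-mini", "Hello");
```

The Responses shape is what the ResponsesAPIClient speaks; the messages-array shape is what the ChatCompletionsClient speaks.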
Pipeline options
Direct realtime
Use the Realtime API when you want the model to handle speech input and speech output directly. This is the lowest-friction path for native OpenAI voice sessions.
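A direct realtime scenario might be structured like the following sketch. The factory and sessionUpdate calls are the ones named in the module notes below; the specific session option names (instructions, voice) are assumptions to illustrate the shape, so check the module reference before relying on them:

```javascript
// Session options builder for a direct realtime session.
// The exact option names here (instructions, voice) are assumptions.
function realtimeSessionConfig(instructions) {
  return {
    instructions,   // system prompt for the model (assumed option name)
    voice: "alloy", // an OpenAI realtime voice (assumed option name)
  };
}

// Inside a VoxEngine scenario (runs only on the Voximplant platform):
// require(Modules.OpenAI);
// VoxEngine.addEventListener(AppEvents.CallAlerting, async (e) => {
//   e.call.answer();
//   const client = await OpenAI.createRealtimeAPIClient({
//     apiKey: "YOUR_OPENAI_API_KEY",
//     model: "gpt-realtime-1.5",
//   });
//   client.sessionUpdate(realtimeSessionConfig("Greet the caller."));
//   // Bridge the call's audio to the realtime session here.
// });
```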
Full cascade
Use a full-cascade pipeline when you want to choose separate STT, LLM, and TTS providers.
This is where VAD, turn detection, and helper logic such as VoxTurnTaking matter most.
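The data flow of a full-cascade turn, including the turn-detection gate, can be sketched independently of any provider SDK. Every stage below is an injected stand-in; none of these function names are VoxEngine or provider APIs:

```javascript
// Full-cascade turn sketch: STT -> LLM -> TTS, gated by turn detection.
// All stages are injected stand-ins; none of these names are VoxEngine APIs.
function fullCascadeTurn(audioChunks, { transcribe, isTurnComplete, generateReply, speak }) {
  const text = transcribe(audioChunks);   // STT provider of your choice
  if (!isTurnComplete(text)) return null; // turn detection: wait for the user to finish
  const reply = generateReply(text);      // LLM step (OpenAI or a baseUrl-compatible backend)
  speak(reply);                           // TTS provider of your choice
  return reply;
}

// Mock wiring: treat a trailing "?" or "." as an end-of-turn signal.
const spoken = [];
const stages = {
  transcribe: (chunks) => chunks.join(" "),
  isTurnComplete: (text) => /[.?]$/.test(text),
  generateReply: (text) => "Echo: " + text,
  speak: (text) => spoken.push(text),
};
const pending = fullCascadeTurn(["How", "are"], stages);      // mid-turn, no reply yet
const done = fullCascadeTurn(["How", "are", "you?"], stages); // complete turn
```

In a real scenario the stages are asynchronous streams, and the isTurnComplete role is played by VAD, Turn Detection, or the VoxTurnTaking helper rather than punctuation.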
Half cascade
Use a half-cascade pipeline when OpenAI still does the reasoning but another provider handles speech output. This is useful when you want a different voice, broader language coverage, or a different TTS pricing model.
Development notes
- Realtime API: create OpenAI.RealtimeAPIClient with OpenAI.createRealtimeAPIClient({ apiKey, model }) and configure the session with sessionUpdate().
- Responses API client: create OpenAI.ResponsesAPIClient with OpenAI.createResponsesAPIClient({ apiKey, baseUrl?, storeContext }) for full-cascade or text-first agent flows.
- Chat Completions API client: create OpenAI.ChatCompletionsClient with OpenAI.createChatCompletionsClient({ apiKey, baseUrl?, storeContext }) for simpler text workflows.
- OpenAI-compatible APIs: Responses and Chat Completions clients can target compatible endpoints via baseUrl, but compatibility is vendor-specific, and some providers do not support the full stored-context feature set.
- Barge-in and turn control: the Realtime API has built-in speech events. Full-cascade flows usually combine STT with Voice Activity Detection, Turn Detection, and the Turn Taking Helper Library.
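For instance, switching a text client between the OpenAI-hosted endpoint and an OpenAI-compatible backend only changes the options object passed to the factory. The option names (apiKey, baseUrl, storeContext) come from the notes above; the example URL and the shared options helper are illustrative:

```javascript
// Options for the OpenAI-hosted endpoint vs. an OpenAI-compatible one.
// Only baseUrl differs; storeContext is listed in the module notes above,
// and its exact semantics are documented in the OpenAI module API reference.
function clientOptions(apiKey, baseUrl) {
  const options = { apiKey, storeContext: true };
  if (baseUrl) options.baseUrl = baseUrl; // omit baseUrl to target api.openai.com
  return options;
}

const native = clientOptions("YOUR_OPENAI_API_KEY");
const compatible = clientOptions("OTHER_VENDOR_KEY", "https://example-llm.test/v1"); // illustrative URL

// In a scenario:
// OpenAI.createResponsesAPIClient(native);
// OpenAI.createChatCompletionsClient(compatible);
```

Before shipping, verify the compatible vendor supports the features you rely on; as noted above, stored context in particular is not universally implemented.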
See the OpenAI API references in the Links section below for full details.
Examples
- Example: Answering an incoming call
- Example: Placing an outbound call
- Example: Function calling
- Example: Full-cascade incl. Groq
- Example: Half-cascade with ElevenLabs
- Example: Half-cascade with Inworld
- Example: Half-cascade with Cartesia
Links
Voximplant
- OpenAI Voice AI connector: https://voximplant.com/docs/voice-ai/openai
- OpenAI module API reference: https://voximplant.com/docs/references/voxengine/openai
- OpenAI product page: https://voximplant.com/products/openai-client
- Voice AI product overview: https://voximplant.ai/
OpenAI
- Realtime API reference: https://platform.openai.com/docs/api-reference/realtime
- Responses API reference: https://platform.openai.com/docs/api-reference/responses
- Chat Completions API reference: https://platform.openai.com/docs/api-reference/chat
- Realtime events (client/server): https://platform.openai.com/docs/api-reference/realtime-client-events
- Realtime guide: https://platform.openai.com/docs/guides/realtime