Overview

OpenAI Realtime API in VoxEngine

Benefits

The native OpenAI module connects Voximplant calls to the OpenAI Realtime API for low‑latency, speech‑to‑speech interactions. VoxEngine handles telephony, media conversion, and WebSocket streaming so you can focus on agent behavior.

Capability and feature highlights:

  • Bridge PSTN, SIP, WebRTC, or WhatsApp calls into OpenAI Realtime with a single VoxEngine scenario.
  • Real-time conversations with speech input, speech output, and partial transcript events.
  • Barge‑in with server-side voice activity detection (VAD) and media buffer control for natural turn‑taking.
  • Function calling for external actions (weather, transfers, CRM, etc.).
  • Flexible output modes: audio for full speech‑to‑speech, or text for half‑cascade pipelines in which a separate TTS engine voices the model's replies.

Demo video

OpenAI Realtime demo (general):

Architecture

Prerequisites

  • OpenAI API key stored in Voximplant ApplicationStorage under OPENAI_API_KEY.
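Reading the key can be kept separate from VoxEngine globals so the validation logic is testable on its own. This is a minimal sketch: loadOpenAIKey is a hypothetical helper, and it assumes the storage getter resolves to a record with a value field (or to null when the key is absent). Check the real return shape of ApplicationStorage.get in the Voximplant reference.

```javascript
/**
 * Hypothetical helper (not part of the VoxEngine API).
 * Resolves the OpenAI API key through an injected getter so the check can run
 * outside VoxEngine. Assumption: the getter resolves to { key, value } or null.
 */
async function loadOpenAIKey(getFromStorage, keyName = "OPENAI_API_KEY") {
  const record = await getFromStorage(keyName);
  const value = record ? record.value : undefined;

  // Fail fast with a clear message instead of opening a Realtime session with no key.
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`ApplicationStorage key "${keyName}" is missing or empty`);
  }
  return value;
}
```

In a scenario this would presumably be wired as require(Modules.ApplicationStorage) followed by loadOpenAIKey((key) => ApplicationStorage.get(key)), with the result passed as apiKey to OpenAI.createRealtimeAPIClient.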

Development notes

  • Native VoxEngine module: load with require(Modules.OpenAI) and create an OpenAI.RealtimeAPIClient via OpenAI.createRealtimeAPIClient({ apiKey, model }).
  • Session setup: call sessionUpdate({ session: {...} }) to configure instructions, voice, turn detection, and output modalities.
  • Barge‑in: listen for OpenAI.RealtimeAPIEvents.InputAudioBufferSpeechStarted and call client.clearMediaBuffer() to drop agent audio still queued for playback, so the caller can interrupt.
  • Function calling: define tools in the session and handle OpenAI.RealtimeAPIEvents.ResponseFunctionCallArgumentsDone; send each result via conversationItemCreate({ item: { type: "function_call_output", ... }}), then request a new response so the model can act on it.
  • Output modes: use output_modalities: ["audio"] for speech‑to‑speech, or output_modalities: ["text"] for half‑cascade pipelines.
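The session settings listed above can be collected in one place. This is a minimal sketch of the object passed to sessionUpdate: buildSessionUpdate is a hypothetical helper, and the field layout (type: "realtime", audio.input.turn_detection, audio.output.voice) assumes the GA OpenAI Realtime session schema that output_modalities belongs to. Confirm the exact shape against the OpenAI Realtime API reference.

```javascript
// Hypothetical helper: builds the argument for client.sessionUpdate(...).
// Assumption: VoxEngine forwards `session` to OpenAI's session.update unchanged.
function buildSessionUpdate({ instructions, voice = "alloy", mode = "audio", tools = [] }) {
  if (mode !== "audio" && mode !== "text") {
    throw new Error(`Unsupported output mode: ${mode}`);
  }
  const session = {
    type: "realtime",
    instructions,
    // ["audio"]: full speech-to-speech; ["text"]: half-cascade, speech synthesized elsewhere.
    output_modalities: [mode],
    audio: {
      // Server-side VAD decides when the caller has started and stopped speaking.
      input: { turn_detection: { type: "server_vad" } },
    },
    tools,
  };
  if (mode === "audio") {
    // The voice setting only matters when the model produces audio itself.
    session.audio.output = { voice };
  }
  return { session };
}
```

A half-cascade setup would then call client.sessionUpdate(buildSessionUpdate({ instructions: "...", mode: "text" })). Leaving out audio.output in text mode is a choice of this sketch, not a documented requirement.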
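Barge‑in handling comes down to a single listener. This sketch assumes the Realtime client exposes addEventListener(event, handler), as other VoxEngine objects do; enableBargeIn is a hypothetical helper, and the event name is passed in so the function does not depend on VoxEngine globals.

```javascript
// Hypothetical helper: clears queued agent audio whenever the caller starts talking.
// Assumption: `client` offers addEventListener(event, handler) and clearMediaBuffer().
function enableBargeIn(client, speechStartedEvent) {
  client.addEventListener(speechStartedEvent, () => {
    // Drop agent audio that was buffered but not yet played to the caller.
    client.clearMediaBuffer();
  });
}
```

In a scenario it would be called as enableBargeIn(client, OpenAI.RealtimeAPIEvents.InputAudioBufferSpeechStarted). Because the event name is a parameter, the helper can be exercised with any stub object that has the same two methods.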
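The function-calling flow can be sketched as a pure mapping from a completed call to the conversationItemCreate argument. It assumes the handler receives the OpenAI response.function_call_arguments.done fields name, call_id, and arguments (a JSON string); how VoxEngine wraps those fields in its event object is not shown. weatherTool, toolHandlers, get_weather, and handleFunctionCall are hypothetical names, and the weather handler returns stub data only.

```javascript
// Hypothetical tool definition for the session's `tools` array (GA Realtime format assumed).
const weatherTool = {
  type: "function",
  name: "get_weather",
  description: "Get the current weather for a city",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
};

// Hypothetical local implementations keyed by function name.
const toolHandlers = {
  get_weather: ({ city }) => ({ city, forecast: "stub: replace with a real weather lookup" }),
};

// Maps { name, call_id, arguments } to the argument for conversationItemCreate.
function handleFunctionCall(payload, handlers = toolHandlers) {
  const known = Object.prototype.hasOwnProperty.call(handlers, payload.name);
  let result;
  if (!known) {
    result = { error: `Unknown function: ${payload.name}` };
  } else {
    try {
      result = handlers[payload.name](JSON.parse(payload.arguments || "{}"));
    } catch (err) {
      // Covers both malformed JSON arguments and exceptions thrown by the handler.
      result = { error: String(err && err.message ? err.message : err) };
    }
  }
  // The output field is a string, so the result is serialized as JSON.
  return {
    item: { type: "function_call_output", call_id: payload.call_id, output: JSON.stringify(result) },
  };
}
```

Returning errors as function output, rather than throwing, lets the model explain the failure to the caller instead of leaving the call silent.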

See the OpenAI Realtime API reference for full details on events, session updates, and response creation.

Examples

Voximplant

OpenAI