Example: Half-cascade with ElevenLabs
Overview
This half-cascade example uses OpenAI Realtime for speech‑to‑text and reasoning, then sends OpenAI text responses to ElevenLabs Realtime TTS.
⬇️ Jump to the Full VoxEngine scenario.
Prerequisites
- Store your OpenAI API key in Voximplant `ApplicationStorage` under `OPENAI_API_KEY`.
- (Optional) Update the `ELEVENLABS_VOICE_ID` constant in the example to your preferred voice.
- (Optional) Store your ElevenLabs API key in Voximplant `ApplicationStorage` under `ELEVENLABS_API_KEY` if you want to use your own ElevenLabs account.
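The key lookup described in the prerequisites could look roughly like the sketch below. It is an illustration, not the code from the full scenario: `loadApiKeys` is a hypothetical helper, and it assumes the storage object exposes `get(key)` returning a promise that resolves to a record with a `value` field, or to `null` when the key is missing.

```javascript
// Hypothetical helper: reads both API keys from a key-value store.
// Assumption: storage.get(key) resolves to { value: string } or null.
async function loadApiKeys(storage) {
  const openaiRecord = await storage.get("OPENAI_API_KEY");
  if (!openaiRecord || !openaiRecord.value) {
    // The OpenAI key is a hard prerequisite, so fail early.
    throw new Error("OPENAI_API_KEY is not set in ApplicationStorage");
  }

  // The ElevenLabs key is optional; null means "use the default account".
  const elevenRecord = await storage.get("ELEVENLABS_API_KEY");
  const elevenLabsKey =
    elevenRecord && elevenRecord.value ? elevenRecord.value : null;

  return { openaiKey: openaiRecord.value, elevenLabsKey };
}
```

In a scenario, the `storage` argument would be the `ApplicationStorage` module. When `elevenLabsKey` is `null`, the ElevenLabs request would simply omit the key.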
How it works
- OpenAI runs in text mode (`output_modalities: ["text"]`).
- Caller audio is sent to OpenAI: `call.sendMediaTo(voiceAIClient)`.
- ElevenLabs generates speech from OpenAI text and streams it to the call.
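The three steps above can be sketched as two small helpers. This is a minimal illustration under stated assumptions, not the scenario's actual code. `wireHalfCascade` and `textOnlySession` are hypothetical names, and the `call`, `voiceAIClient`, and `ttsPlayer` objects are injected, so nothing here relies on a particular constructor signature. Two things are assumed rather than taken from the guide: that the TTS player exposes `sendMediaTo` like other VoxEngine media units, and that `instructions` is the session field for the prompt.

```javascript
// Hypothetical helper: connects the media paths of the half-cascade flow.
function wireHalfCascade({ call, voiceAIClient, ttsPlayer }) {
  // Caller audio goes to OpenAI for speech-to-text and reasoning.
  call.sendMediaTo(voiceAIClient);
  // Assumption: the ElevenLabs player streams its audio to the call
  // through the same sendMediaTo mechanism.
  ttsPlayer.sendMediaTo(call);
}

// Hypothetical helper: session settings that keep OpenAI in text mode,
// so OpenAI returns text and ElevenLabs is the only source of speech.
function textOnlySession(instructions) {
  return {
    output_modalities: ["text"],
    instructions, // assumed field name for the system prompt
  };
}
```

Keeping the wiring separate from client creation makes it possible to exercise the flow with plain stub objects before running it against a live call.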
Notes
- The example uses `eleven_turbo_v2_5`.
- The example instructs the agent to reply in English.
- Do not set audio format parameters (for example `ulaw_8000`) in half-cascade connector requests. VoxEngine’s WebSocket gateway handles media format negotiation automatically.
- If no ElevenLabs API key is provided, Voximplant’s default account and billing are used.
- Custom / cloned voices are only available when using your own API key.
- Use `append(text, true)` for each response chunk so playback stays responsive.
- No audio format/codec params should be passed for this half-cascade flow.
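As an illustration of the `append(text, true)` note, the sketch below forwards each OpenAI text chunk to the TTS player as soon as it arrives. `makeChunkForwarder` is a hypothetical helper, and the way text chunks are obtained from the OpenAI client (event names, payload shape) is deliberately left out because it is not covered here. The only call assumed on the player is `append(text, true)`, as recommended above. Skipping empty chunks is an extra precaution, not a documented requirement.

```javascript
// Hypothetical helper: returns a function that pushes one text chunk
// to the TTS player per call, following the append(text, true) note.
function makeChunkForwarder(ttsPlayer) {
  return function forward(text) {
    // Ignore non-string or whitespace-only chunks (precaution only).
    if (typeof text !== "string" || text.trim() === "") {
      return false;
    }
    // Second argument is true for every chunk, per the note above.
    ttsPlayer.append(text, true);
    return true;
  };
}
```

A scenario would call the returned `forward` function from whichever handler receives OpenAI text output, once per chunk rather than once per full response.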
More info
- OpenAI module API: https://voximplant.com/docs/references/voxengine/openai
- OpenAI Realtime guide: https://voximplant.com/docs/guides/ai/openai-realtime
- ElevenLabs module API: https://voximplant.com/docs/references/voxengine/elevenlabs
- Realtime TTS guide: https://voximplant.com/docs/guides/speech/realtime-tts
Full VoxEngine scenario
voxeengine-openai-half-cascade-elevenlabs.js