Example: Half-cascade with Cartesia
Overview
This half-cascade example uses OpenAI Realtime for speech‑to‑text and reasoning, then sends OpenAI text responses to Cartesia Realtime TTS.
⬇️ Jump to the Full VoxEngine scenario.
Demo video
OpenAI + Cartesia demo:
Prerequisites
- Store your OpenAI API key in Voximplant `ApplicationStorage` under `OPENAI_API_KEY`.
- (Optional) Update the `CARTESIA_VOICE_ID` constant in the example to your preferred voice.
- (Optional) Store your Cartesia API key in Voximplant `ApplicationStorage` under `CARTESIA_API_KEY` if you want to use your own Cartesia account.
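The key lookup can be sketched as a small helper. This is a minimal sketch, not the code from the full scenario: `resolveApiKeys` is a hypothetical name, and it assumes that `ApplicationStorage.get(key)` resolves to a record with a `value` field, or to `null` when the key is absent.

```javascript
// Hypothetical helper: turns two ApplicationStorage records into the keys
// the scenario needs. Assumes each record looks like { key, value } or is null.
function resolveApiKeys(openaiRecord, cartesiaRecord) {
  // The OpenAI key is a prerequisite, so fail early if it is missing.
  if (!openaiRecord || !openaiRecord.value) {
    throw new Error("OPENAI_API_KEY is not set in ApplicationStorage");
  }
  // The Cartesia key is optional; undefined means "use Voximplant's default account".
  const cartesiaApiKey =
    cartesiaRecord && cartesiaRecord.value ? cartesiaRecord.value : undefined;
  return { openaiApiKey: openaiRecord.value, cartesiaApiKey };
}

// Assumed usage inside a VoxEngine scenario (not executed here):
//   require(Modules.ApplicationStorage);
//   const keys = resolveApiKeys(
//     await ApplicationStorage.get("OPENAI_API_KEY"),
//     await ApplicationStorage.get("CARTESIA_API_KEY")
//   );
```

Keeping the optional key as `undefined` rather than an empty string makes it easy to omit it from the Cartesia client options when it is not configured.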
How it works
- OpenAI runs in text mode (`output_modalities: ["text"]`).
- Caller audio is sent to OpenAI: `call.sendMediaTo(voiceAIClient)`.
- Cartesia generates speech from OpenAI text and streams it to the call.
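The three steps above can be sketched as one wiring function. This is an illustrative outline under stated assumptions, not the VoxEngine API: `llm` and `tts` stand in for the OpenAI and Cartesia clients, and `configure`, `onText`, and `speak` are hypothetical method names. Only `sendMediaTo` and `output_modalities: ["text"]` come from this page.

```javascript
// Hypothetical wiring of a half-cascade pipeline.
// `call`, `llm`, and `tts` are placeholders; see the full scenario for the real objects.
function wireHalfCascade(call, llm, tts) {
  // 1. Ask the model for text only, so it never synthesizes audio itself.
  llm.configure({ output_modalities: ["text"] });

  // 2. Route caller audio to the model (call.sendMediaTo(voiceAIClient) in the scenario).
  call.sendMediaTo(llm);

  // 3. Hand each text response to the TTS side (hypothetical callback registration).
  llm.onText((text) => tts.speak(text));

  // 4. Stream the synthesized speech back to the caller.
  tts.sendMediaTo(call);
}
```

The point of the split is that the model handles listening and reasoning, while the voice heard by the caller is produced entirely by the TTS provider.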
Notes
- The example uses the `sonic-2` model. Adjust the voice or output settings to match your telephony requirements.
- Do not set audio format parameters in half-cascade connector requests. VoxEngine’s WebSocket gateway handles media format negotiation automatically.
- If no Cartesia API key is provided, Voximplant’s default account and billing are used.
- Custom / cloned voices are only available when using your own API key.
- Cartesia TTS requires real text input when the player is initialized; do not pass `""` or `" "`.
- Subsequent turns use `generationRequest(...)` with the same `voice`, `model_id`, and `language`.
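The last two notes can be combined into a small request builder. This is a sketch under assumptions: `buildGenerationRequest` and `assertSpeakable` are hypothetical helpers, the voice ID and the `"en"` language are placeholders, and the `{ mode: "id", id }` voice shape and `transcript` field follow Cartesia's request format as an assumption about what `generationRequest(...)` accepts.

```javascript
// Placeholder values; replace the voice ID with your own.
const CARTESIA_VOICE_ID = "your-voice-id";

// Settings that stay identical across every turn (per the note above).
const TTS_SETTINGS = Object.freeze({
  model_id: "sonic-2",
  voice: { mode: "id", id: CARTESIA_VOICE_ID }, // assumed Cartesia voice shape
  language: "en", // assumed language
});

// Rejects "", " ", and any other whitespace-only or non-string input.
function assertSpeakable(text) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("Cartesia TTS needs non-empty text");
  }
}

// Builds the payload for one turn. No audio format fields are added,
// since media format negotiation is left to the gateway.
function buildGenerationRequest(transcript) {
  assertSpeakable(transcript);
  return { ...TTS_SETTINGS, transcript };
}
```

The same guard can be applied to the text used when the player is first created, so that an empty model response never reaches Cartesia.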
More info
- OpenAI module API: https://voximplant.com/docs/references/voxengine/openai
- OpenAI Realtime guide: https://voximplant.com/docs/guides/ai/openai-realtime
- Cartesia module API: https://voximplant.com/docs/references/voxengine/cartesia
- Realtime TTS guide: https://voximplant.com/docs/guides/speech/realtime-tts
Full VoxEngine scenario
voxeengine-openai-half-cascade-cartesia.js