> For a complete documentation index, fetch https://docs.voximplant.ai/llms.txt

# Example: Half-cascade with Cartesia

> This half-cascade example uses OpenAI Realtime for speech‑to‑text and reasoning, then sends OpenAI text responses to Cartesia Realtime TTS.


## Overview

In this half-cascade architecture, OpenAI Realtime runs in text mode and handles speech‑to‑text and reasoning, while Cartesia Realtime TTS converts OpenAI's text responses to speech and streams the audio back to the caller.

**⬇️ Jump to the [Full VoxEngine scenario](#full-voxengine-scenario).**

## Demo video

Watch the [OpenAI + Cartesia demo](https://www.youtube.com/watch?v=Rg97CLWatqo) on YouTube.

## Prerequisites

* Store your OpenAI API key in Voximplant [Secrets](/platform/voxengine/secrets) under `OPENAI_API_KEY`.
* (Optional) Update the `CARTESIA_VOICE_ID` constant in the example to your preferred voice.
* (Optional) Store your Cartesia API key in Voximplant [Secrets](/platform/voxengine/secrets) under `CARTESIA_API_KEY` if you want to use your own Cartesia account.

## How it works

* OpenAI runs in text mode (`output_modalities: ["text"]`).
* Caller audio is sent to OpenAI: `call.sendMediaTo(voiceAIClient)`.
* Cartesia generates speech from OpenAI text and streams it to the call.
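The text-only session configuration behind the first bullet can be sketched as a plain object. This mirrors the `SESSION_CONFIG` used in the full scenario below; the instructions string is illustrative:

```javascript
// Half-cascade session config: OpenAI produces text only, Cartesia voices it.
const SESSION_CONFIG = {
    session: {
        type: "realtime",
        instructions: "You are Voxi, a helpful phone assistant.",
        output_modalities: ["text"], // no audio output from OpenAI
        turn_detection: {type: "server_vad", interrupt_response: true},
    },
};
```

Because `output_modalities` excludes `"audio"`, OpenAI never synthesizes speech itself; every response arrives as text for Cartesia to render.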

## Notes

* The example uses the `sonic-2` model. Adjust the voice or output settings to match your telephony requirements.
* Do not set audio format parameters in half-cascade connector requests. VoxEngine's WebSocket gateway handles media format negotiation automatically.
* If no Cartesia API key is provided, Voximplant's default account and billing are used.
* Custom / cloned voices are only available when using your own API key.
* Cartesia TTS requires non-empty text when the player is initialized; do not pass `""` or `" "`.
* Subsequent turns use `generationRequest(...)` with the same `voice`, `model_id`, and `language`.
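The per-turn request described in the last note can be sketched as a small helper. This is a sketch: `buildGenerationRequest` is a hypothetical name, and the model and voice IDs are the ones from the example:

```javascript
const CARTESIA_MODEL_ID = "sonic-2";
const CARTESIA_VOICE_ID = "a0e99841-438c-4a64-b679-ae501e7d6091";

// Build the generation request sent on every turn.
// Voice, model, and language stay constant; only the transcript changes.
function buildGenerationRequest(text) {
    return {
        model_id: CARTESIA_MODEL_ID,
        transcript: text,
        language: "en",
        voice: {mode: "id", id: CARTESIA_VOICE_ID},
        context_id: `openai-cartesia-${Date.now()}`,
        continue: false,
    };
}
```

The first turn passes these parameters when creating the player; later turns pass the same shape to `generationRequest(...)`.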

## More info

* OpenAI module API: [https://voximplant.com/docs/references/voxengine/openai](https://voximplant.com/docs/references/voxengine/openai)
* OpenAI Realtime guide: [https://voximplant.com/docs/guides/ai/openai-realtime](https://voximplant.com/docs/guides/ai/openai-realtime)
* Cartesia module API: [https://voximplant.com/docs/references/voxengine/cartesia](https://voximplant.com/docs/references/voxengine/cartesia)
* Realtime TTS guide: [https://voximplant.com/docs/guides/speech/realtime-tts](https://voximplant.com/docs/guides/speech/realtime-tts)

## Full VoxEngine scenario

```javascript title={"voxengine-openai-half-cascade-cartesia.js"} maxLines={0}
/**
 * Voximplant + OpenAI Realtime API + Cartesia TTS demo
 * Scenario: OpenAI handles STT/LLM, Cartesia handles TTS (half-cascade).
 */

require(Modules.OpenAI);
require(Modules.Cartesia);
const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant.
Keep responses short and telephony-friendly.
Reply in English.
`;

const CARTESIA_MODEL_ID = "sonic-2";
const CARTESIA_VOICE_ID = "a0e99841-438c-4a64-b679-ae501e7d6091";

const SESSION_CONFIG = {
    session: {
        type: "realtime",
        instructions: SYSTEM_PROMPT,
        output_modalities: ["text"],
        turn_detection: {type: "server_vad", interrupt_response: true},
    },
};

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
    let voiceAIClient;
    let ttsPlayer;

    call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());
    call.addEventListener(CallEvents.Failed, () => VoxEngine.terminate());

    try {
        call.answer();
        // call.record({hd_audio: true, stereo: true}); // Optional: record the call

        const openAiKey = VoxEngine.getSecretValue('OPENAI_API_KEY');

        voiceAIClient = await OpenAI.createRealtimeAPIClient({
            apiKey: openAiKey,
            model: "gpt-realtime-1.5",
            onWebSocketClose: (event) => {
                Logger.write("===OpenAI.WebSocket.Close===");
                if (event) Logger.write(JSON.stringify(event));
                VoxEngine.terminate();
            },
        });

        voiceAIClient.addEventListener(OpenAI.RealtimeAPIEvents.SessionCreated, () => {
            voiceAIClient.sessionUpdate(SESSION_CONFIG);
        });

        voiceAIClient.addEventListener(OpenAI.RealtimeAPIEvents.SessionUpdated, () => {
            call.sendMediaTo(voiceAIClient);
            voiceAIClient.responseCreate({instructions: "Hello! How can I help today?"});
        });

        voiceAIClient.addEventListener(OpenAI.RealtimeAPIEvents.ResponseOutputTextDone, (event) => {
            const payload = event?.data?.payload || event?.data || {};
            const text = payload.text || payload.delta;
            if (!text) return;
            Logger.write(`===AGENT_TEXT=== ${text}`);

            // Build the per-turn generation request; voice, model, and language
            // stay the same across turns, only the transcript changes
            const generationRequestParameters = {
                model_id: CARTESIA_MODEL_ID,
                transcript: text,
                language: "en",
                voice: {mode: "id", id: CARTESIA_VOICE_ID},
                context_id: `openai-cartesia-${Date.now()}`,
                continue: false,
            };

            // Cartesia TTS requires text input on initialization, so we lazily
            // create the player on the first turn and reuse it for later turns
            if (!ttsPlayer) {
                ttsPlayer = Cartesia.createRealtimeTTSPlayer(text, {
                    // apikey: VoxEngine.getSecretValue('CARTESIA_API_KEY'), // optional
                    generationRequestParameters,
                });
                ttsPlayer.sendMediaTo(call);
                return;
            }

            ttsPlayer.generationRequest(generationRequestParameters);
        });

        // Barge-in: clear both OpenAI and Cartesia buffers
        voiceAIClient.addEventListener(OpenAI.RealtimeAPIEvents.InputAudioBufferSpeechStarted, () => {
            Logger.write("===BARGE-IN: OpenAI.InputAudioBufferSpeechStarted===");
            voiceAIClient.clearMediaBuffer();
            ttsPlayer?.clearBuffer();
        });

        // ---------------------- Log all other events for debugging -----------------------
        [
            OpenAI.RealtimeAPIEvents.ResponseCreated,
            OpenAI.RealtimeAPIEvents.ResponseDone,
            OpenAI.RealtimeAPIEvents.ResponseOutputTextDelta,
            OpenAI.RealtimeAPIEvents.ConnectorInformation,
            OpenAI.RealtimeAPIEvents.HTTPResponse,
            OpenAI.RealtimeAPIEvents.WebSocketError,
            OpenAI.RealtimeAPIEvents.Unknown,
            OpenAI.Events.WebSocketMediaStarted,
            OpenAI.Events.WebSocketMediaEnded,
        ].forEach((eventName) => {
            voiceAIClient.addEventListener(eventName, (event) => {
                Logger.write(`===${event.name}===`);
                if (event?.data) Logger.write(JSON.stringify(event.data));
            });
        });
    } catch (error) {
        Logger.write("===UNHANDLED_ERROR===");
        Logger.write(String(error));
        voiceAIClient?.close();
        VoxEngine.terminate();
    }
});

```