> For a complete documentation index, fetch https://docs.voximplant.ai/llms.txt

# Example: Speech-to-speech translation

> This example answers an inbound English call, dials a Spanish-speaking callee, and uses the Gemini Live API to translate the caller’s speech into Spanish audio in real time.

**⬇️ Jump to the [Full VoxEngine scenario](#full-voxengine-scenario).**

<Info title="Gemini 3.1 Flash Live Preview">
  This page reflects the current `gemini-3.1-flash-live-preview` flow from Google’s Live API docs:
  [https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-live-preview](https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-live-preview)
</Info>

## Prerequisites

* Set up an inbound entrypoint for the caller:
  * Phone number: [https://voximplant.com/docs/getting-started/basic-concepts/phone-numbers](https://voximplant.com/docs/getting-started/basic-concepts/phone-numbers)
  * WhatsApp: [https://voximplant.com/docs/guides/integrations/whatsapp](https://voximplant.com/docs/guides/integrations/whatsapp)
  * SIP user / SIP registration: [https://voximplant.com/docs/guides/calls/sip](https://voximplant.com/docs/guides/calls/sip)
  * App user: [https://voximplant.com/docs/getting-started/basic-concepts/users](https://voximplant.com/docs/getting-started/basic-concepts/users) (see also [https://voximplant.com/docs/guides/calls/scenarios#how-to-call-a-voximplant-user](https://voximplant.com/docs/guides/calls/scenarios#how-to-call-a-voximplant-user))
* Create a routing rule that points the destination (phone number / WhatsApp / SIP username / app user alias) to this scenario: [https://voximplant.com/docs/getting-started/basic-concepts/routing-rules](https://voximplant.com/docs/getting-started/basic-concepts/routing-rules)
* Store your Gemini API key in Voximplant [Secrets](/platform/voxengine/secrets) under `GEMINI_API_KEY`.
* Store the following non-sensitive values in Voximplant `ApplicationStorage`:
  * `CALLEE_DESTINATION` (Spanish-speaking callee, e.g. `+34911222333`)
  * `PSTN_CALLER_ID` (verified caller ID / purchased Voximplant number)
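
Inside the scenario, these values are read with `VoxEngine.getSecretValue` and `ApplicationStorage.get`, as in the full scenario on this page:

```js title="Read configuration"
const geminiApiKey = VoxEngine.getSecretValue('GEMINI_API_KEY');
const calleeDestination = (await ApplicationStorage.get("CALLEE_DESTINATION")).value;
const callerId = (await ApplicationStorage.get("PSTN_CALLER_ID")).value;
```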

## Demo video

Video link: [Gemini Live speech-to-speech translation demo](https://www.youtube.com/watch?v=B9zjcMIF7eM)

## Session setup

The Gemini Live API session is configured via `connectConfig`, passed into `Gemini.createLiveAPIClient(...)`.

In the full scenario, see `GEMINI_CONNECT_CONFIG`:

* `responseModalities: ["AUDIO"]` asks Gemini to speak back in real time.
* `thinkingConfig: { thinkingLevel: "minimal" }` reduces latency.
* `realtimeInputConfig.automaticActivityDetection` tunes barge-in behavior.
* `speechConfig` selects a prebuilt voice for the translated audio.
* `systemInstruction` enforces the English → Spanish translation behavior.
* `inputAudioTranscription` and `outputAudioTranscription` are enabled so you can log both the source and the translated text during the session.
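
For reference, here are the same settings condensed into a standalone object. The system-instruction text is shortened for illustration; the full scenario uses the complete interpreter prompt:

```js title="Connect config (condensed)"
const GEMINI_CONNECT_CONFIG = {
  responseModalities: ["AUDIO"],                 // speak the translation back in real time
  thinkingConfig: { thinkingLevel: "minimal" },  // minimize response latency
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: false,        // keep voice activity detection on
      prefixPaddingMs: 20,    // audio retained before detected speech, ms
      silenceDurationMs: 200, // silence that ends a speech segment, ms
    },
  },
  speechConfig: {
    voiceConfig: { prebuiltVoiceConfig: { voiceName: "Achird" } },
  },
  inputAudioTranscription: {},   // transcribe the English input
  outputAudioTranscription: {},  // transcribe the Spanish output
  systemInstruction: {
    parts: [{ text: "Translate everything you hear from English to Spanish." }],
  },
};
```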

## Translation pipeline (one-way)

This example uses a one-way pipeline:

```
English caller -> Gemini Live API -> Spanish callee
```

The code wires the audio like this:

```js title="Connect audio"
call.sendMediaTo(voiceAIClient);
voiceAIClient.sendMediaTo(calleeCall);
```

## Barge-in

Gemini includes an `interrupted` flag in `ServerContent` when the caller speaks over TTS. The example clears the media buffer so Gemini stops speaking immediately:

```js title="Barge-in handling"
if (payload.interrupted !== undefined) {
  voiceAIClient.clearMediaBuffer();
}
```

## Events

The scenario listens for `Gemini.LiveAPIEvents.ServerContent`. If transcripts are enabled, the example logs both languages:

```js title="Transcripts"
if (payload.inputTranscription?.text) Logger.write(`===EN=== ${payload.inputTranscription.text}`);
if (payload.outputTranscription?.text) Logger.write(`===ES=== ${payload.outputTranscription.text}`);
```

For illustration, it also logs these events:

* `Gemini.LiveAPIEvents`: `SetupComplete`, `ServerContent`, `ConnectorInformation`, `Unknown`
* `Gemini.Events`: `WebSocketMediaStarted`, `WebSocketMediaEnded`

## Notes

* This example uses the Gemini Developer API (`Gemini.Backend.GEMINI_API`).
* The current sample uses `gemini-3.1-flash-live-preview`.
* Translation is one-way (English → Spanish). For bidirectional translation, run two Gemini sessions with opposite instructions.
* The example includes short prompts (`call.say` / `calleeCall.say`) to make recordings easier to follow. Remove them for production.
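
The two-session setup mentioned above could be wired as follows. This is a sketch: `enToEsClient` and `esToEnClient` are hypothetical names for two `Gemini.createLiveAPIClient(...)` instances whose connect configs carry mirrored system instructions (English → Spanish and Spanish → English):

```js title="Bidirectional sketch"
// English caller -> Gemini session A -> Spanish callee
call.sendMediaTo(enToEsClient);
enToEsClient.sendMediaTo(calleeCall);

// Spanish callee -> Gemini session B -> English caller
calleeCall.sendMediaTo(esToEnClient);
esToEnClient.sendMediaTo(call);
```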

<Warning title="Gemini 2.5 compatibility">
  If you are updating an older `2.5` translation sample, replace `thinkingBudget` with `thinkingLevel`. For `3.1`, this example also sends a short `sendRealtimeInput(...)` startup instruction on `SetupComplete` so the live interpretation session begins reliably.
</Warning>

[See the VoxEngine API Reference for more details](https://voximplant.com/docs/references/voxengine/gemini).

## Full VoxEngine scenario

```javascript title={"voxengine-gemini-s2s-translate.js"} maxLines={0}
/**
 * Voximplant + Gemini Live API connector demo
 * Scenario: real-time speech-to-speech translation (English -> Spanish).
 */

require(Modules.Gemini);
require(Modules.ApplicationStorage);

const SYSTEM_INSTRUCTIONS = `
You are a REAL-TIME INTERPRETER.

Task:
- Translate everything you hear from English to Spanish.

Rules:
- Output ONLY the Spanish translation (no English, no explanations, no extra commentary).
- Preserve meaning, tone, names, numbers, and proper nouns.
- Keep latency low: translate phrase-by-phrase as soon as you have enough context.
- Do NOT greet or introduce yourself. Speak ONLY when the caller speaks.
`;

const GEMINI_MODEL = "gemini-3.1-flash-live-preview";

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
    let voiceAIClient;
    let calleeCall;
    let terminated = false;

    const terminate = () => {
        if (terminated) return;
        terminated = true;
        calleeCall?.hangup();
        call?.hangup();
        VoxEngine.terminate();
    };

    call.answer();
    call.record({hd_audio: true, stereo: true});
    call.addEventListener(CallEvents.Disconnected, terminate);
    call.addEventListener(CallEvents.Failed, terminate);

    const geminiApiKey = VoxEngine.getSecretValue('GEMINI_API_KEY');

    const calleeDestination = (await ApplicationStorage.get("CALLEE_DESTINATION")).value;
    calleeCall = VoxEngine.callPSTN(calleeDestination, (await ApplicationStorage.get("PSTN_CALLER_ID")).value);
    // Or call via an app user, SIP, or WhatsApp by uncommenting one of the lines below and commenting out the line above.
    // calleeCall = VoxEngine.callUser(calleeDestination);
    // calleeCall = VoxEngine.callSIP(`sip:${calleeDestination}@your-sip-domain`, (await ApplicationStorage.get("PSTN_CALLER_ID")).value);
    // calleeCall = VoxEngine.callWhatsappUser({number: calleeDestination, callerid: (await ApplicationStorage.get("PSTN_CALLER_ID")).value});
    calleeCall.addEventListener(CallEvents.Disconnected, terminate);
    calleeCall.addEventListener(CallEvents.Failed, terminate);

    calleeCall.addEventListener(CallEvents.Connected, async () => {
        calleeCall.record({hd_audio: true, stereo: true});

        // Optional prompts to make the demo obvious on recordings.
        call.say("Connected. Speak in English. The other party will hear Spanish.");
        calleeCall.say("Connected. You will hear Spanish translation in real time.");

        const GEMINI_CONNECT_CONFIG = {
            responseModalities: ["AUDIO"],
            thinkingConfig: {thinkingLevel: "minimal"},
            realtimeInputConfig: {
                automaticActivityDetection: {
                    disabled: false,
                    prefixPaddingMs: 20,
                    silenceDurationMs: 200,
                },
            },
            speechConfig: {
                voiceConfig: {
                    prebuiltVoiceConfig: {voiceName: "Achird"},
                },
            },
            inputAudioTranscription: {},
            outputAudioTranscription: {},
            systemInstruction: {
                parts: [{text: SYSTEM_INSTRUCTIONS}],
            },
        };

        try {
            voiceAIClient = await Gemini.createLiveAPIClient({
                apiKey: geminiApiKey,
                model: GEMINI_MODEL,
                backend: Gemini.Backend.GEMINI_API,
                connectConfig: GEMINI_CONNECT_CONFIG,
                onWebSocketClose: (event) => {
                    Logger.write("===Gemini.WebSocket.Close===");
                    if (event) Logger.write(JSON.stringify(event));
                    terminate();
                },
            });

            // Caller (English) -> Gemini -> Callee (Spanish)
            call.sendMediaTo(voiceAIClient);
            voiceAIClient.sendMediaTo(calleeCall);

            voiceAIClient.addEventListener(Gemini.LiveAPIEvents.SetupComplete, (event) => {
                Logger.write("===Gemini.LiveAPIEvents.SetupComplete===");
                if (event?.data) Logger.write(JSON.stringify(event.data));
                voiceAIClient.sendRealtimeInput({
                    text: "Start real-time English to Spanish interpretation now. Do not greet. Only translate the caller's speech for the other party.",
                });
            });

            voiceAIClient.addEventListener(Gemini.LiveAPIEvents.ServerContent, (event) => {
                const payload = event?.data?.payload || {};
                if (payload.inputTranscription?.text) {
                    Logger.write(`===EN=== ${payload.inputTranscription.text}`);
                }
                if (payload.outputTranscription?.text) {
                    Logger.write(`===ES=== ${payload.outputTranscription.text}`);
                }
                if (payload.interrupted !== undefined) {
                    Logger.write("===BARGE-IN=== Gemini.LiveAPIEvents.ServerContent");
                    voiceAIClient.clearMediaBuffer();
                }
            });

            [
                Gemini.LiveAPIEvents.SetupComplete,
                Gemini.LiveAPIEvents.ServerContent,
                Gemini.LiveAPIEvents.ConnectorInformation,
                Gemini.LiveAPIEvents.Unknown,
                Gemini.Events.WebSocketMediaStarted,
                Gemini.Events.WebSocketMediaEnded,
            ].forEach((eventName) => {
                voiceAIClient.addEventListener(eventName, (event) => {
                    Logger.write(`===${event.name}===`);
                    if (event?.data) Logger.write(JSON.stringify(event.data));
                });
            });
        } catch (error) {
            Logger.write("===SOMETHING_WENT_WRONG===");
            Logger.write(String(error));
            terminate();
        }
    });
});

```