Example: Speech-to-speech translation
This example answers an inbound English call, dials a Spanish-speaking callee, and uses Gemini Live API to translate the caller’s speech into Spanish audio in real time.
Gemini 3.1 Flash Live Preview
This page reflects the current gemini-3.1-flash-live-preview flow from Google’s Live API docs:
https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-live-preview
Prerequisites
- Set up an inbound entrypoint for the caller:
  - Phone number: https://voximplant.com/docs/getting-started/basic-concepts/phone-numbers
  - WhatsApp: https://voximplant.com/docs/guides/integrations/whatsapp
  - SIP user / SIP registration: https://voximplant.com/docs/guides/calls/sip
  - App user: https://voximplant.com/docs/getting-started/basic-concepts/users (see also https://voximplant.com/docs/guides/calls/scenarios#how-to-call-a-voximplant-user)
- Create a routing rule that points the destination (phone number / WhatsApp / SIP username / app user alias) to this scenario: https://voximplant.com/docs/getting-started/basic-concepts/routing-rules
- Store the following values in Voximplant ApplicationStorage:
  - GEMINI_API_KEY
  - CALLEE_DESTINATION (Spanish-speaking callee, e.g. +34911222333)
  - PSTN_CALLER_ID (verified caller ID / purchased Voximplant number)
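Because the scenario depends on all three stored values, it can help to fail fast when one is missing. The following is a minimal sketch, not part of the original scenario: validateSettings is a hypothetical helper, and it assumes the values have already been read from ApplicationStorage into a plain object keyed by name.

```javascript
// Keys named in the prerequisites above.
const REQUIRED_KEYS = ["GEMINI_API_KEY", "CALLEE_DESTINATION", "PSTN_CALLER_ID"];

// Hypothetical helper (not a VoxEngine API): checks that every required
// value is a non-empty string and returns a trimmed copy of the settings.
// `values` is assumed to be a plain object filled from ApplicationStorage.
function validateSettings(values) {
  const settings = {};
  const missing = [];
  for (const key of REQUIRED_KEYS) {
    const raw = values ? values[key] : undefined;
    const value = typeof raw === "string" ? raw.trim() : "";
    if (value) {
      settings[key] = value;
    } else {
      missing.push(key);
    }
  }
  if (missing.length > 0) {
    throw new Error("Missing ApplicationStorage values: " + missing.join(", "));
  }
  return settings;
}
```

Calling this before dialing the callee means a misconfigured application stops with a clear error message instead of failing partway through the call.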
Demo video
Video link: Gemini Live speech-to-speech translation demo
Session setup
The Gemini Live API session is configured via connectConfig, passed into Gemini.createLiveAPIClient(...).
In the full scenario, see GEMINI_CONNECT_CONFIG:
- responseModalities: ["AUDIO"] asks Gemini to speak back in real time.
- thinkingConfig: { thinkingLevel: "minimal" } reduces latency.
- realtimeInputConfig.automaticActivityDetection tunes barge-in behavior.
- speechConfig selects a prebuilt voice for the translated audio.
- systemInstruction enforces the English → Spanish translation behavior.
- inputAudioTranscription and outputAudioTranscription are enabled so you can log translated text during the session.
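The settings listed above could be assembled roughly as follows. This is a sketch rather than the scenario's actual GEMINI_CONNECT_CONFIG: buildConnectConfig and EXAMPLE_CONNECT_CONFIG are hypothetical names, and the voice name, activity-detection values, and instruction text are illustrative assumptions.

```javascript
// Hypothetical builder for a connectConfig with the fields described above.
function buildConnectConfig(voiceName) {
  return {
    // Ask Gemini to answer with audio.
    responseModalities: ["AUDIO"],
    // Minimal thinking keeps latency low.
    thinkingConfig: { thinkingLevel: "minimal" },
    realtimeInputConfig: {
      automaticActivityDetection: {
        // Illustrative tuning values, not taken from the original scenario.
        prefixPaddingMs: 20,
        silenceDurationMs: 300,
      },
    },
    // Prebuilt voice used for the translated audio.
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: voiceName } },
    },
    // Assumed wording; the scenario's real instruction may differ.
    systemInstruction: {
      parts: [
        {
          text:
            "You are a live interpreter. Translate everything the caller says " +
            "from English into Spanish. Speak only the translation.",
        },
      ],
    },
    // Empty objects enable transcription of both audio directions.
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  };
}

// "Kore" is one of Gemini's prebuilt voices, chosen here only as an example.
const EXAMPLE_CONNECT_CONFIG = buildConnectConfig("Kore");
```

An object like this would be passed as connectConfig to Gemini.createLiveAPIClient(...), alongside the API key and model name.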
Translation pipeline (one-way)
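This example uses a one-way pipeline: the caller’s English audio is streamed to Gemini, and Gemini’s Spanish audio output is played to the callee.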
The code wires the audio like this:
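A minimal sketch of this wiring, not the scenario's exact code. It assumes that the caller call, the Gemini client, and the callee call each expose a sendMediaTo(target) method, as VoxEngine calls and media units generally do; wireOneWayTranslation is a hypothetical helper name.

```javascript
// Routes audio in one direction only: caller -> Gemini -> callee.
// Assumption: all three objects expose sendMediaTo(target).
function wireOneWayTranslation(callerCall, geminiClient, calleeCall) {
  // English speech from the caller goes into the Gemini session.
  callerCall.sendMediaTo(geminiClient);
  // Translated Spanish speech from Gemini goes out to the callee.
  geminiClient.sendMediaTo(calleeCall);
}
```

Nothing is routed from the callee back to the caller here, which is what makes the pipeline one-way.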
Barge-in
Gemini includes an interrupted flag in ServerContent when the caller speaks over TTS. The example clears the media buffer so Gemini stops speaking immediately:
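The barge-in handling can be sketched as below. This is an illustration under assumptions: handleInterruption is a hypothetical helper, it receives the server content object already extracted from the event (the exact event payload path is not shown in this page), and the client is assumed to expose a clearMediaBuffer() method.

```javascript
// Clears buffered Gemini audio when the server reports an interruption.
// `serverContent` is assumed to be the ServerContent message body;
// `geminiClient.clearMediaBuffer()` is assumed to drop queued audio.
// Returns true when the buffer was cleared, false otherwise.
function handleInterruption(serverContent, geminiClient) {
  if (serverContent && serverContent.interrupted === true) {
    geminiClient.clearMediaBuffer();
    return true;
  }
  return false;
}
```

Inside a Gemini.LiveAPIEvents.ServerContent listener, the handler would be called with the content object taken from the event, so that stale translated audio is dropped as soon as the caller starts talking again.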
Events
The scenario listens for Gemini.LiveAPIEvents.ServerContent. If transcripts are enabled, the example logs both languages:
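A minimal sketch of the transcript logging, assuming the server content carries inputTranscription.text for the caller's English speech and outputTranscription.text for the Spanish translation, as in the Live API message format. formatTranscripts is a hypothetical helper, and the log labels are illustrative.

```javascript
// Builds log lines for whichever transcripts are present in one
// ServerContent message. Returns an empty array when there are none.
function formatTranscripts(serverContent) {
  const lines = [];
  const input = serverContent && serverContent.inputTranscription;
  const output = serverContent && serverContent.outputTranscription;
  if (input && input.text) {
    lines.push("[EN caller] " + input.text);
  }
  if (output && output.text) {
    lines.push("[ES translation] " + output.text);
  }
  return lines;
}
```

In a scenario, each returned line would typically be written with Logger.write(...).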
For illustration, it also logs these events:
- Gemini.LiveAPIEvents: SetupComplete, ServerContent, ConnectorInformation, Unknown
- Gemini.Events: WebSocketMediaStarted, WebSocketMediaEnded
Notes
- This example uses the Gemini Developer API (Gemini.Backend.GEMINI_API).
- The current sample uses gemini-3.1-flash-live-preview.
- Translation is one-way (English → Spanish). For bidirectional translation, run two Gemini sessions with opposite instructions.
- The example includes short prompts (call.say / calleeCall.say) to make recordings easier to follow. Remove them for production.
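For the bidirectional case mentioned in the notes, the two sessions differ only in audio direction and instruction. The sketch below describes that plan as plain data; translationInstruction and buildBidirectionalPlan are hypothetical helpers, and the instruction wording is an assumption.

```javascript
// Assumed instruction wording for one translation direction.
function translationInstruction(fromLanguage, toLanguage) {
  return (
    "You are a live interpreter. Translate everything you hear from " +
    fromLanguage + " into " + toLanguage + ". Speak only the translation."
  );
}

// Describes two sessions with opposite instructions. Each entry says whose
// audio feeds the session (source) and who hears its output (sink).
function buildBidirectionalPlan(callerLanguage, calleeLanguage) {
  return [
    {
      source: "caller",
      sink: "callee",
      instruction: translationInstruction(callerLanguage, calleeLanguage),
    },
    {
      source: "callee",
      sink: "caller",
      instruction: translationInstruction(calleeLanguage, callerLanguage),
    },
  ];
}
```

Each plan entry would map to its own Gemini Live API client, with the instruction placed in that session's systemInstruction.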
Gemini 2.5 compatibility
If you are updating an older 2.5 translation sample, replace thinkingBudget with thinkingLevel. For 3.1, this example also sends a short sendRealtimeInput(...) startup instruction on SetupComplete so the live interpretation session begins reliably.
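The two migration steps can be sketched as follows. Both helpers are hypothetical. The sketch assumes that thinkingBudget lives under thinkingConfig in the older config, that "minimal" is an acceptable replacement level, and that sendRealtimeInput accepts an object with a text field. The startup text itself is illustrative.

```javascript
// Returns a copy of the config with thinkingBudget removed and
// thinkingLevel set. The input object is not modified.
function migrateThinkingConfig(connectConfig, thinkingLevel = "minimal") {
  const { thinkingBudget, ...otherThinking } = connectConfig.thinkingConfig || {};
  return {
    ...connectConfig,
    thinkingConfig: { ...otherThinking, thinkingLevel: thinkingLevel },
  };
}

// Sends a short text instruction once setup has completed.
// Assumption: the client accepts sendRealtimeInput({ text }).
function sendStartupInstruction(geminiClient) {
  geminiClient.sendRealtimeInput({
    text: "Begin interpreting the caller from English into Spanish now.",
  });
}
```

In a scenario, sendStartupInstruction would be called from the Gemini.LiveAPIEvents.SetupComplete listener.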
See the VoxEngine API Reference for more details.