Example: Speech-to-speech translation
This example answers an inbound English call, dials a Spanish-speaking callee, and uses the Gemini Live API to translate the caller’s speech into Spanish audio in real time.
⬇️ Jump to the Full VoxEngine scenario.
Prerequisites
- Set up an inbound entrypoint for the caller:
  - Phone number: https://voximplant.com/docs/getting-started/basic-concepts/phone-numbers
  - WhatsApp: https://voximplant.com/docs/guides/integrations/whatsapp
  - SIP user / SIP registration: https://voximplant.com/docs/guides/calls/sip
  - Voximplant user: https://voximplant.com/docs/getting-started/basic-concepts/users (see also https://voximplant.com/docs/guides/calls/scenarios#how-to-call-a-voximplant-user)
- Create a routing rule that points the destination (number / WhatsApp / SIP username) to this scenario: https://voximplant.com/docs/getting-started/basic-concepts/routing-rules
- Store the following values in Voximplant ApplicationStorage:
  - GEMINI_API_KEY
  - CALLEE_NUMBER (Spanish-speaking callee, e.g. +34911222333)
  - PSTN_CALLER_ID (verified caller ID / purchased Voximplant number)
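Reading the stored values at the start of the scenario might look like the sketch below. The helper name loadSettings is hypothetical, and the code assumes the store's get(key) returns a Promise that resolves to an object with a value field (or null when the key is missing), which is how ApplicationStorage is commonly used.

```javascript
// Hypothetical helper: load the three values this example expects.
// Assumption: storage.get(key) resolves to { value: string } or null.
// In a VoxEngine scenario, pass ApplicationStorage as `storage`.
async function loadSettings(storage) {
  const keys = ["GEMINI_API_KEY", "CALLEE_NUMBER", "PSTN_CALLER_ID"];
  const settings = {};
  for (const key of keys) {
    const entry = await storage.get(key);
    if (!entry || !entry.value) {
      // Fail early so a missing value does not surface mid-call.
      throw new Error(`Missing ApplicationStorage value: ${key}`);
    }
    settings[key] = entry.value;
  }
  return settings;
}
```

Failing fast on a missing key keeps configuration errors out of the call flow, where they are harder to diagnose.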
Session setup
The Gemini Live API session is configured via connectConfig, passed into Gemini.createLiveAPIClient(...).
In the full scenario, see GEMINI_CONNECT_CONFIG:
- responseModalities: ["AUDIO"] asks Gemini to speak back in real time.
- thinkingConfig: { thinkingBudget: 0 } disables long thinking to reduce latency.
- realtimeInputConfig.automaticActivityDetection tunes barge-in behavior.
- speechConfig selects a prebuilt voice for the translated audio.
- systemInstruction enforces the English → Spanish translation behavior.
To log text transcripts, uncomment inputAudioTranscription and outputAudioTranscription.
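As a rough illustration, a config with these fields could be shaped like the sketch below. The top-level field names come from the list above; the nested layout follows the Gemini Live API's config format, and every concrete value (voice name, timing values, prompt wording) is an assumption rather than the scenario's actual setting.

```javascript
// Illustrative connectConfig. All concrete values are assumptions.
const GEMINI_CONNECT_CONFIG = {
  responseModalities: ["AUDIO"],          // reply with audio in real time
  thinkingConfig: { thinkingBudget: 0 },  // no long thinking, lower latency
  realtimeInputConfig: {
    automaticActivityDetection: {
      prefixPaddingMs: 20,     // assumed tuning value
      silenceDurationMs: 300,  // assumed tuning value
    },
  },
  speechConfig: {
    // Assumed prebuilt voice; choose any voice the API offers.
    voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } },
  },
  systemInstruction: {
    parts: [
      {
        text:
          "You are an interpreter. Translate everything you hear from " +
          "English into Spanish. Speak only the translation.",
      },
    ],
  },
  // Uncomment to receive text transcripts alongside the audio:
  // inputAudioTranscription: {},
  // outputAudioTranscription: {},
};
```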
Translation pipeline (one-way)
This example uses a one-way pipeline: caller audio (English) → Gemini Live API → translated audio (Spanish) → callee.
The code wires the audio like this:
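A minimal sketch of one-way wiring, assuming both calls and the Gemini client expose VoxEngine's sendMediaTo method. The helper name wireOneWay is hypothetical and is not taken from the full scenario.

```javascript
// Hypothetical helper: route audio one way, caller -> Gemini -> callee.
// Assumption: all three objects are VoxEngine media units with sendMediaTo().
function wireOneWay(callerCall, geminiClient, calleeCall) {
  callerCall.sendMediaTo(geminiClient); // English speech goes to Gemini
  geminiClient.sendMediaTo(calleeCall); // Spanish audio goes to the callee
}
```

Because nothing is sent from the callee back through Gemini, the callee's speech is not translated in this setup.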
Barge-in
Gemini includes an interrupted flag in ServerContent when the caller speaks over TTS. The example clears the media buffer so Gemini stops speaking immediately:
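A sketch of such a handler is shown below. It rests on two assumptions that should be checked against the VoxEngine API Reference: that the flag arrives at event.data.payload.interrupted, and that the client exposes a clearMediaBuffer() method. The factory name makeBargeInHandler is hypothetical.

```javascript
// Hypothetical factory for a ServerContent listener that handles barge-in.
// Assumptions: the flag lives at event.data.payload.interrupted and the
// client has a clearMediaBuffer() method.
function makeBargeInHandler(geminiClient) {
  return function onServerContent(event) {
    const payload = (event && event.data && event.data.payload) || {};
    if (payload.interrupted) {
      geminiClient.clearMediaBuffer(); // drop queued audio so playback stops
      return true;
    }
    return false;
  };
}
```

In a scenario, the returned function would be registered as a listener for Gemini.LiveAPIEvents.ServerContent.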
Events
The scenario listens for Gemini.LiveAPIEvents.ServerContent. If transcripts are enabled, the example logs both languages:
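One way to pull both transcripts out of a ServerContent payload is sketched below. The field names inputTranscription and outputTranscription follow the Gemini Live API's server content format; whether the VoxEngine event exposes them at the same path is an assumption, and extractTranscripts is a hypothetical helper.

```javascript
// Hypothetical helper: extract caller (English) and Gemini (Spanish) text.
// Assumption: the payload mirrors Gemini's serverContent shape.
function extractTranscripts(payload) {
  const input = payload && payload.inputTranscription;
  const output = payload && payload.outputTranscription;
  return {
    english: input && input.text ? input.text : null,  // what the caller said
    spanish: output && output.text ? output.text : null, // what Gemini spoke
  };
}
```

Transcripts usually arrive in fragments, so either field may be null for a given event.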
For illustration, it also logs these events:
- Gemini.LiveAPIEvents: SetupComplete, ServerContent, ConnectorInformation, Unknown
- Gemini.Events: WebSocketMediaStarted, WebSocketMediaEnded
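Attaching a logger to several event types at once can be sketched as follows. The helper attachDebugLogging is hypothetical; it only assumes the client supports addEventListener(type, handler), and the log callback stands in for whatever logging function the scenario uses.

```javascript
// Hypothetical helper: log the name of each event as it fires.
// eventTypes maps a readable name to the event constant, for example
// { SetupComplete: Gemini.LiveAPIEvents.SetupComplete } inside VoxEngine.
function attachDebugLogging(client, eventTypes, log) {
  for (const [name, type] of Object.entries(eventTypes)) {
    client.addEventListener(type, () => {
      log(`Gemini event: ${name}`);
    });
  }
}
```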
Notes
- This example uses the Gemini Developer API (Gemini.Backend.GEMINI_API).
- Translation is one-way (English → Spanish). For bidirectional translation, run two Gemini sessions with opposite instructions.
- The example includes short prompts (call.say / calleeCall.say) to make recordings easier to follow. Remove them for production.
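The bidirectional variant mentioned in the notes could be wired roughly as below. This is a sketch under assumptions, not part of the example: the helper names are hypothetical, the prompt wording is illustrative, and each client is assumed to be a separate Gemini session that supports sendMediaTo.

```javascript
// Hypothetical helper: build opposite instructions for the two sessions.
function translatorInstruction(from, to) {
  return `Translate everything you hear from ${from} into ${to}. ` +
    "Speak only the translation.";
}

// Hypothetical helper: two one-way pipelines running in opposite directions.
// Assumption: each client was created with its own system instruction.
function wireTwoWay(callerCall, calleeCall, enToEsClient, esToEnClient) {
  callerCall.sendMediaTo(enToEsClient); // caller's English in
  enToEsClient.sendMediaTo(calleeCall); // Spanish out to the callee
  calleeCall.sendMediaTo(esToEnClient); // callee's Spanish in
  esToEnClient.sendMediaTo(callerCall); // English out to the caller
}
```

Keeping the two sessions separate means each one only ever hears a single language, which keeps each instruction simple.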
See the VoxEngine API Reference for more details.