Example: Answering an incoming call
This example answers an inbound Voximplant call and bridges its audio to the Gemini Live API for real-time speech-to-speech conversation.
Prerequisites
- Set up an inbound entrypoint for the caller:
  - Phone number: https://voximplant.com/docs/getting-started/basic-concepts/phone-numbers
  - WhatsApp: https://voximplant.com/docs/guides/integrations/whatsapp
  - SIP user / SIP registration: https://voximplant.com/docs/guides/calls/sip
  - Voximplant user: https://voximplant.com/docs/getting-started/basic-concepts/users (see also https://voximplant.com/docs/guides/calls/scenarios#how-to-call-a-voximplant-user)
- Create a routing rule that points the destination (number / WhatsApp / SIP username) to this scenario: https://voximplant.com/docs/getting-started/basic-concepts/routing-rules
- Store your Gemini API key in Voximplant ApplicationStorage under GEMINI_API_KEY.
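A minimal sketch of reading the key when the scenario starts. It assumes a storage object whose get(name) returns a Promise that resolves to an object with a value field, or to null. That is how VoxEngine's ApplicationStorage.get is expected to behave, but check the VoxEngine API Reference for the exact return type. The helper name loadGeminiApiKey is hypothetical.

```javascript
// Hypothetical helper: reads the Gemini API key from a key-value store.
// `storage` is assumed to behave like VoxEngine's ApplicationStorage:
// get(name) returns a Promise that resolves to { value } or to null.
async function loadGeminiApiKey(storage, name = "GEMINI_API_KEY") {
  const entry = await storage.get(name);
  if (!entry || !entry.value) {
    // Fail fast so the call is never bridged to a session that cannot authenticate.
    throw new Error(`Missing ${name} in application storage`);
  }
  return entry.value;
}
```

Failing early keeps a misconfigured application from answering calls that would only produce silence.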
Session setup
The Gemini Live API session is configured through a connectConfig object passed to Gemini.createLiveAPIClient(...).
In the full scenario, this object is GEMINI_CONNECT_CONFIG:
- systemInstruction maps directly to SYSTEM_PROMPT and defines the agent’s behavior.
- responseModalities: ["AUDIO"] asks Gemini to answer with speech, which is played back over the call.
- inputAudioTranscription and outputAudioTranscription are enabled, so ServerContent events include both user and agent text.
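The fields above can be sketched as a plain object. This is an illustration, not the scenario's actual GEMINI_CONNECT_CONFIG. The SYSTEM_PROMPT text is a placeholder. systemInstruction is written in the Gemini Live API's Content shape ({ parts: [{ text }] }). Check the VoxEngine API Reference for the exact shape connectConfig expects.

```javascript
// Placeholder prompt; the real scenario defines its own SYSTEM_PROMPT.
const SYSTEM_PROMPT =
  "You are a friendly phone assistant. Keep answers short and conversational.";

// Sketch of a Live API connect config with the fields described above.
const GEMINI_CONNECT_CONFIG = {
  // Agent behavior, in Gemini's Content shape.
  systemInstruction: { parts: [{ text: SYSTEM_PROMPT }] },
  // Ask Gemini to answer with audio so it can be played back on the call.
  responseModalities: ["AUDIO"],
  // Empty objects turn on transcription of caller and agent speech with default settings.
  inputAudioTranscription: {},
  outputAudioTranscription: {},
};
```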
Transcription logging
If you don’t need transcript logs, you can remove inputAudioTranscription and outputAudioTranscription from connectConfig.
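If you build the config from a shared base, a hypothetical helper can drop both transcription fields so you don't have to edit the object by hand. The sketch assumes a config object shaped like the one described in Session setup.

```javascript
// Hypothetical helper: returns a copy of a Live API connect config
// without the two transcription settings. The input object is not modified.
function withoutTranscription(config) {
  const { inputAudioTranscription, outputAudioTranscription, ...rest } = config;
  return rest;
}

// Usage: derive a transcription-free config from a base config.
const baseConfig = {
  systemInstruction: { parts: [{ text: "You are a phone assistant." }] },
  responseModalities: ["AUDIO"],
  inputAudioTranscription: {},
  outputAudioTranscription: {},
};
const quietConfig = withoutTranscription(baseConfig);
```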
Connect call audio
Once the Gemini Live API session is ready, bridge audio between the call and Gemini.
In the example, this happens in the Gemini.LiveAPIEvents.SetupComplete handler, which runs once the session is ready. The same handler also sends a starter message so the agent greets the caller first.
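The ordering in that handler can be sketched without the VoxEngine API. The bridgeAudio and sendClientContent callbacks are hypothetical stand-ins for whatever the scenario calls to connect media and to send a message to Gemini. The starter message follows the Gemini Live API clientContent shape, which is a list of turns plus turnComplete. The once-only guard is a defensive choice for this sketch and does not come from the original scenario.

```javascript
// Builds a Gemini Live API clientContent payload with one user turn.
// turnComplete: true tells Gemini the turn is finished, so it can respond.
function buildStarterMessage(text) {
  return {
    turns: [{ role: "user", parts: [{ text }] }],
    turnComplete: true,
  };
}

// Returns a SetupComplete handler that bridges audio first, then greets.
// bridgeAudio and sendClientContent are hypothetical callbacks.
function makeSetupCompleteHandler({ bridgeAudio, sendClientContent, greeting }) {
  let started = false;
  return function onSetupComplete() {
    if (started) return; // defensive: run the setup steps only once
    started = true;
    bridgeAudio(); // connect media first so the greeting audio reaches the caller
    sendClientContent(buildStarterMessage(greeting));
  };
}
```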
Barge-in
When the caller starts speaking while the agent is talking, Gemini sets the interrupted flag in ServerContent. The example then clears the media buffer so the agent stops speaking immediately.
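A minimal sketch of that check. It assumes the ServerContent event carries Gemini's serverContent fields, such as interrupted, directly. The actual VoxEngine event payload may nest them differently. clearMediaBuffer is passed in as a hypothetical callback because the exact client method is not shown here.

```javascript
// Stops agent playback when Gemini reports that the caller interrupted.
// `content` is assumed to mirror Gemini's serverContent message;
// `clearMediaBuffer` is a hypothetical callback that drops queued agent audio.
function handleBargeIn(content, clearMediaBuffer) {
  if (content && content.interrupted === true) {
    clearMediaBuffer();
    return true; // playback was cut
  }
  return false; // nothing to do for this event
}
```

Clearing the buffer, rather than waiting for queued audio to finish, is what makes the interruption feel immediate to the caller.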
Events
The scenario listens for Gemini.LiveAPIEvents.ServerContent to capture transcript text.
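Transcript capture can be sketched as a small accumulator over ServerContent payloads. The inputTranscription and outputTranscription fields, each with a text property, come from the Gemini Live API's serverContent message. The sketch assumes the VoxEngine event payload is that same object. Transcription arrives in fragments, so the sketch merges consecutive fragments from the same speaker into one line.

```javascript
// Returns the text of a transcription object, or undefined if absent.
function fragmentText(transcription) {
  return transcription ? transcription.text : undefined;
}

// Appends transcript fragments from one ServerContent payload to `lines`.
// Consecutive fragments from the same speaker are merged into one line.
function collectTranscript(lines, content) {
  if (!content) return lines;
  const fragments = [
    ["user", fragmentText(content.inputTranscription)],
    ["agent", fragmentText(content.outputTranscription)],
  ];
  for (const [speaker, text] of fragments) {
    if (!text) continue;
    const last = lines[lines.length - 1];
    if (last && last.speaker === speaker) {
      last.text += text;
    } else {
      lines.push({ speaker, text });
    }
  }
  return lines;
}
```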
For illustration, the example also logs all Gemini events:
- Gemini.LiveAPIEvents: SetupComplete, ServerContent, ToolCall, ToolCallCancellation, ConnectorInformation, Unknown
- Gemini.Events: WebSocketMediaStarted, WebSocketMediaEnded
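Logging every event can be sketched as a loop over event names. The names come from the list above, written as bare strings in place of the Gemini.LiveAPIEvents and Gemini.Events constants. The on(name, handler) registration function and the log sink are hypothetical stand-ins. In a scenario, they would be the client's listener registration and a logger such as VoxEngine's Logger.write.

```javascript
// Event names from the list above, grouped as in the scenario.
const LIVE_API_EVENTS = [
  "SetupComplete", "ServerContent", "ToolCall",
  "ToolCallCancellation", "ConnectorInformation", "Unknown",
];
const MEDIA_EVENTS = ["WebSocketMediaStarted", "WebSocketMediaEnded"];

// Registers one logging handler per event name.
// `on` and `log` are hypothetical callbacks supplied by the caller.
function logAllEvents(on, log) {
  for (const name of [...LIVE_API_EVENTS, ...MEDIA_EVENTS]) {
    on(name, (event) => log(`[Gemini] ${name}: ${JSON.stringify(event)}`));
  }
}
```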
Notes
- The example uses the Gemini Developer API (Gemini.Backend.GEMINI_API), not Vertex AI.
- inputAudioTranscription and outputAudioTranscription are enabled so you can log user and agent text in ServerContent events.
See the VoxEngine API Reference for more details.