Example: Answering an incoming call

This example answers an inbound Voximplant call and bridges audio to Gemini Live API for real-time speech-to-speech conversations.

⬇️ Jump to the Full VoxEngine scenario.

Prerequisites

- Set up an inbound entrypoint for the caller:
  - Phone number: https://voximplant.com/docs/getting-started/basic-concepts/phone-numbers
  - WhatsApp: https://voximplant.com/docs/guides/integrations/whatsapp
  - SIP user / SIP registration: https://voximplant.com/docs/guides/calls/sip
  - Voximplant user: https://voximplant.com/docs/getting-started/basic-concepts/users (see also https://voximplant.com/docs/guides/calls/scenarios#how-to-call-a-voximplant-user)
- Create a routing rule that points the destination (number / WhatsApp / SIP username) to this scenario: https://voximplant.com/docs/getting-started/basic-concepts/routing-rules
- Store your Gemini API key in Voximplant ApplicationStorage under GEMINI_API_KEY.
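
The full scenario reads the key with ApplicationStorage.get("GEMINI_API_KEY") and uses its value directly. A minimal sketch of a guarded lookup follows, in case a clearer error is useful when the key has not been stored yet. The helper name readRequiredKey and the injected storage parameter are hypothetical (they let the function run outside VoxEngine), and the sketch assumes a missing key resolves to null or to an entry without a value:

```javascript
// Hypothetical helper: read a required secret from a key-value store.
// `storage` only needs a get(key) method that returns a Promise,
// which is how ApplicationStorage is used in the full scenario.
async function readRequiredKey(storage, keyName) {
  const entry = await storage.get(keyName);
  // Assumption: a missing key resolves to null or to an entry with no value.
  if (!entry || !entry.value) {
    throw new Error(`Missing ${keyName} in application storage`);
  }
  return entry.value;
}
```

Inside a scenario this could be called as await readRequiredKey(ApplicationStorage, "GEMINI_API_KEY") before creating the client.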

Session setup

The Gemini Live API session is configured via connectConfig, passed into Gemini.createLiveAPIClient(...).

In the full scenario, see CONNECT_CONFIG:

- systemInstruction maps directly to SYSTEM_PROMPT, defining the agent’s behavior.
- responseModalities: ["AUDIO"] asks Gemini to speak back over the call.
- inputAudioTranscription and outputAudioTranscription are enabled so ServerContent includes user + agent text.

Transcription logging

If you don’t need transcript logs, you can remove inputAudioTranscription and outputAudioTranscription.
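
One way to make transcription optional is to assemble the config in a small function. The sketch below produces the same fields as CONNECT_CONFIG in the full scenario. The function name buildConnectConfig and its transcripts option are hypothetical and not part of the Voximplant API:

```javascript
// Hypothetical helper that assembles a connectConfig object with the same
// fields as CONNECT_CONFIG in the full scenario.
function buildConnectConfig({ systemPrompt, voiceName = "Aoede", transcripts = true }) {
  const config = {
    responseModalities: ["AUDIO"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName } },
    },
    systemInstruction: { parts: [{ text: systemPrompt }] },
  };
  // Transcription keys are only added when requested, so turning
  // transcript logging off simply omits both of them.
  if (transcripts) {
    config.inputAudioTranscription = {};
    config.outputAudioTranscription = {};
  }
  return config;
}
```

The result would be passed as connectConfig to Gemini.createLiveAPIClient(...), for example buildConnectConfig({ systemPrompt: SYSTEM_PROMPT, transcripts: false }).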

Connect call audio

Once the Gemini Live API session is ready, bridge audio between the call and Gemini:

Connect call audio

VoxEngine.sendMediaBetween(call, geminiLiveAPIClient);

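In the full scenario, this call sits in the Gemini.LiveAPIEvents.SetupComplete handler, where the client is named voiceAIClient. The same handler also sends a starter message to trigger the greeting; the full scenario passes the greeting text inline where this snippet uses GREETING_TRIGGER: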

Trigger the greeting

geminiLiveAPIClient.sendClientContent({
  turns: [{ role: "user", parts: [{ text: GREETING_TRIGGER }] }],
  turnComplete: true,
});
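
The two steps above (bridge the audio, then ask for a greeting) can be sketched as one function, in the same order the example uses. The names buildGreetingTurn and startConversation are hypothetical. The bridge parameter stands in for VoxEngine.sendMediaBetween and client for the Live API client, so the sketch can run outside VoxEngine:

```javascript
// Hypothetical helper: build the clientContent payload that prompts the
// model to speak first. The shape mirrors the snippet above.
function buildGreetingTurn(text) {
  return {
    turns: [{ role: "user", parts: [{ text }] }],
    turnComplete: true,
  };
}

// Hypothetical wiring function, intended to be called from the
// SetupComplete handler. `bridge` stands in for VoxEngine.sendMediaBetween.
function startConversation({ bridge, call, client, greeting }) {
  bridge(call, client); // 1. connect call audio and Gemini audio
  client.sendClientContent(buildGreetingTurn(greeting)); // 2. request the greeting
}
```

In a scenario, the SetupComplete handler would call startConversation({ bridge: VoxEngine.sendMediaBetween, call, client: voiceAIClient, greeting: "Say hello and ask how you can help." }).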

Barge-in

Gemini includes an interrupted flag in ServerContent when the caller starts speaking while the agent is still talking. The example clears the media buffer so the agent stops speaking immediately:

Barge-in handling

if (payload.interrupted) {
  geminiLiveAPIClient.clearMediaBuffer();
}
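
If the barge-in check is needed in more than one place, it can be pulled into a small function. This is a sketch only: handleBargeIn and its optional log callback are hypothetical, and client just needs the clearMediaBuffer() method used above:

```javascript
// Hypothetical handler: returns true when a barge-in was detected and handled.
// `client` only needs a clearMediaBuffer() method, as used in the snippet above.
function handleBargeIn(payload, client, log = () => {}) {
  if (!payload || !payload.interrupted) return false;
  log("barge-in: clearing the media buffer");
  client.clearMediaBuffer();
  return true;
}
```

Returning a boolean lets the caller decide whether to do anything else on interruption, such as counting barge-ins per call.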

Events

The scenario listens for Gemini.LiveAPIEvents.ServerContent to capture transcript text:

Transcripts

geminiLiveAPIClient.addEventListener(Gemini.LiveAPIEvents.ServerContent, (event) => {
  const payload = event?.data?.payload || {};
  if (payload.inputTranscription?.text) Logger.write(payload.inputTranscription.text);
  if (payload.outputTranscription?.text) Logger.write(payload.outputTranscription.text);
});
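
Transcription text can arrive in fragments spread over several ServerContent events, so logging each event as-is may produce choppy lines. The sketch below buffers fragments per speaker and emits one line when the speaker changes or the turn ends. createTranscriptBuffer is a hypothetical name. The turnComplete field is an assumption taken from the Gemini Live API message shape and is not used in the example above:

```javascript
// Hypothetical helper: merge transcript fragments into whole lines.
// `emit(speaker, line)` is called once per completed line.
function createTranscriptBuffer(emit) {
  let speaker = null;
  let text = "";

  // Emit whatever is buffered (if anything) and reset the buffer.
  const flush = () => {
    if (speaker && text.trim()) emit(speaker, text.trim());
    speaker = null;
    text = "";
  };

  // Add a fragment, flushing first if the speaker has changed.
  const append = (who, fragment) => {
    if (speaker && speaker !== who) flush();
    speaker = who;
    text += fragment;
  };

  return {
    // Feed the payload of each ServerContent event.
    push(payload = {}) {
      if (payload.inputTranscription?.text) append("USER", payload.inputTranscription.text);
      if (payload.outputTranscription?.text) append("AGENT", payload.outputTranscription.text);
      // Assumption: turnComplete or interrupted marks the end of the current turn.
      if (payload.turnComplete || payload.interrupted) flush();
    },
    flush,
  };
}
```

In a scenario, emit could be (who, line) => Logger.write(`===${who}=== ${line}`), with push(payload) called from the ServerContent handler and flush() called once more when the call disconnects.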

For illustration, the example also logs all Gemini events:

Gemini.LiveAPIEvents: SetupComplete, ServerContent, ToolCall, ToolCallCancellation, ConnectorInformation, Unknown
Gemini.Events: WebSocketMediaStarted, WebSocketMediaEnded

Notes

The example uses the Gemini Developer API (Gemini.Backend.GEMINI_API), not Vertex AI.
inputAudioTranscription and outputAudioTranscription are enabled so you can log user and agent text in ServerContent events.

See the VoxEngine API Reference for more details.

Full VoxEngine scenario

voxeengine-gemini-answer-incoming-call.js

/**
 * Voximplant + Gemini Live API connector demo
 * Scenario: answer an incoming call and bridge it to Gemini Live API.
 */

require(Modules.Gemini);
require(Modules.ApplicationStorage);

const SYSTEM_PROMPT = `You are Voxi, a helpful voice assistant for phone callers.
Keep responses short and telephony-friendly (usually 1-2 sentences).`;

// -------------------- Gemini Live API settings --------------------
const CONNECT_CONFIG = {
    responseModalities: ["AUDIO"],
    speechConfig: {
        voiceConfig: {
            prebuiltVoiceConfig: {voiceName: "Aoede"},
        },
    },
    systemInstruction: {
        parts: [{text: SYSTEM_PROMPT}],
    },
    inputAudioTranscription: {},
    outputAudioTranscription: {},
};

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
    let voiceAIClient;

    // Termination functions - add cleanup and logging as needed
    call.addEventListener(CallEvents.Disconnected, ()=>VoxEngine.terminate());
    call.addEventListener(CallEvents.Failed, ()=>VoxEngine.terminate());

    try {
        call.answer();
        // call.record({ hd_audio: true, stereo: true });   // Optional: record the call

        // Create client and connect to Gemini Live API
        voiceAIClient = await Gemini.createLiveAPIClient({
            apiKey: (await ApplicationStorage.get("GEMINI_API_KEY")).value,
            model: "gemini-2.5-flash-native-audio-preview-12-2025",
            backend: Gemini.Backend.GEMINI_API,
            connectConfig: CONNECT_CONFIG,
            onWebSocketClose: (event) => {
                Logger.write("===Gemini.WebSocket.Close===");
                if (event) Logger.write(JSON.stringify(event));
                VoxEngine.terminate();
            },
        });

        // ---------------------- Event handlers -----------------------
        // Wait for Gemini setup, then bridge audio and trigger the greeting
        voiceAIClient.addEventListener(Gemini.LiveAPIEvents.SetupComplete, () => {
            VoxEngine.sendMediaBetween(call, voiceAIClient);
            voiceAIClient.sendClientContent({
                turns: [{role: "user", parts: [{text: "Say hello and ask how you can help."}]}],
                turnComplete: true,
            });
        });

        // Capture transcripts + handle barge-in
        voiceAIClient.addEventListener(Gemini.LiveAPIEvents.ServerContent, (event) => {
            const payload = event?.data?.payload || {};
            if (payload.inputTranscription?.text) {
                Logger.write(`===USER=== ${payload.inputTranscription.text}`);
            }
            if (payload.outputTranscription?.text) {
                Logger.write(`===AGENT=== ${payload.outputTranscription.text}`);
            }
            if (payload.interrupted) {
                Logger.write("===BARGE-IN=== Gemini.LiveAPIEvents.ServerContent");
                voiceAIClient.clearMediaBuffer();
            }
        });

        // Log all Gemini events for illustration/debugging
        [
            Gemini.LiveAPIEvents.SetupComplete,
            Gemini.LiveAPIEvents.ServerContent,
            Gemini.LiveAPIEvents.ToolCall,
            Gemini.LiveAPIEvents.ToolCallCancellation,
            Gemini.LiveAPIEvents.ConnectorInformation,
            Gemini.LiveAPIEvents.Unknown,
            Gemini.Events.WebSocketMediaStarted,
            Gemini.Events.WebSocketMediaEnded,
        ].forEach((eventName) => {
            voiceAIClient.addEventListener(eventName, (event) => {
                Logger.write(`===${event.name}===`);
                if (event?.data) Logger.write(JSON.stringify(event.data));
            });
        });
    } catch (error) {
        Logger.write("===SOMETHING_WENT_WRONG===");
        Logger.write(error);
        voiceAIClient?.close();
        VoxEngine.terminate();
    }
});