---
title: Overview
subtitle: Gemini Live API in VoxEngine
---

## Benefits

The native Gemini module connects Voximplant calls to Google's Gemini Live API for real‑time, speech‑to‑speech conversations. The integration supports inbound and outbound calls and bridges telephony to Gemini with low latency while keeping call control inside VoxEngine.

Capability and feature highlights:

* **Connect inbound and outbound calls to a Gemini‑powered agent** with a real‑time, speech‑to‑speech interface.
* **Minimal audio latency**: media is sent directly from Voximplant media servers to Gemini in the required audio format.
* **Endpoint flexibility** across phone calls, Web SDK, SIP, and WhatsApp Business Calling.
* **Barge‑in and playback interruption** to keep conversations natural.
* **Real‑time event streaming** for Gemini session events.

## Architecture

Gemini Live API is a stateful WebSocket API: VoxEngine opens a session and streams audio to Gemini while receiving audio, text, and tool call requests back over the same connection. The integration uses this WebSocket connection to stream audio between VoxEngine and Gemini; the Voice AI connector handles connection establishment, media conversion, playback, and audio capture.

```mermaid
graph LR
    Caller[PSTN/SIP/WhatsApp/WebRTC] <-->|Media & call control| VoxEngine[VoxEngine Scenario]
    VoxEngine <-->|WebSocket: Config, Audio & Events| Gemini[Gemini Live API]
```

Tool calls requested by Gemini are signaled as events, so you can implement custom logic and return a response from your VoxEngine scenario.

## Prerequisites

* **Gemini API key** (for the Gemini Developer API backend). The Live API uses an API key to establish a session.
* **Vertex AI credentials** (for the Vertex AI backend). Gemini Live API is also available through Vertex AI.

## Development notes

### Gemini Developer API and Vertex AI support

Voximplant supports both the Gemini Developer API and Vertex AI backends for the Gemini Live API. The Gemini Developer API backend uses an API key for authentication, while the Vertex AI backend requires Google Cloud credentials. See the examples for details on each approach.

* **WebSocket session config**: Live API session configuration includes response modalities, system instructions, and tools in the initial setup message (see the configuration sketch below).
* **Audio‑to‑audio responses**: Use `responseModalities: ["AUDIO"]` to receive audio responses from the model.
* **Input/output transcriptions**: The Live API can return input and output audio transcriptions when enabled in the session config.
* **Turn detection and barge‑in**: Automatic activity detection can be configured (prefix padding and silence duration) and is used to detect speech activity.
* **Function calling**: Live API sessions can receive function call requests from the model (see the function calling sketch below).
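### Session configuration sketch

The following is a minimal sketch of what the initial Live API setup message can look like, expressed as a plain JavaScript object. The field names follow Google's Live API WebSocket reference (`setup`, `generationConfig.responseModalities`, `systemInstruction`, `tools`, `realtimeInputConfig.automaticActivityDetection`, `inputAudioTranscription`, `outputAudioTranscription`); the model name, instruction text, tool declaration, and timing values are illustrative placeholders, and the exact VoxEngine call that accepts this configuration is shown in the examples.

```javascript
// Minimal Live API setup message sketch.
// Field names follow Google's Live API WebSocket reference; the model name,
// instructions, tool declaration, and timing values are placeholders.
const setupMessage = {
  setup: {
    // Example model name; use the Live-capable model you actually target.
    model: "models/gemini-2.0-flash-live-001",
    generationConfig: {
      // Audio-to-audio: ask the model to respond with audio.
      responseModalities: ["AUDio".toUpperCase()], // i.e. ["AUDIO"]
    },
    systemInstruction: {
      parts: [{ text: "You are a helpful voice agent for inbound callers." }],
    },
    // Hypothetical tool the model may call during the conversation.
    tools: [
      {
        functionDeclarations: [
          {
            name: "lookup_order",
            description: "Look up an order by its number.",
            parameters: {
              type: "OBJECT",
              properties: { orderNumber: { type: "STRING" } },
              required: ["orderNumber"],
            },
          },
        ],
      },
    ],
    // Turn detection / barge-in: tune automatic activity detection.
    realtimeInputConfig: {
      automaticActivityDetection: {
        prefixPaddingMs: 300,
        silenceDurationMs: 500,
      },
    },
    // Enable transcriptions of the caller's audio and the model's audio.
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
};
```

### Function calling sketch

When the model requests a tool call, the Live API delivers a `toolCall` message containing one or more `functionCalls`, and the client answers with a matching `toolResponse`. The sketch below only illustrates these message shapes; the handler name and the stubbed order-lookup result are hypothetical, and in a VoxEngine scenario your custom logic runs inside the corresponding tool call event.

```javascript
// Shape of a server toolCall message and the matching client toolResponse,
// per the Live API WebSocket reference. The handler and stub result below
// are hypothetical.
function handleToolCall(toolCall) {
  const functionResponses = toolCall.functionCalls.map((call) => ({
    id: call.id,      // echo the id so Gemini can match the response
    name: call.name,  // e.g. "lookup_order" from the setup message above
    // Your custom logic produces the payload; this value is a stub.
    response: { status: "FOUND", eta: "2 days" },
  }));

  // Message to send back to Gemini over the same session.
  return { toolResponse: { functionResponses } };
}
```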
## Examples

* [Example: Answering an incoming call](gemini-live-voice-ai-example-answering-incoming-call)
* [Example: Using Vertex AI](gemini-live-voice-ai-example-using-vertex-ai)
* [Example: Placing an outbound call](gemini-live-voice-ai-example-placing-outbound-call)
* [Example: Function calling](gemini-live-voice-ai-example-function-calling)
* [Example: Speech-to-speech translation](gemini-live-voice-ai-example-speech-to-speech-translation)

## Links

### Voximplant

* Google Gemini Live API Client overview: [https://voximplant.com/products/gemini-client](https://voximplant.com/products/gemini-client)
* Voice AI product overview: [https://voximplant.ai/](https://voximplant.ai/)

### Google

* Gemini Live API (Get started): [https://ai.google.dev/gemini-api/docs/live](https://ai.google.dev/gemini-api/docs/live)
* Gemini Live API capabilities: [https://ai.google.dev/gemini-api/docs/live-guide](https://ai.google.dev/gemini-api/docs/live-guide)
* Gemini Live API WebSocket reference: [https://ai.google.dev/api/live](https://ai.google.dev/api/live)
* Vertex AI Live API overview: [https://cloud.google.com/vertex-ai/generative-ai/docs/live-api](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)