Deepgram Voice Agent in VoxEngine

Benefits

The native Deepgram module connects any Voximplant call to Deepgram’s Voice Agent API for real-time, speech‑to‑speech conversations. The integration streams bi-directional audio from phone numbers, SIP trunks, WhatsApp, or WebRTC into Deepgram’s unified agent environment (STT + LLM + TTS) and plays the agent’s responses back through Voximplant’s serverless runtime with minimal latency.

Capability and feature highlights:

  • Delivered as a Voice AI Connector inside VoxEngine: you define Deepgram STT, LLM, and TTS parameters; Voximplant handles telephony, media conversion, and streaming WebSockets.
  • Fully exposes Deepgram speech recognition, a wide variety of LLMs, and speech synthesis options from Deepgram and partners.
  • Bridge PSTN, SIP, WebRTC, or WhatsApp calls into Deepgram Voice Agent using a single VoxEngine scenario.
  • Keep conversations natural with low-latency turn‑taking and barge‑in.
  • Apply and update agent configuration mid-call.
  • Subscribe to Voice Agent events (for example AgentThinking and History).
  • Handle function calls for external integrations.

Demo video

Deepgram Voice Agent on Voximplant: Enterprise-ready Voice AI Phone Calls with Context Memory

Architecture

Deepgram Voice Agent architecture diagram

Prerequisites

A Deepgram API key is required. Visit https://console.deepgram.com/signup to create a Deepgram account if you don’t have one already. Then visit the Deepgram console to create an API key and copy its secret value. Use that secret value as your credential in VoxEngine (stored under DEEPGRAM_API_KEY). This is shown in the demo video above: Getting your Deepgram API Key.

Development notes

  • Native VoxEngine module: load with require(Modules.Deepgram) and create a Deepgram.VoiceAgentClient via Deepgram.createVoiceAgentClient(...).
  • Session setup: pass Deepgram’s Voice Agent settings object to specify agent.listen (STT), agent.think (LLM), and agent.speak (TTS). Do not include audio settings; the connector hardcodes them for optimum voice quality with Voximplant.
  • Events: all Deepgram Voice Agent events are supported under the Deepgram.VoiceAgentEvents enum. Subscribe to events such as ConversationText, AgentThinking, Warning, Error, and History. VoxEngine also provides WebSocketMediaStarted and WebSocketMediaEnded for debugging media flow.
  • Mid-session updates: change the agent’s prompt and speech settings without reconnecting using sendUpdatePrompt and sendUpdateSpeak. Inject text with sendInjectUserMessage and sendInjectAgentMessage.
  • Function calling: define tools in Deepgram’s think.functions array; handle requests via Deepgram.VoiceAgentEvents.FunctionCallRequest and respond with FunctionCallResponse.
  • Barge‑in: listen for Deepgram.VoiceAgentEvents.UserStartedSpeaking and call voiceAgentClient.clearMediaBuffer() to cancel current TTS audio when the user interrupts.
  • Deepgram ASR is a different module: this connector is independent of VoxEngine’s existing Deepgram ASR module (VoxEngine.createASR). You can align ASR parameters with your Voice Agent listen configuration for consistent transcription before/after the agent is invoked.
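
As a rough illustration of the session setup note above, the sketch below builds a settings object with the three agent sections and guards against passing audio settings. The helper names (buildAgentSettings, assertNoAudioSettings) are hypothetical, and the provider types and model names are assumed placeholders rather than values confirmed by this page; check Deepgram’s documentation for what is valid for your account.

```javascript
// Hypothetical helper: builds a Voice Agent settings object with the
// agent.listen / agent.think / agent.speak sections described above.
// ASSUMPTION: provider types and model names are illustrative placeholders.
function buildAgentSettings({ prompt, greeting }) {
  return {
    agent: {
      // STT
      listen: { provider: { type: "deepgram", model: "nova-3" } },
      // LLM, with an empty tool list to be filled in for function calling
      think: {
        provider: { type: "open_ai", model: "gpt-4o-mini" },
        prompt,
        functions: [],
      },
      // TTS
      speak: { provider: { type: "deepgram", model: "aura-2-thalia-en" } },
      greeting,
    },
    // No `audio` key on purpose: the connector sets audio settings itself.
  };
}

// Hypothetical guard: fail early if audio settings slipped into the object.
function assertNoAudioSettings(settings) {
  if (settings && Object.prototype.hasOwnProperty.call(settings, "audio")) {
    throw new Error("Do not include audio settings; the connector sets them.");
  }
  return settings;
}
```

Inside a scenario, an object like this would be passed to Deepgram.createVoiceAgentClient(...) together with the DEEPGRAM_API_KEY credential; the exact parameter names are listed in the module API reference.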

See the Deepgram module API reference for full details on methods, events, and types, and the examples in this section for usage.
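
The barge-in and function-calling notes above can be sketched roughly as follows. This is a minimal sketch, not the connector’s documented usage: enableBargeIn and buildFunctionCallResponse are hypothetical helpers, the client and events enum are passed in as parameters so the logic can be exercised with stubs, and the request and response field names (id, name, arguments, content) are assumptions based on Deepgram’s function-call message format. How the event payload is exposed in VoxEngine and how the response is sent are left to the module API reference.

```javascript
// Barge-in: when the caller starts speaking, drop agent audio still queued.
// `client` is expected to be a Deepgram.VoiceAgentClient and `events` the
// Deepgram.VoiceAgentEvents enum.
function enableBargeIn(client, events) {
  client.addEventListener(events.UserStartedSpeaking, () => {
    client.clearMediaBuffer();
  });
}

// Function calling: look up a local tool by name, run it, and build the
// object to send back as a FunctionCallResponse.
// ASSUMPTION: `request` looks like { id, name, arguments } where `arguments`
// is a JSON string, and the response looks like { id, name, content }.
async function buildFunctionCallResponse(tools, request) {
  const { id, name } = request;
  const known =
    Object.prototype.hasOwnProperty.call(tools, name) &&
    typeof tools[name] === "function";
  if (!known) {
    return { id, name, content: JSON.stringify({ error: `Unknown function: ${name}` }) };
  }
  try {
    const args = request.arguments ? JSON.parse(request.arguments) : {};
    const result = await tools[name](args);
    return { id, name, content: JSON.stringify(result) };
  } catch (err) {
    // Report failures to the agent instead of leaving the call unanswered.
    const message = err && err.message ? err.message : String(err);
    return { id, name, content: JSON.stringify({ error: message }) };
  }
}
```

In a scenario, a handler for Deepgram.VoiceAgentEvents.FunctionCallRequest would extract the request from the event, call buildFunctionCallResponse with a map of tool implementations, and send the result back as the FunctionCallResponse. Returning an error payload for unknown or failing tools lets the agent recover in conversation rather than stall.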

Examples

Voximplant

Deepgram