Example: Full-cascade incl. Groq

Overview

This full-cascade example demonstrates:

  1. A full cascade Voice AI pipeline with independent Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) components
  2. Third-party LLM usage via OpenAI Compatibility mode in the VoxEngine OpenAI module
  3. Turn taking with barge-in and end-of-turn detection to keep interactions natural and responsive

This example uses Deepgram for STT with custom vocabulary, Groq’s OpenAI-compatible Responses API with llama-3.3-70b-versatile as the LLM, and Inworld for low-latency, streaming TTS. Each component can be swapped for any supported VoxEngine module or external API as needed.

⬇️ Jump to the Full VoxEngine scenario.

Prerequisites

  • Store your Groq API key in Voximplant ApplicationStorage under GROQ_API_KEY.
  • Include vox-turn-taking before this scenario in the same routing rule sequence. Code for the turn-taking helper is available at Turn Taking Helper Code.

How it works

  • Deepgram transcribes caller audio with interim and final transcripts.
  • VoxTurnTaking runs Silero VAD and Pipecat Smart Turn-style detection to decide when a user turn is ready.
  • The scenario sends completed user turns to Groq through OpenAI.createResponsesAPIClient({ baseUrl: "https://api.groq.com/openai/v1" }).
  • Response text deltas are streamed into Inworld TTS and played back into the call.
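The last step, forwarding each text delta as it arrives and flushing on completion, can be sketched as a small piece of pure logic, independent of VoxEngine. Here `sendText` and `flush` are hypothetical stand-ins for the Inworld player calls (`ttsPlayer.send({send_text: ...})` and `ttsPlayer.send({flush_context: {}})` in the full scenario below):

```javascript
// Minimal sketch of the delta -> TTS hand-off, independent of VoxEngine.
// `sendText` and `flush` are illustrative stand-ins for the Inworld player calls.
function createDeltaStreamer(sendText, flush) {
  let buffered = "";
  return {
    // Called for each ResponseTextDelta event: forward the chunk immediately
    // so TTS can start synthesizing before the full response is ready.
    onDelta(delta) {
      if (!delta) return; // ignore empty deltas
      buffered += delta;
      sendText(delta);
    },
    // Called on ResponseTextDone: flush TTS and return the full utterance,
    // e.g. for logging the agent's reply.
    onDone() {
      flush();
      const full = buffered;
      buffered = "";
      return full;
    },
  };
}
```

Streaming deltas into TTS as they arrive, rather than waiting for the full response, is what keeps perceived latency low in a cascade pipeline.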

Full-cascade flow illustration

Notes

  • This example uses an OpenAI-compatible API, not OpenAI’s own hosted Responses API. The VoxEngine OpenAI module still works because the Groq endpoint follows the same request and event model closely enough for this flow.
  • Groq’s current Responses API support is still limited relative to OpenAI’s full stored-context flow. In practice, you should not assume support for features such as previous_response_id or storeContext. This example keeps each turn independent to stay simple and predictable.
  • If you need multi-turn memory with Groq, manage conversation history locally and resend the full structured input on each request.
  • The included prompt is intentionally short for example readability. Text-expecting models such as Llama usually behave better with a more explicit system prompt that tightly defines tone, grounding, brevity, ambiguity handling, and how to respond to partial caller fragments.
  • The turn-taking behavior in this example depends on the Turn Taking Helper Library. For details on turn taking parameters, see Turn Taking Helper Library Guide.
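The local history management mentioned in the notes above can be sketched as a small helper, again independent of VoxEngine. The `role`/`content` item shape follows the Responses API structured-input format; the helper itself and its trimming policy are assumptions for illustration, not part of the example scenario:

```javascript
// Sketch of client-side conversation memory for an endpoint without stored
// context: keep a bounded array of role/content items and resend it in full
// on every request.
function createHistory(maxItems = 20) {
  const items = [];
  function trim() {
    while (items.length > maxItems) items.shift(); // drop oldest items first
  }
  return {
    addUserTurn(text) {
      items.push({role: "user", content: text});
      trim();
    },
    addAssistantTurn(text) {
      items.push({role: "assistant", content: text});
      trim();
    },
    // Full structured input to pass as `input` on each request.
    toInput() {
      return items.slice();
    },
  };
}
```

In the scenario below, `onUserTurn` would call `addUserTurn(input)` and pass `toInput()` to `createResponses`, and the `ResponseTextDone` handler would record the agent reply with `addAssistantTurn`.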

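To illustrate the note about system prompts, a more explicit prompt for a text-first model might look like the following. The wording is an assumption for illustration only, not part of the example scenario:

```javascript
// Illustrative only: a more explicit system prompt covering tone, grounding,
// brevity, ambiguity handling, and partial caller fragments.
const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant.

Tone: warm, professional, and concise. Reply in English.
Brevity: usually 1-2 short sentences; never read out lists or URLs.
Grounding: only state facts about Voximplant you are sure of; otherwise offer
to have a human follow up.
Ambiguity: if the caller's request is unclear, ask one short clarifying question.
Fragments: if the transcript looks like a cut-off fragment (for example "so I was"),
ask the caller to finish the thought instead of guessing.
`;
```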
More info

Full VoxEngine scenario

voxengine-full-cascade-dg-groq-iw.js
/**
 * Full-cascade Voice AI demo: Deepgram STT + Groq Llama Responses API + Inworld TTS
 * Scenario: answer an incoming call using VoxTurnTaking for turn management.
 *
 * Include `vox-turn-taking` BEFORE this scenario in the routing rule sequence.
 *
 * Groq's Responses API is OpenAI-compatible, but it does not currently support
 * `previous_response_id`. To keep this example simple, each turn is submitted
 * independently instead of rebuilding prior conversation history locally.
 */

require(Modules.ASR);
require(Modules.OpenAI);
require(Modules.Inworld);
require(Modules.ApplicationStorage);

const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant. Keep responses short, polite, and telephony-friendly (usually 1-2 sentences).
Reply in English.
`;

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
  let stt;
  let responsesClient;
  let ttsPlayer;
  let turnTaking;
  const terminate = () => {
    stt?.stop();
    responsesClient?.close();
    turnTaking?.close();
    VoxEngine.terminate();
  };

  call.addEventListener(CallEvents.Disconnected, terminate);
  call.addEventListener(CallEvents.Failed, terminate);

  try {
    call.answer();
    call.record({hd_audio: true, stereo: true}); // optional recording

    stt = VoxEngine.createASR({
      profile: ASRProfileList.Deepgram.en_US,
      interimResults: true,
      request: {
        language: "en-US",
        model: "nova-2-phonecall",
        keywords: ["Voximplant:4", "OpenAI:2"],
      },
    });

    responsesClient = await OpenAI.createResponsesAPIClient({
      apiKey: (await ApplicationStorage.get("GROQ_API_KEY")).value,
      baseUrl: "https://api.groq.com/openai/v1",
      storeContext: false,
      onWebSocketClose: (event) => {
        Logger.write("===Groq.WebSocket.Close===");
        if (event) Logger.write(JSON.stringify(event));
        terminate();
      },
    });

    ttsPlayer = Inworld.createRealtimeTTSPlayer({
      createContextParameters: {
        create: {
          voiceId: "Ashley",
          modelId: "inworld-tts-1.5-mini",
          speakingRate: 1.1,
          temperature: 1.3,
        }
      }
    });

    // VoxTurnTaking is provided by `vox-turn-taking`, loaded earlier in the routing rule
    turnTaking = await VoxTurnTaking.create({
      call,
      stt,
      vadOptions: {
        threshold: 0.5, // sensitivity for detecting speech vs silence
        minSilenceDurationMs: 350, // silence required before VAD marks speech end
        speechPadMs: 10, // small padding around detected speech
      },
      turnDetectorOptions: {
        threshold: 0.5, // end-of-turn probability needed from Pipecat
      },
      policy: {
        transcriptSettleMs: 500, // grace period for a final STT chunk after end-of-turn
        userSpeechTimeoutMs: 1000, // default fallback submit timeout after speech ends
        shortUtteranceExtensionMs: 1800, // longer hold for fragments that may continue
        fastShortUtteranceTimeoutMs: 700, // faster submit for short complete utterances like "hey"
        shortUtteranceMaxChars: 12, // max chars still treated as a short fragment
        shortUtteranceMaxWords: 2, // max words still treated as a short fragment
        lowConfidenceShortUtteranceThreshold: 0.75, // keep short low-confidence finals replaceable
      },
      enableLogging: true,
      onUserTurn: (input) => { // send the transcript text on end-of-turn
        responsesClient.createResponses({
          model: "llama-3.3-70b-versatile",
          instructions: SYSTEM_PROMPT,
          input,
        });
      },
      onInterrupt: () => {
        ttsPlayer?.clearBuffer(); // stop any in-progress TTS audio
      },
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDelta, (event) => {
      const text = event?.data?.payload?.delta;
      if (!text || !turnTaking.canPlayAgentAudio()) return;
      ttsPlayer.send({send_text: {text}});
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDone, (event) => {
      const text = event?.data?.payload?.text;
      Logger.write(`===AGENT=== ${text}`);
      ttsPlayer.send({flush_context: {}}); // tell TTS to process all buffered text immediately
    });

    // Event logging to illustrate available OpenAI Responses API client events
    [
      OpenAI.ResponsesAPIEvents.ResponseCreated,
      OpenAI.ResponsesAPIEvents.ResponseFailed,
      OpenAI.ResponsesAPIEvents.ResponsesAPIError,
      OpenAI.ResponsesAPIEvents.ResponseInProgress,
      OpenAI.ResponsesAPIEvents.ResponseCompleted,
      OpenAI.ResponsesAPIEvents.ResponseOutputItemAdded,
      OpenAI.ResponsesAPIEvents.ResponseContentPartAdded,
      OpenAI.ResponsesAPIEvents.ConnectorInformation,
      OpenAI.ResponsesAPIEvents.Unknown,
      OpenAI.Events.WebSocketMediaStarted,
      OpenAI.Events.WebSocketMediaEnded,
    ].forEach((eventName) => {
      responsesClient.addEventListener(eventName, (event) => {
        Logger.write(`===${event?.name || eventName}===`);
        if (event?.data) Logger.write(JSON.stringify(event.data));
      });
    });

    // Attach the caller media
    call.sendMediaTo(stt);
    ttsPlayer.sendMediaTo(call);

    // Tell the LLM to talk first and greet the user
    responsesClient.createResponses({
      model: "llama-3.3-70b-versatile",
      instructions: SYSTEM_PROMPT,
      input: "Greet the caller briefly.",
    });
  } catch (error) {
    Logger.write("===UNHANDLED_ERROR===");
    Logger.write(error);
    terminate();
  }
});