Example: Full-cascade incl. Groq



Overview

This full-cascade example demonstrates:

  1. A full cascade Voice AI pipeline with independent Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) components
  2. Third-party LLM integration using OpenAI Compatibility mode in the VoxEngine OpenAI module
  3. Turn taking with barge-in and end-of-turn detection to keep interactions natural and responsive

This specific example uses Deepgram for STT with custom vocabulary, Groq’s OpenAI-compatible Responses API with llama-3.3-70b-versatile as the LLM, and Inworld for low-latency streaming TTS. You can swap any of these for other supported VoxEngine modules or external APIs as needed.
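Because the LLM stage only assumes an OpenAI-compatible endpoint, switching providers is largely a matter of changing the base URL, API key, and model name. A minimal sketch, placed inside the CallAlerting handler; the endpoint URL, secret name, and model identifier below are illustrative placeholders, not tested recommendations:

// Sketch: pointing the Responses API client at a different OpenAI-compatible
// provider. Endpoint, secret name, and model are illustrative placeholders.
const llmClient = await OpenAI.createResponsesAPIClient({
  apiKey: VoxEngine.getSecretValue('OTHER_PROVIDER_API_KEY'), // hypothetical secret name
  baseUrl: "https://api.example.com/openai/v1", // any OpenAI-compatible endpoint
  storeContext: false, // safest default unless the provider supports stored context
});

llmClient.createResponses({
  model: "provider-model-name", // placeholder model identifier
  instructions: SYSTEM_PROMPT,
  input: "Hello",
});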

⬇️ Jump to the Full VoxEngine scenario.

Prerequisites

  • Store your Groq API key in Voximplant Secrets under GROQ_API_KEY.
  • Include vox-turn-taking before this scenario in the same routing rule sequence. Code for the turn-taking helper is available at Turn Taking Helper Code.

How it works

  • Deepgram transcribes caller audio with interim and final transcripts.
  • VoxTurnTaking runs Silero VAD and Pipecat Smart Turn-style detection to decide when a user turn is ready.
  • The scenario sends completed user turns to Groq through OpenAI.createResponsesAPIClient({ baseUrl: "https://api.groq.com/openai/v1" }).
  • Response text deltas are streamed into Inworld TTS and played back into the call.
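Condensed, the per-turn path looks like this. The snippet is excerpted and simplified from the full scenario below, so names such as responsesClient, ttsPlayer, and SYSTEM_PROMPT refer to the objects created there:

// Simplified per-turn path (excerpt from the full scenario below):
// end-of-turn -> one independent Groq Responses request -> text deltas -> Inworld TTS
turnTaking = await VoxTurnTaking.create({
  call,
  stt,
  onUserTurn: (input) => {
    // a completed user turn becomes a single Responses request
    responsesClient.createResponses({
      model: "llama-3.3-70b-versatile",
      instructions: SYSTEM_PROMPT,
      input,
    });
  },
});

responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDelta, (event) => {
  // each streamed text delta goes straight into the TTS player
  ttsPlayer.send({send_text: {text: event.data.payload.delta}});
});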

Full-cascade flow illustration

Notes

  • This example uses an OpenAI-compatible API, not OpenAI’s own hosted Responses API. The VoxEngine OpenAI module still works because the Groq endpoint follows the same request and event model closely enough for this flow.
  • Groq’s current Responses API support is still limited relative to OpenAI’s full stored-context flow. In practice, you should not assume support for features such as previous_response_id or storeContext. This example keeps each turn independent to stay simple and predictable.
  • If you need multi-turn memory with Groq, manage conversation history locally and resend the full structured input on each request; a minimal sketch follows this list.
  • The included prompt is intentionally short for readability. Text-only models such as Llama usually behave better with a more explicit system prompt that tightly defines tone, grounding, brevity, ambiguity handling, and how to respond to partial caller fragments; an expanded example also follows below.
  • The turn-taking behavior in this example depends on the Turn Taking Helper Library. For details on turn taking parameters, see Turn Taking Helper Library Guide.
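The local-history approach mentioned above can look like the following. This is a minimal sketch, assuming the client passes the Responses API's structured input format (an array of {role, content} items) through to the endpoint unchanged; verify that against your provider before relying on it:

// Sketch: local conversation history for multi-turn memory with Groq.
// Assumes structured input ({role, content} items) is accepted by the
// endpoint; verify support before relying on it.
const history = [];

const askWithHistory = (userText) => {
  history.push({role: "user", content: userText});
  responsesClient.createResponses({
    model: "llama-3.3-70b-versatile",
    instructions: SYSTEM_PROMPT,
    input: history, // resend the full history on every request
  });
};

// Append the agent's reply so the next request sees it:
responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDone, (event) => {
  history.push({role: "assistant", content: event?.data?.payload?.text ?? ""});
});

In the scenario below, you would call askWithHistory from onUserTurn instead of calling createResponses directly.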

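An expanded system prompt along the lines suggested above might look like this; the wording is illustrative, not a tuned recommendation:

// Illustrative expanded system prompt; tune the wording for your use case.
const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant.
Tone: polite, warm, and professional.
Grounding: only answer questions about Voximplant; otherwise say you cannot help with that.
Brevity: 1-2 short sentences, no lists or markdown, telephony-friendly wording.
Ambiguity: if a request is unclear, ask one short clarifying question.
Partial fragments: if the caller seems cut off mid-sentence, respond with a brief
prompt such as "Go on?" instead of guessing at their intent.
Reply in English.
`;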

Full VoxEngine scenario

voxengine-full-cascade-dg-groq-iw.js
/**
 * Full-cascade Voice AI demo: Deepgram STT + Groq Llama Responses API + Inworld TTS
 * Scenario: answer an incoming call using VoxTurnTaking for turn management.
 *
 * Include `vox-turn-taking` BEFORE this scenario in the routing rule sequence.
 *
 * Groq's Responses API is OpenAI-compatible, but it does not currently support
 * `previous_response_id`. To keep this example simple, each turn is submitted
 * independently instead of rebuilding prior conversation history locally.
 */

require(Modules.ASR);
require(Modules.OpenAI);
require(Modules.Inworld);

const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant. Keep responses short, polite, and telephony-friendly (usually 1-2 sentences).
Reply in English.
`;

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
  let stt;
  let responsesClient;
  let ttsPlayer;
  let turnTaking;

  const terminate = () => {
    stt?.stop();
    responsesClient?.close();
    turnTaking?.close();
    VoxEngine.terminate();
  };

  call.addEventListener(CallEvents.Disconnected, terminate);
  call.addEventListener(CallEvents.Failed, terminate);

  try {
    call.answer();
    call.record({hd_audio: true, stereo: true}); // optional recording

    stt = VoxEngine.createASR({
      profile: ASRProfileList.Deepgram.en_US,
      interimResults: true,
      request: {
        language: "en-US",
        model: "nova-2-phonecall",
        keywords: ["Voximplant:4", "OpenAI:2"],
      },
    });

    responsesClient = await OpenAI.createResponsesAPIClient({
      apiKey: VoxEngine.getSecretValue('GROQ_API_KEY'),
      baseUrl: "https://api.groq.com/openai/v1",
      storeContext: false,
      onWebSocketClose: (event) => {
        Logger.write("===Groq.WebSocket.Close===");
        if (event) Logger.write(JSON.stringify(event));
        terminate();
      },
    });

    ttsPlayer = Inworld.createRealtimeTTSPlayer({
      createContextParameters: {
        create: {
          voiceId: "Ashley",
          modelId: "inworld-tts-1.5-mini",
          speakingRate: 1.1,
          temperature: 1.3,
        },
      },
    });

    // VoxTurnTaking is provided by `vox-turn-taking`, loaded earlier in the routing rule
    turnTaking = await VoxTurnTaking.create({
      call,
      stt,
      vadOptions: {
        threshold: 0.5, // sensitivity for detecting speech vs silence
        minSilenceDurationMs: 350, // silence required before VAD marks speech end
        speechPadMs: 10, // small padding around detected speech
      },
      turnDetectorOptions: {
        threshold: 0.5, // end-of-turn probability needed from Pipecat
      },
      policy: {
        transcriptSettleMs: 500, // grace period for a final STT chunk after end-of-turn
        userSpeechTimeoutMs: 1000, // default fallback submit timeout after speech ends
        shortUtteranceExtensionMs: 1800, // longer hold for fragments that may continue
        fastShortUtteranceTimeoutMs: 700, // faster submit for short complete utterances like "hey"
        shortUtteranceMaxChars: 12, // max chars still treated as a short fragment
        shortUtteranceMaxWords: 2, // max words still treated as a short fragment
        lowConfidenceShortUtteranceThreshold: 0.75, // keep short low-confidence finals replaceable
      },
      enableLogging: true,
      onUserTurn: (input) => { // send the transcript text on end-of-turn
        responsesClient.createResponses({
          model: "llama-3.3-70b-versatile",
          instructions: SYSTEM_PROMPT,
          input,
        });
      },
      onInterrupt: () => {
        ttsPlayer?.clearBuffer(); // stop any in-progress TTS audio
      },
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDelta, (event) => {
      const text = event?.data?.payload?.delta;
      if (!text || !turnTaking.canPlayAgentAudio()) return;
      ttsPlayer.send({send_text: {text}});
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDone, (event) => {
      const text = event?.data?.payload?.text;
      Logger.write(`===AGENT=== ${text}`);
      ttsPlayer.send({flush_context: {}}); // tell TTS to process all buffered text immediately
    });

    // Event logging to illustrate available OpenAI Responses API client events
    [
      OpenAI.ResponsesAPIEvents.ResponseCreated,
      OpenAI.ResponsesAPIEvents.ResponseFailed,
      OpenAI.ResponsesAPIEvents.ResponsesAPIError,
      OpenAI.ResponsesAPIEvents.ResponseInProgress,
      OpenAI.ResponsesAPIEvents.ResponseCompleted,
      OpenAI.ResponsesAPIEvents.ResponseOutputItemAdded,
      OpenAI.ResponsesAPIEvents.ResponseContentPartAdded,
      OpenAI.ResponsesAPIEvents.ConnectorInformation,
      OpenAI.ResponsesAPIEvents.Unknown,
      OpenAI.Events.WebSocketMediaStarted,
      OpenAI.Events.WebSocketMediaEnded,
    ].forEach((eventName) => {
      responsesClient.addEventListener(eventName, (event) => {
        Logger.write(`===${event?.name || eventName}===`);
        if (event?.data) Logger.write(JSON.stringify(event.data));
      });
    });

    // Attach the caller media
    call.sendMediaTo(stt);
    ttsPlayer.sendMediaTo(call);

    // Tell the LLM to talk first and greet the user
    responsesClient.createResponses({
      model: "llama-3.3-70b-versatile",
      instructions: SYSTEM_PROMPT,
      input: "Greet the caller briefly.",
    });
  } catch (error) {
    Logger.write("===UNHANDLED_ERROR===");
    Logger.write(error);
    terminate();
  }
});