---
title: 'Example: Full-cascade incl. Groq'
---

## Overview

This full-cascade example demonstrates:

1. A full-cascade Voice AI pipeline with independent Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) components
2. Third-party LLM usage via OpenAI Compatibility mode in the VoxEngine OpenAI module
3. Turn taking with barge-in and end-of-turn detection to keep interactions natural and responsive

This specific example uses Deepgram for STT with custom vocabulary, Groq's OpenAI-compatible Responses API with `llama-3.3-70b-versatile` as the LLM, and Inworld for low-latency, streaming TTS. These can be swapped out for any supported VoxEngine modules or external APIs as needed.

**⬇️ Jump to the [Full VoxEngine scenario](#full-voxengine-scenario).**

## Prerequisites

* Store your Groq API key in Voximplant `ApplicationStorage` under `GROQ_API_KEY`.
* Include `vox-turn-taking` before this scenario in the same routing rule sequence. Code for the turn-taking helper is available at [Turn Taking Helper Code](/capabilities/speech-flow-control/turn-taking-helper-library#turn-taking-helper-code).

## How it works

* Deepgram transcribes caller audio with interim and final transcripts.
* `VoxTurnTaking` runs Silero VAD and Pipecat Smart Turn-style detection to decide when a user turn is ready.
* The scenario sends completed user turns to Groq through `OpenAI.createResponsesAPIClient({ baseUrl: "https://api.groq.com/openai/v1" })`.
* Response text deltas are streamed into Inworld TTS and played back into the call.

![Full-cascade flow illustration](https://files.buildwithfern.com/voximplant.docs.buildwithfern.com/c398812b3547012d08dd87986ae1a965c32aebabbf04620fd1bb50f9e8effb6d/docs/assets/features/turn-detection-hero-desktop.svg)

## Notes

* This example uses an **OpenAI-compatible API**, not OpenAI's own hosted Responses API.
The VoxEngine OpenAI module still works because the Groq endpoint follows the same request and event model closely enough for this flow.
* Groq's current Responses API support is still limited relative to OpenAI's full stored-context flow. In practice, you should not assume support for features such as `previous_response_id` or `storeContext`. This example keeps each turn independent to stay simple and predictable.
* If you need multi-turn memory with Groq, manage conversation history locally and resend the full structured input on each request.
* The included prompt is intentionally short to keep the example readable. Text-expecting models such as Llama usually behave better with a more explicit system prompt that tightly defines tone, grounding, brevity, ambiguity handling, and how to respond to partial caller fragments.
* The turn-taking behavior in this example depends on the [Turn Taking Helper Library](/capabilities/speech-flow-control/turn-taking-helper-library#turn-taking-helper-code). For details on turn-taking parameters, see the [Turn Taking Helper Library Guide](/capabilities/speech-flow-control/turn-taking-helper-library).
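If you do need multi-turn memory, local history management might be sketched like this. This is a minimal illustration, assuming `previous_response_id` is unavailable; the `createHistory` helper is hypothetical and not part of the VoxEngine OpenAI module:

```javascript
// Hypothetical client-side conversation memory for an OpenAI-compatible
// endpoint without stored context. The full history is rebuilt and resent
// as the `input` payload on every request.
function createHistory(maxTurns = 20) {
  const turns = [];
  return {
    // Append one turn; drop the oldest turns to bound request size
    add(role, text) {
      turns.push({role, content: text});
      while (turns.length > maxTurns) turns.shift();
    },
    // Full structured input to resend with each createResponses call
    toInput() {
      return turns.slice();
    },
  };
}
```

In the scenario below, you could call `history.add("user", input)` inside `onUserTurn`, `history.add("assistant", text)` on `ResponseTextDone`, and pass `input: history.toInput()` to `createResponses` instead of the raw transcript.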
## More info

* OpenAI module API: [https://voximplant.com/docs/references/voxengine/openai](https://voximplant.com/docs/references/voxengine/openai)
* Silero module API: [https://voximplant.com/docs/references/voxengine/silero](https://voximplant.com/docs/references/voxengine/silero)
* Pipecat module API: [https://voximplant.com/docs/references/voxengine/pipecat](https://voximplant.com/docs/references/voxengine/pipecat)
* Inworld module API: [https://voximplant.com/docs/references/voxengine/inworld](https://voximplant.com/docs/references/voxengine/inworld)
* Deepgram ASR profile guide: [https://voximplant.com/docs/guides/speech/asr](https://voximplant.com/docs/guides/speech/asr)

## Full VoxEngine scenario

```javascript title={"voxengine-full-cascade-dg-groq-iw.js"} maxLines={0}
/**
 * Full-cascade Voice AI demo: Deepgram STT + Groq Llama Responses API + Inworld TTS
 * Scenario: answer an incoming call using VoxTurnTaking for turn management.
 *
 * Include `vox-turn-taking` BEFORE this scenario in the routing rule sequence.
 *
 * Groq's Responses API is OpenAI-compatible, but it does not currently support
 * `previous_response_id`. To keep this example simple, each turn is submitted
 * independently instead of rebuilding prior conversation history locally.
 */
require(Modules.ASR);
require(Modules.OpenAI);
require(Modules.Inworld);
require(Modules.ApplicationStorage);

const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant.
Keep responses short, polite, and telephony-friendly (usually 1-2 sentences).
Reply in English.
`;

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
  let stt;
  let responsesClient;
  let ttsPlayer;
  let turnTaking;

  const terminate = () => {
    stt?.stop();
    responsesClient?.close();
    turnTaking?.close();
    VoxEngine.terminate();
  };

  call.addEventListener(CallEvents.Disconnected, terminate);
  call.addEventListener(CallEvents.Failed, terminate);

  try {
    call.answer();
    call.record({hd_audio: true, stereo: true}); // optional recording

    stt = VoxEngine.createASR({
      profile: ASRProfileList.Deepgram.en_US,
      interimResults: true,
      request: {
        language: "en-US",
        model: "nova-2-phonecall",
        keywords: ["Voximplant:4", "OpenAI:2"],
      },
    });

    responsesClient = await OpenAI.createResponsesAPIClient({
      apiKey: (await ApplicationStorage.get("GROQ_API_KEY")).value,
      baseUrl: "https://api.groq.com/openai/v1",
      storeContext: false,
      onWebSocketClose: (event) => {
        Logger.write("===Groq.WebSocket.Close===");
        if (event) Logger.write(JSON.stringify(event));
        terminate();
      },
    });

    ttsPlayer = Inworld.createRealtimeTTSPlayer({
      createContextParameters: {
        create: {
          voiceId: "Ashley",
          modelId: "inworld-tts-1.5-mini",
          speakingRate: 1.1,
          temperature: 1.3,
        },
      },
    });

    // Load the VoxTurnTaking module as part of the routing rule
    turnTaking = await VoxTurnTaking.create({
      call,
      stt,
      vadOptions: {
        threshold: 0.5,            // sensitivity for detecting speech vs silence
        minSilenceDurationMs: 350, // silence required before VAD marks speech end
        speechPadMs: 10,           // small padding around detected speech
      },
      turnDetectorOptions: {
        threshold: 0.5, // end-of-turn probability needed from Pipecat
      },
      policy: {
        transcriptSettleMs: 500,          // grace period for a final STT chunk after end-of-turn
        userSpeechTimeoutMs: 1000,        // default fallback submit timeout after speech ends
        shortUtteranceExtensionMs: 1800,  // longer hold for fragments that may continue
        fastShortUtteranceTimeoutMs: 700, // faster submit for short complete utterances like "hey"
        shortUtteranceMaxChars: 12,       // max chars still treated as a short fragment
        shortUtteranceMaxWords: 2,        // max words still treated as a short fragment
        lowConfidenceShortUtteranceThreshold: 0.75, // keep short low-confidence finals replaceable
      },
      enableLogging: true,
      onUserTurn: (input) => {
        // Send the transcript text on end-of-turn
        responsesClient.createResponses({
          model: "llama-3.3-70b-versatile",
          instructions: SYSTEM_PROMPT,
          input,
        });
      },
      onInterrupt: () => {
        ttsPlayer?.clearBuffer(); // stop any in-progress TTS audio
      },
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDelta, (event) => {
      const text = event?.data?.payload?.delta;
      if (!text || !turnTaking.canPlayAgentAudio()) return;
      ttsPlayer.send({send_text: {text}});
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDone, (event) => {
      const text = event?.data?.payload?.text;
      Logger.write(`===AGENT=== ${text}`);
      ttsPlayer.send({flush_context: {}}); // tell TTS to process all buffered text immediately
    });

    // Event logging to illustrate available OpenAI Responses API client events
    [
      OpenAI.ResponsesAPIEvents.ResponseCreated,
      OpenAI.ResponsesAPIEvents.ResponseFailed,
      OpenAI.ResponsesAPIEvents.ResponsesAPIError,
      OpenAI.ResponsesAPIEvents.ResponseInProgress,
      OpenAI.ResponsesAPIEvents.ResponseCompleted,
      OpenAI.ResponsesAPIEvents.ResponseOutputItemAdded,
      OpenAI.ResponsesAPIEvents.ResponseContentPartAdded,
      OpenAI.ResponsesAPIEvents.ConnectorInformation,
      OpenAI.ResponsesAPIEvents.Unknown,
      OpenAI.Events.WebSocketMediaStarted,
      OpenAI.Events.WebSocketMediaEnded,
    ].forEach((eventName) => {
      responsesClient.addEventListener(eventName, (event) => {
        Logger.write(`===${event?.name || eventName}===`);
        if (event?.data) Logger.write(JSON.stringify(event.data));
      });
    });

    // Attach the caller media
    call.sendMediaTo(stt);
    ttsPlayer.sendMediaTo(call);

    // Tell the LLM to talk first and greet the user
    responsesClient.createResponses({
      model: "llama-3.3-70b-versatile",
      instructions: SYSTEM_PROMPT,
      input: "Greet the caller briefly.",
    });
  } catch (error) {
    Logger.write("===UNHANDLED_ERROR===");
    Logger.write(error);
    terminate();
  }
});
```
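As a rough mental model of the short-utterance policy parameters in the scenario above, the classification decision might be sketched as a pure function. This is not the actual VoxTurnTaking implementation, only an illustration of the documented thresholds:

```javascript
// Illustrative model of the short-utterance policy thresholds. The function
// name and return values are hypothetical; real behavior lives in the
// Turn Taking Helper Library.
const policy = {
  shortUtteranceMaxChars: 12,
  shortUtteranceMaxWords: 2,
  lowConfidenceShortUtteranceThreshold: 0.75,
};

function classifyUtterance(text, confidence, p = policy) {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/).filter(Boolean);
  const isShort =
    trimmed.length <= p.shortUtteranceMaxChars &&
    words.length <= p.shortUtteranceMaxWords;
  if (!isShort) return "submit";        // normal-length final: submit after the timeout
  if (confidence < p.lowConfidenceShortUtteranceThreshold) {
    return "replaceable";               // low-confidence short final stays replaceable
  }
  return "fast-submit";                 // short, confident utterance like "hey"
}
```

Under this model, a short confident "hey" submits on the faster `fastShortUtteranceTimeoutMs` path, while a longer sentence falls back to the default timeout behavior.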