---
title: 'Example: Full-cascade incl. Groq'
---

## Overview

This full-cascade example demonstrates:

1. A full-cascade Voice AI pipeline with independent Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) components
2. Third-party LLM usage via OpenAI Compatibility mode in the VoxEngine OpenAI module
3. Turn taking with barge-in and end-of-turn detection to keep interactions natural and responsive

This specific example uses Deepgram for STT with custom vocabulary, Groq's OpenAI-compatible Responses API with `llama-3.3-70b-versatile` as the LLM, and Inworld for low-latency, streaming TTS. These can be swapped out for any supported VoxEngine modules or external APIs as needed.

**⬇️ Jump to the [Full VoxEngine scenario](#full-voxengine-scenario).**

## Prerequisites

* Store your Groq API key in Voximplant `ApplicationStorage` under `GROQ_API_KEY`.
* Include `vox-turn-taking` before this scenario in the same routing rule sequence. Code for the turn-taking helper is available at [Turn Taking Helper Code](/capabilities/speech-flow-control/turn-taking-helper-library#turn-taking-helper-code).

## How it works

* Deepgram transcribes caller audio with interim and final transcripts.
* `VoxTurnTaking` runs Silero VAD and Pipecat Smart Turn-style detection to decide when a user turn is ready.
* The scenario sends completed user turns to Groq through `OpenAI.createResponsesAPIClient({ baseUrl: "https://api.groq.com/openai/v1" })`.
* Response text deltas are streamed into Inworld TTS and played back into the call.

![Full-cascade flow illustration](https://files.buildwithfern.com/voximplant.docs.buildwithfern.com/c398812b3547012d08dd87986ae1a965c32aebabbf04620fd1bb50f9e8effb6d/docs/assets/features/turn-detection-hero-desktop.svg)

## Notes

* This example uses an **OpenAI-compatible API**, not OpenAI's own hosted Responses API.
The VoxEngine OpenAI module still works because the Groq endpoint follows the same request and event model closely enough for this flow.
* Groq's current Responses API support is still limited relative to OpenAI's full stored-context flow. In practice, you should not assume support for features such as `previous_response_id` or `storeContext`. This example keeps each turn independent to stay simple and predictable.
* If you need multi-turn memory with Groq, manage conversation history locally and resend the full structured input on each request.
* The included prompt is intentionally short to keep the example readable. Text-expecting models such as Llama usually behave better with a more explicit system prompt that tightly defines tone, grounding, brevity, ambiguity handling, and how to respond to partial caller fragments.
* The turn-taking behavior in this example depends on the [Turn Taking Helper Library](/capabilities/speech-flow-control/turn-taking-helper-library#turn-taking-helper-code). For details on turn-taking parameters, see the [Turn Taking Helper Library Guide](/capabilities/speech-flow-control/turn-taking-helper-library).
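If you do need multi-turn memory, local history management might be sketched like this. This is a minimal illustration, assuming `previous_response_id` is unavailable; the `createHistory` helper is hypothetical and not part of the VoxEngine OpenAI module:

```javascript
// Hypothetical client-side conversation memory for an OpenAI-compatible
// endpoint without stored context. The full history is rebuilt and resent
// as the `input` payload on every request.
function createHistory(maxTurns = 20) {
  const turns = [];
  return {
    // Append one turn; drop the oldest turns to bound request size
    add(role, text) {
      turns.push({role, content: text});
      while (turns.length > maxTurns) turns.shift();
    },
    // Full structured input to resend with each createResponses call
    toInput() {
      return turns.slice();
    },
  };
}
```

In the scenario below, you could call `history.add("user", input)` inside `onUserTurn`, `history.add("assistant", text)` on `ResponseTextDone`, and pass `input: history.toInput()` to `createResponses` instead of the raw transcript.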
## More info

* OpenAI module API: [https://voximplant.com/docs/references/voxengine/openai](https://voximplant.com/docs/references/voxengine/openai)
* Silero module API: [https://voximplant.com/docs/references/voxengine/silero](https://voximplant.com/docs/references/voxengine/silero)
* Pipecat module API: [https://voximplant.com/docs/references/voxengine/pipecat](https://voximplant.com/docs/references/voxengine/pipecat)
* Inworld module API: [https://voximplant.com/docs/references/voxengine/inworld](https://voximplant.com/docs/references/voxengine/inworld)
* Deepgram ASR profile guide: [https://voximplant.com/docs/guides/speech/asr](https://voximplant.com/docs/guides/speech/asr)

## Full VoxEngine scenario

```javascript title={"voxengine-full-cascade-dg-groq-iw.js"} maxLines={0}
/**
 * Full-cascade Voice AI demo: Deepgram STT + Groq Llama Responses API + Inworld TTS
 * Scenario: answer an incoming call using VoxTurnTaking for turn management.
 *
 * Include `vox-turn-taking` BEFORE this scenario in the routing rule sequence.
 *
 * Groq's Responses API is OpenAI-compatible, but it does not currently support
 * `previous_response_id`. To keep this example simple, each turn is submitted
 * independently instead of rebuilding prior conversation history locally.
 */
require(Modules.ASR);
require(Modules.OpenAI);
require(Modules.Inworld);
require(Modules.ApplicationStorage);

const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant.
Keep responses short, polite, and telephony-friendly (usually 1-2 sentences).
Reply in English.
`;

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
  let stt;
  let responsesClient;
  let ttsPlayer;
  let turnTaking;

  const terminate = () => {
    stt?.stop();
    responsesClient?.close();
    turnTaking?.close();
    VoxEngine.terminate();
  };

  call.addEventListener(CallEvents.Disconnected, terminate);
  call.addEventListener(CallEvents.Failed, terminate);

  try {
    call.answer();
    call.record({hd_audio: true, stereo: true}); // optional recording

    stt = VoxEngine.createASR({
      profile: ASRProfileList.Deepgram.en_US,
      interimResults: true,
      request: {
        language: "en-US",
        model: "nova-2-phonecall",
        keywords: ["Voximplant:4", "OpenAI:2"],
      },
    });

    responsesClient = await OpenAI.createResponsesAPIClient({
      apiKey: (await ApplicationStorage.get("GROQ_API_KEY")).value,
      baseUrl: "https://api.groq.com/openai/v1",
      storeContext: false,
      onWebSocketClose: (event) => {
        Logger.write("===Groq.WebSocket.Close===");
        if (event) Logger.write(JSON.stringify(event));
        terminate();
      },
    });

    ttsPlayer = Inworld.createRealtimeTTSPlayer({
      createContextParameters: {
        create: {
          voiceId: "Ashley",
          modelId: "inworld-tts-1.5-mini",
          speakingRate: 1.1,
          temperature: 1.3,
        },
      },
    });

    // Load the VoxTurnTaking module as part of the routing rule
    turnTaking = await VoxTurnTaking.create({
      call,
      stt,
      vadOptions: {
        threshold: 0.5,            // sensitivity for detecting speech vs silence
        minSilenceDurationMs: 350, // silence required before VAD marks speech end
        speechPadMs: 10,           // small padding around detected speech
      },
      turnDetectorOptions: {
        threshold: 0.5, // end-of-turn probability needed from Pipecat
      },
      policy: {
        transcriptSettleMs: 500,          // grace period for a final STT chunk after end-of-turn
        userSpeechTimeoutMs: 1000,        // default fallback submit timeout after speech ends
        shortUtteranceExtensionMs: 1800,  // longer hold for fragments that may continue
        fastShortUtteranceTimeoutMs: 700, // faster submit for short complete utterances like "hey"
        shortUtteranceMaxChars: 12,       // max chars still treated as a short fragment
        shortUtteranceMaxWords: 2,        // max words still treated as a short fragment
        lowConfidenceShortUtteranceThreshold: 0.75, // keep short low-confidence finals replaceable
      },
      enableLogging: true,
      onUserTurn: (input) => {
        // Send the transcript text on end-of-turn
        responsesClient.createResponses({
          model: "llama-3.3-70b-versatile",
          instructions: SYSTEM_PROMPT,
          input,
        });
      },
      onInterrupt: () => {
        ttsPlayer?.clearBuffer(); // stop any in-progress TTS audio
      },
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDelta, (event) => {
      const text = event?.data?.payload?.delta;
      if (!text || !turnTaking.canPlayAgentAudio()) return;
      ttsPlayer.send({send_text: {text}});
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDone, (event) => {
      const text = event?.data?.payload?.text;
      Logger.write(`===AGENT=== ${text}`);
      ttsPlayer.send({flush_context: {}}); // tell TTS to process all buffered text immediately
    });

    // Event logging to illustrate available OpenAI Responses API client events
    [
      OpenAI.ResponsesAPIEvents.ResponseCreated,
      OpenAI.ResponsesAPIEvents.ResponseFailed,
      OpenAI.ResponsesAPIEvents.ResponsesAPIError,
      OpenAI.ResponsesAPIEvents.ResponseInProgress,
      OpenAI.ResponsesAPIEvents.ResponseCompleted,
      OpenAI.ResponsesAPIEvents.ResponseOutputItemAdded,
      OpenAI.ResponsesAPIEvents.ResponseContentPartAdded,
      OpenAI.ResponsesAPIEvents.ConnectorInformation,
      OpenAI.ResponsesAPIEvents.Unknown,
      OpenAI.Events.WebSocketMediaStarted,
      OpenAI.Events.WebSocketMediaEnded,
    ].forEach((eventName) => {
      responsesClient.addEventListener(eventName, (event) => {
        Logger.write(`===${event?.name || eventName}===`);
        if (event?.data) Logger.write(JSON.stringify(event.data));
      });
    });

    // Attach the caller media
    call.sendMediaTo(stt);
    ttsPlayer.sendMediaTo(call);

    // Tell the LLM to talk first and greet the user
    responsesClient.createResponses({
      model: "llama-3.3-70b-versatile",
      instructions: SYSTEM_PROMPT,
      input: "Greet the caller briefly.",
    });
  } catch (error) {
    Logger.write("===UNHANDLED_ERROR===");
    Logger.write(error);
    terminate();
  }
});
```
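As a rough mental model of the short-utterance policy parameters in the scenario above, the classification decision might be sketched as a pure function. This is not the actual VoxTurnTaking implementation, only an illustration of the documented thresholds:

```javascript
// Illustrative model of the short-utterance policy thresholds. The function
// name and return values are hypothetical; real behavior lives in the
// Turn Taking Helper Library.
const policy = {
  shortUtteranceMaxChars: 12,
  shortUtteranceMaxWords: 2,
  lowConfidenceShortUtteranceThreshold: 0.75,
};

function classifyUtterance(text, confidence, p = policy) {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/).filter(Boolean);
  const isShort =
    trimmed.length <= p.shortUtteranceMaxChars &&
    words.length <= p.shortUtteranceMaxWords;
  if (!isShort) return "submit";        // normal-length final: submit after the timeout
  if (confidence < p.lowConfidenceShortUtteranceThreshold) {
    return "replaceable";               // low-confidence short final stays replaceable
  }
  return "fast-submit";                 // short, confident utterance like "hey"
}
```

Under this model, a short confident "hey" submits on the faster `fastShortUtteranceTimeoutMs` path, while a longer sentence falls back to the default timeout behavior.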