Example: Full-cascade incl. Groq
Overview
This full-cascade example demonstrates:
- A full cascade Voice AI pipeline with independent Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) components
- Third-party LLM integration via OpenAI Compatibility mode in the VoxEngine OpenAI module
- Turn taking with barge-in and end-of-turn detection to keep interactions natural and responsive
This specific example uses Deepgram for STT with custom vocabulary,
Groq’s OpenAI-compatible Responses API using llama-3.3-70b-versatile for the LLM,
and Inworld for low-latency, streaming TTS.
These can be swapped for any supported VoxEngine modules or external APIs as needed.
⬇️ Jump to the Full VoxEngine scenario.
Prerequisites
- Store your Groq API key in Voximplant ApplicationStorage under GROQ_API_KEY.
- Include vox-turn-taking before this scenario in the same routing rule sequence. Code for the turn-taking helper is available at Turn Taking Helper Code.
How it works
- Deepgram transcribes caller audio with interim and final transcripts.
- VoxTurnTaking runs Silero VAD and Pipecat Smart Turn-style detection to decide when a user turn is ready.
- The scenario sends completed user turns to Groq through OpenAI.createResponsesAPIClient({ baseUrl: "https://api.groq.com/openai/v1" }).
- Response text deltas are streamed into Inworld TTS and played back into the call.
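The last step above, streaming response text deltas into TTS, typically benefits from sentence-level buffering so playback can start before the full response arrives. A minimal, framework-free sketch in plain JavaScript (the class and callback names are hypothetical, not part of the VoxEngine API):

```javascript
// Buffer streamed response text deltas into sentence-sized chunks
// before handing them to TTS. SentenceBuffer and onSentence are
// illustrative names, not part of the VoxEngine API.
class SentenceBuffer {
  constructor(onSentence) {
    this.onSentence = onSentence; // called once per complete sentence
    this.pending = '';
  }
  push(delta) {
    this.pending += delta;
    let match;
    // Flush each time the buffer contains sentence-ending punctuation.
    // (A naive split: abbreviations like "Dr." would also trigger it.)
    while ((match = this.pending.match(/^(.*?[.!?])\s*/))) {
      this.onSentence(match[1]);
      this.pending = this.pending.slice(match[0].length);
    }
  }
  flush() {
    // Emit any trailing text once the stream ends.
    if (this.pending.trim()) this.onSentence(this.pending.trim());
    this.pending = '';
  }
}

const spoken = [];
const buf = new SentenceBuffer((s) => spoken.push(s));
['Hel', 'lo there. How can', ' I help?'].forEach((d) => buf.push(d));
buf.flush();
// spoken is now ['Hello there.', 'How can I help?']
```

In the scenario itself, the onSentence callback would hand each completed chunk to the Inworld TTS player instead of collecting it into an array.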
Notes
- This example uses an OpenAI-compatible API, not OpenAI’s own hosted Responses API. The VoxEngine OpenAI module still works because the Groq endpoint follows the same request and event model closely enough for this flow.
- Groq’s current Responses API support is still limited relative to OpenAI’s full stored-context flow. In practice, you should not assume support for features such as previous_response_id or storeContext. This example keeps each turn independent to stay simple and predictable.
- If you need multi-turn memory with Groq, manage conversation history locally and resend the full structured input on each request.
- The included prompt is intentionally short for example readability. Open-weight models such as Llama usually behave better with a more explicit system prompt that tightly defines tone, grounding, brevity, ambiguity handling, and how to respond to partial caller fragments.
- The turn-taking behavior in this example depends on the Turn Taking Helper Library. For details on turn taking parameters, see Turn Taking Helper Library Guide.
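The local-history approach mentioned in the notes above can be sketched in plain JavaScript. The class name and turn cap below are illustrative; only the {role, content} message shape comes from the OpenAI-compatible request format:

```javascript
// Minimal local multi-turn memory for an OpenAI-compatible endpoint
// that does not support previous_response_id: keep the history in the
// scenario and resend the full structured input with every request.
class ConversationHistory {
  constructor(systemPrompt, maxTurns = 20) {
    this.systemPrompt = systemPrompt;
    this.maxTurns = maxTurns; // user+assistant messages to retain
    this.messages = [];
  }
  addUserTurn(text) {
    this.messages.push({ role: 'user', content: text });
    this.trim();
  }
  addAssistantTurn(text) {
    this.messages.push({ role: 'assistant', content: text });
    this.trim();
  }
  trim() {
    // Drop the oldest turns once the cap is exceeded; the system
    // prompt is stored separately and always survives.
    if (this.messages.length > this.maxTurns) {
      this.messages = this.messages.slice(this.messages.length - this.maxTurns);
    }
  }
  // Build the full input array to send with the next request.
  buildInput() {
    return [{ role: 'system', content: this.systemPrompt }, ...this.messages];
  }
}
```

Each completed user turn would be added with addUserTurn, the model's reply with addAssistantTurn, and buildInput() passed as the input of the next Responses API call.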
More info
- OpenAI module API: https://voximplant.com/docs/references/voxengine/openai
- Silero module API: https://voximplant.com/docs/references/voxengine/silero
- Pipecat module API: https://voximplant.com/docs/references/voxengine/pipecat
- Inworld module API: https://voximplant.com/docs/references/voxengine/inworld
- Deepgram ASR profile guide: https://voximplant.com/docs/guides/speech/asr
Full VoxEngine scenario
voxeengine-full-cascade-dg-groq-iw.js