Grok

Voice agent API client for xAI Grok real-time voice scenarios.

Grok provides a VoxEngine client for connecting a call or media unit to the xAI Voice Agent API over WebSocket.

Use Grok.createVoiceAgentAPIClient(...) to create a VoiceAgentAPIClient for the current scenario. After the client is created, call methods such as sendMediaTo, responseCreate, and addEventListener on that client instance.

Contents

  • Usage: required module import and basic flow.
  • Factory functions: create the Grok client.
  • VoiceAgentAPIClientParameters: API key, tracing, privacy, and WebSocket options.
  • VoiceAgentAPIClient: runtime client object returned by the factory.
  • Methods: media, response, and connection control methods.
  • Events: WebSocket media start and end payloads.
  • VoiceAgentAPIEvents: Grok Voice Agent API event names and payload fields.

Usage

Add the module before using the namespace:

require(Modules.Grok);

Create the client, bridge media, and listen for both WebSocket media events and Voice Agent API events.
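
The flow described above might look like the following sketch. It is written as a plain function that receives the Grok namespace and a media unit, so it can be read outside a live scenario; in a real scenario you would call it from your own call handler after require(Modules.Grok). The helper name bridgeCallToGrok, the log parameter, and the use of call.sendMediaTo(client) for the caller-to-Grok direction are assumptions made for illustration.

```javascript
// Sketch only: bridgeCallToGrok is a hypothetical helper, not part of the Grok module.
// `Grok` is the namespace provided by require(Modules.Grok); `call` is any media unit.
async function bridgeCallToGrok(Grok, call, xAIApiKey, log = console.log) {
  // Create the client for the current scenario.
  const client = await Grok.createVoiceAgentAPIClient({
    xAIApiKey,
    onWebSocketClose: () => log("Grok WebSocket closed"),
  });

  // Bridge media in both directions.
  // Assumption: the media unit exposes sendMediaTo(), as VoxEngine calls do.
  call.sendMediaTo(client);
  client.sendMediaTo(call);

  // WebSocket media events (Grok.Events).
  client.addEventListener(Grok.Events.WebSocketMediaStarted, () => log("Grok audio started"));
  client.addEventListener(Grok.Events.WebSocketMediaEnded, () => log("Grok audio ended"));

  // Voice Agent API events (Grok.VoiceAgentAPIEvents).
  client.addEventListener(Grok.VoiceAgentAPIEvents.ResponseDone, (event) => {
    log("Response done: " + JSON.stringify(event.data));
  });

  return client;
}
```

Passing the namespace in as an argument is only a convenience for this sketch; scenario code can reference the global Grok namespace directly.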

Factory functions

createVoiceAgentAPIClient

Creates a new Grok.VoiceAgentAPIClient instance.

createVoiceAgentAPIClient(parameters: {
  statistics?: boolean;
  trace?: boolean;
  privacy?: boolean;
  onWebSocketClose?: (event: object) => void;
  xAIApiKey: string;
  model?: string;
}): Promise<Grok.VoiceAgentAPIClient>

The required parameters object is typed as Grok.VoiceAgentAPIClientParameters.

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| parameters | VoiceAgentAPIClientParameters | Yes | Grok.VoiceAgentAPIClient parameters. Passed as the argument to the Grok.createVoiceAgentAPIClient method. |
| statistics | boolean | No | Whether to enable the statistics functionality. |
| trace | boolean | No | Whether to enable the tracing functionality. If tracing is enabled, a URL to the trace file appears in the ‘websocket.created’ message. The file contains all sent and received WebSocket messages in plain text format and is uploaded to S3 storage. NOTE: enable this only for diagnostic purposes. You can provide the trace file to our support team to help investigate issues. |
| privacy | boolean | No | Whether to enable the privacy functionality. If privacy is enabled, logging for the WebSocket connection is disabled. NOTE: the default value is false. |
| onWebSocketClose | (event: object) => void | No | A callback function that is called when the WebSocket connection is closed. |
| xAIApiKey | string | Yes | The xAI API key for the Grok Voice Agent API. |
| model | string | No | The model to use for the Grok Voice Agent API. Model selection: https://docs.x.ai/developers/model-capabilities/audio/voice-agent#model-selection NOTE: the default value is grok-voice-fast-1.0. |

Returns

| Type | Description |
| --- | --- |
| Promise<Grok.VoiceAgentAPIClient> | Resolves to the Grok.VoiceAgentAPIClient instance. |
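
As a rough illustration of how the parameters object could be assembled, the sketch below sets only the optional fields the caller asks for, so the module defaults apply otherwise. The helper buildClientParameters and its options object (including the diagnostics flag) are hypothetical names introduced here, not part of the module.

```javascript
// Hypothetical helper: builds a VoiceAgentAPIClientParameters object.
function buildClientParameters(xAIApiKey, options = {}) {
  if (typeof xAIApiKey !== "string" || xAIApiKey.length === 0) {
    throw new TypeError("xAIApiKey is required");
  }
  const parameters = { xAIApiKey };
  // Set optional fields only when requested, so module defaults apply otherwise.
  if (options.model !== undefined) parameters.model = options.model;
  if (options.diagnostics === true) parameters.trace = true; // diagnostic use only
  if (options.privacy === true) parameters.privacy = true; // disables WebSocket logging
  if (typeof options.onWebSocketClose === "function") {
    parameters.onWebSocketClose = options.onWebSocketClose;
  }
  return parameters;
}

// Possible usage inside a scenario, after require(Modules.Grok):
// const client = await Grok.createVoiceAgentAPIClient(
//   buildClientParameters(apiKey, { diagnostics: true })
// );
```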

VoiceAgentAPIClient

Methods

addEventListener

Adds a handler for the specified Grok.VoiceAgentAPIEvents or Grok.Events event. Use only functions as handlers; passing anything other than a function causes an error and terminates the scenario when the handler is called.

addEventListener(event: Grok.Events | Grok.VoiceAgentAPIEvents | string, callback: (event: object) => any): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| event | Grok.Events \| Grok.VoiceAgentAPIEvents \| string | Yes | Event constant or event name to subscribe to. |
| callback | (event: object) => any | Yes | Function called when the event is emitted. |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |
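
Because a non-function handler only fails later, when the event fires, it can help to validate handlers at registration time. The wrapper below is a minimal sketch of that idea; addListenerChecked is a hypothetical name, and the only client method it relies on is addEventListener.

```javascript
// Hypothetical wrapper: rejects non-function handlers before they reach the client.
function addListenerChecked(client, event, handler) {
  if (typeof handler !== "function") {
    throw new TypeError("Handler for " + String(event) + " must be a function");
  }
  client.addEventListener(event, handler);
}
```

Failing at registration keeps the error close to the mistake instead of surfacing mid-call.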

clearMediaBuffer

Clears the Grok WebSocket media buffer.

clearMediaBuffer(parameters?: ClearMediaBufferParameters): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| parameters | ClearMediaBufferParameters | No | |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |
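
One plausible use of clearMediaBuffer is barge-in: dropping agent audio that is still buffered when the caller starts speaking. The sketch below assumes server-side VAD is active, so that InputAudioBufferSpeechStarted is emitted; enableBargeIn is a hypothetical helper, and whether clearing the buffer at this point suits your scenario is a design decision rather than documented behavior.

```javascript
// Hypothetical helper: clears buffered agent audio when the caller starts speaking.
// `VoiceAgentAPIEvents` is Grok.VoiceAgentAPIEvents.
function enableBargeIn(client, VoiceAgentAPIEvents) {
  const onSpeechStarted = () => {
    // Drop any agent audio still queued in the WebSocket media buffer.
    client.clearMediaBuffer();
  };
  client.addEventListener(VoiceAgentAPIEvents.InputAudioBufferSpeechStarted, onSpeechStarted);
  // Return a function that removes the subscription again.
  return () =>
    client.removeEventListener(VoiceAgentAPIEvents.InputAudioBufferSpeechStarted, onSpeechStarted);
}
```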

close

Closes the Grok connection (over WebSocket) or connection attempt.

close(): void

Parameters

This method does not accept parameters.

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

conversationItemCreate

Creates a new conversation item, such as a user message. https://docs.x.ai/docs/guides/voice/agent#client

conversationItemCreate(parameters: Object): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| parameters | Object | Yes | xAI conversation.item.create client message. Common fields include type, event_id, and item; item can be a user message, assistant message, function call, or function call output. See the partner API reference. |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

Example parameters:

{
  "type": "conversation.item.create",
  "event_id": "event_345",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello"
      }
    ]
  }
}
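
The example message above could be sent from scenario code roughly as follows. This sketch assumes that a text item needs an explicit response.create afterwards to trigger a reply; sendUserText is a hypothetical helper name.

```javascript
// Hypothetical helper: adds a text user message and asks the agent to respond.
function sendUserText(client, text) {
  client.conversationItemCreate({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  });
  // Assumption: a reply to the new item has to be requested explicitly.
  client.responseCreate({ type: "response.create" });
}
```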

id

Returns the VoiceAgentAPIClient id.

id(): string

Parameters

This method does not accept parameters.

Returns

| Type | Description |
| --- | --- |
| string | The VoiceAgentAPIClient id. |

inputAudioBufferClear

Clears the input audio buffer. https://docs.x.ai/docs/guides/voice/agent#client-1

inputAudioBufferClear(parameters: Object): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| parameters | Object | Yes | xAI input_audio_buffer.clear client message. Common fields include type and optional event_id. See the partner API reference. |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

Example parameters:

{
  "type": "input_audio_buffer.clear"
}

removeEventListener

Removes a handler for the specified Grok.VoiceAgentAPIEvents or Grok.Events event.

removeEventListener(event: Grok.Events | Grok.VoiceAgentAPIEvents | string, callback?: (event: object) => any): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| event | Grok.Events \| Grok.VoiceAgentAPIEvents \| string | Yes | Event constant or event name to unsubscribe from. |
| callback | (event: object) => any | No | The handler function to remove. |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |
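
A common pattern built from addEventListener and removeEventListener is a one-shot handler that unsubscribes itself the first time it runs. The sketch below shows one way to do that; listenOnce is a hypothetical helper, and it always passes the same function reference to both methods so the removal matches the registration.

```javascript
// Hypothetical helper: runs `handler` for the first occurrence of `event` only.
function listenOnce(client, event, handler) {
  const wrapper = (payload) => {
    // Unsubscribe first so a handler error cannot leave the listener attached.
    client.removeEventListener(event, wrapper);
    return handler(payload);
  };
  client.addEventListener(event, wrapper);
  return wrapper;
}
```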

responseCreate

Requests the server to create a new assistant response when using client-side VAD. (This is handled automatically when using server-side VAD.) https://docs.x.ai/docs/guides/voice/agent#client-2

responseCreate(parameters: Object): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| parameters | Object | Yes | xAI response.create client message. Common fields include type, optional event_id, and optional response configuration. See the partner API reference. |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

Example parameters:

{
  "type": "response.create"
}

sendMediaTo

Starts sending media from Grok (via WebSocket) to the media unit. Grok works in real time.

sendMediaTo(mediaUnit: VoxMediaUnit, parameters?: SendMediaParameters): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| mediaUnit | VoxMediaUnit | Yes | |
| parameters | SendMediaParameters | No | |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

sessionUpdate

Sends a session.update message to update the session’s configuration. https://docs.x.ai/docs/guides/voice/agent#client-events-1

sessionUpdate(parameters: Object): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| parameters | Object | Yes | xAI session.update client message. Common fields include type, optional event_id, and session, which can configure prompt, voice, audio formats, turn detection, and tools. See the partner API reference. |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

Example parameters:

{
  "type": "session.update",
  "session": {
    "voice": "aria",
    "instructions": "You are a helpful assistant",
    "turn_detection": {
      "type": "server_vad"
    }
  }
}
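
A small builder can keep session.update messages consistent across a scenario. The sketch below covers only the fields used in the example above (voice, instructions, turn_detection); buildSessionUpdate and its option names are hypothetical, and any further session fields should be taken from the partner API reference.

```javascript
// Hypothetical helper: builds a session.update message from a few common options.
function buildSessionUpdate({ instructions, voice, turnDetection = "server_vad" } = {}) {
  const session = { turn_detection: { type: turnDetection } };
  // Leave out fields the caller did not set so provider defaults apply.
  if (instructions) session.instructions = instructions;
  if (voice) session.voice = voice;
  return { type: "session.update", session };
}

// Possible usage with an existing client:
// client.sessionUpdate(buildSessionUpdate({ instructions: "You are a helpful assistant" }));
```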

stopMediaTo

Stops sending media from Grok (via WebSocket) to the media unit.

stopMediaTo(mediaUnit: VoxMediaUnit): void

Parameters

| Parameter | Type | Req. | Description |
| --- | --- | --- | --- |
| mediaUnit | VoxMediaUnit | Yes | |

Returns

| Type | Description |
| --- | --- |
| void | Does not return a value. |

webSocketId

Returns the Grok WebSocket id.

webSocketId(): string

Parameters

This method does not accept parameters.

Returns

| Type | Description |
| --- | --- |
| string | The Grok WebSocket id. |

Events

These events describe audio received through the Grok WebSocket media bridge.

WebSocketMediaStarted

Triggered when an audio stream sent by a third party through a Grok WebSocket starts playing.

Event constant: Events.WebSocketMediaStarted

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| client | VoiceAgentAPIClient | | The Grok.VoiceAgentAPIClient instance. |
| tag | string | | Special tag that names audio streams sent over one WebSocket connection. With it, you can send two audio streams to two different media units at the same time. |
| encoding | string | | Audio encoding format. |
| customParameters | { [key: string]: string } | | Custom parameters. |

WebSocketMediaEnded

Triggered after the end of an audio stream sent by a third party through a Grok WebSocket (1 second of silence).

Event constant: Events.WebSocketMediaEnded

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| client | VoiceAgentAPIClient | | The Grok.VoiceAgentAPIClient instance. |
| tag | string | | Special tag that names audio streams sent over one WebSocket connection. With it, you can send two audio streams to two different media units at the same time. |
| mediaInfo | WebSocketMediaInfo | | Information about the audio stream that can be obtained after the stream stops or pauses (1 second of silence). |
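
To show how the two payloads fit together, the sketch below tracks which tagged streams are currently playing. It reads only the tag and encoding fields listed above; trackMediaStreams and its log parameter are hypothetical.

```javascript
// Hypothetical helper: tracks currently playing audio streams by tag.
// `Events` is Grok.Events.
function trackMediaStreams(client, Events, log = console.log) {
  const active = new Set();
  client.addEventListener(Events.WebSocketMediaStarted, (event) => {
    active.add(event.tag);
    log("media started: tag=" + event.tag + ", encoding=" + event.encoding);
  });
  client.addEventListener(Events.WebSocketMediaEnded, (event) => {
    active.delete(event.tag);
    log("media ended: tag=" + event.tag);
  });
  return active;
}
```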

VoiceAgentAPIEvents

These events mirror server messages from the Grok Voice Agent API. The data field contains the provider event payload.

All VoiceAgentAPIEvents callbacks receive these common fields:

| Field | Type | Description |
| --- | --- | --- |
| client | Grok.VoiceAgentAPIClient | The Grok.VoiceAgentAPIClient instance. |
| data | Object | Pass-through xAI server event payload. |

Per-event payload tables below show only event-specific fields (and any provider payload enrichment for data).

Unknown

The unknown event.

Event constant: VoiceAgentAPIEvents.Unknown

Payload

No event-specific payload columns are listed here; this callback still receives the common client and data fields. For data, see the partner documentation for the exact JSON shape.

ConversationCreated

The first message on connection. Notifies the client that a conversation session has been created. https://docs.x.ai/docs/guides/voice/agent#server-events-2

Event constant: VoiceAgentAPIEvents.ConversationCreated

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | The first message on connection. Notifies the client that a conversation session has been created. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always conversation.created. |
| data.conversation | object | | The conversation object. |

Example data:

{
  "event_id": "event_9101",
  "type": "conversation.created",
  "conversation": {
    "id": "conv_001",
    "object": "realtime.conversation"
  }
}

SessionUpdated

Acknowledges the client’s session.update message and confirms that the session has been updated. https://docs.x.ai/docs/guides/voice/agent#server-events-1

Event constant: VoiceAgentAPIEvents.SessionUpdated

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Acknowledges the client’s session.update message and confirms that the session has been configured. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always session.updated. |
| data.session | object | | The updated session configuration. |

Example data:

{
  "event_id": "event_123",
  "type": "session.updated",
  "session": {
    "model": "grok-voice-fast-1.0",
    "instructions": "You are a helpful assistant.",
    "voice": "Eve",
    "turn_detection": {
      "type": "server_vad"
    }
  }
}

ConversationItemAdded

Notifies the client that a new user message or assistant response has been added to the conversation history. https://docs.x.ai/docs/guides/voice/agent#server

Event constant: VoiceAgentAPIEvents.ConversationItemAdded

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | A new user or assistant message has been added to the conversation history. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always conversation.item.added. |
| data.previous_item_id | string | | ID of the preceding item in conversation history. |
| data.item | object | | The conversation item that was added. |

Example data:

{
  "event_id": "event_1920",
  "type": "conversation.item.added",
  "previous_item_id": "msg_002",
  "item": {
    "id": "msg_003",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": "hello how are you"
      }
    ]
  }
}

ConversationItemInputAudioTranscriptionCompleted

Notifies the client that audio transcription of the user’s input has been completed. https://docs.x.ai/docs/guides/voice/agent#server

Event constant: VoiceAgentAPIEvents.ConversationItemInputAudioTranscriptionCompleted

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Audio transcription for the user’s input has been completed. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always conversation.item.input_audio_transcription.completed. |
| data.item_id | string | | ID of the conversation item whose audio was transcribed. |
| data.transcript | string | | The transcribed text. |

Example data:

{
  "event_id": "event_2122",
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "msg_003",
  "transcript": "Hello, how are you?"
}

InputAudioBufferCommitted

Input audio buffer has been committed. https://docs.x.ai/docs/guides/voice/agent#server-1

Event constant: VoiceAgentAPIEvents.InputAudioBufferCommitted

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Input audio buffer has been committed as a user message. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always input_audio_buffer.committed. |
| data.previous_item_id | string | | ID of the preceding conversation item. |
| data.item_id | string | | ID of the newly created user message item. |

Example data:

{
  "event_id": "event_1121",
  "type": "input_audio_buffer.committed",
  "previous_item_id": "msg_001",
  "item_id": "msg_002"
}

InputAudioBufferCleared

Input audio buffer has been cleared. https://docs.x.ai/docs/guides/voice/agent#server-1

Event constant: VoiceAgentAPIEvents.InputAudioBufferCleared

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Confirms the input audio buffer has been cleared. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always input_audio_buffer.cleared. |

Example data:

{
  "event_id": "event_1122",
  "type": "input_audio_buffer.cleared"
}

InputAudioBufferSpeechStarted

Notifies the client that the server’s VAD has detected the start of speech. https://docs.x.ai/docs/guides/voice/agent#server-1

Event constant: VoiceAgentAPIEvents.InputAudioBufferSpeechStarted

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Notifies that the server’s VAD detected the start of speech. Only available with server_vad turn detection. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always input_audio_buffer.speech_started. |
| data.item_id | string | | ID of the associated message item. |
| data.audio_start_ms | integer | | Millisecond offset in the audio buffer where speech was detected. |

Example data:

{
  "event_id": "event_1516",
  "type": "input_audio_buffer.speech_started",
  "item_id": "msg_003"
}

InputAudioBufferSpeechStopped

Notifies the client that the server’s VAD has detected the end of speech. https://docs.x.ai/docs/guides/voice/agent#server-1

Event constant: VoiceAgentAPIEvents.InputAudioBufferSpeechStopped

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Notifies that the server’s VAD detected the end of speech. Only available with server_vad turn detection. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always input_audio_buffer.speech_stopped. |
| data.item_id | string | | ID of the associated message item. |
| data.audio_end_ms | integer | | Millisecond offset in the audio buffer where speech ended. |

Example data:

{
  "event_id": "event_1516",
  "type": "input_audio_buffer.speech_stopped",
  "item_id": "msg_003"
}

ResponseCreated

A new assistant response turn is in progress. Audio deltas created from this turn share the same response ID. https://docs.x.ai/docs/guides/voice/agent#server-2

Event constant: VoiceAgentAPIEvents.ResponseCreated

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | A new assistant response turn is in progress. Audio deltas from this turn share the same response_id. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.created. |
| data.response | object | | The response object. |

Example data:

{
  "event_id": "event_2930",
  "type": "response.created",
  "response": {
    "id": "resp_001",
    "object": "realtime.response",
    "status": "in_progress",
    "output": []
  }
}

ResponseDone

The assistant’s response is completed. https://docs.x.ai/docs/guides/voice/agent#server-2

Event constant: VoiceAgentAPIEvents.ResponseDone

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | The assistant’s response is completed. Sent after all audio and transcript deltas. Ready for the client to add a new conversation item. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.done. |
| data.response | object | | The completed response object. |

Example data:

{
  "event_id": "event_3132",
  "type": "response.done",
  "response": {
    "id": "resp_001",
    "object": "realtime.response",
    "status": "completed"
  }
}

ResponseOutputItemAdded

A new assistant response item is added to the message history. https://docs.x.ai/docs/guides/voice/agent#server-2

Event constant: VoiceAgentAPIEvents.ResponseOutputItemAdded

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | A new assistant response item is added to the message history. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.output_item.added. |
| data.response_id | string | | ID of the response this item belongs to. |
| data.output_index | integer | | Index of the output item in the response. |
| data.item | object | | The output item that was added. |

Example data:

{
  "event_id": "event_3334",
  "type": "response.output_item.added",
  "response_id": "resp_001",
  "output_index": 0,
  "item": {
    "id": "msg_007",
    "object": "realtime.item",
    "type": "message",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

ResponseOutputItemDone

An assistant response output item is complete.

Event constant: VoiceAgentAPIEvents.ResponseOutputItemDone

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | An output item is complete. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.output_item.done. |
| data.response_id | string | | ID of the response this item belongs to. |
| data.output_index | integer | | Index of the output item in the response. |
| data.item | object | | The completed output item. |

Example data:

{
  "event_id": "event_3335",
  "type": "response.output_item.done",
  "response_id": "resp_001",
  "output_index": 0,
  "item": {
    "id": "msg_007",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": []
  }
}

ResponseOutputAudioTranscriptDelta

Audio transcript delta of the assistant response. https://docs.x.ai/docs/guides/voice/agent#server-3

Event constant: VoiceAgentAPIEvents.ResponseOutputAudioTranscriptDelta

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Streaming text transcript delta of the assistant’s audio response. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.output_audio_transcript.delta. |
| data.response_id | string | | ID of the response. |
| data.item_id | string | | ID of the output item. |
| data.output_index | integer | | Index of the output item in the response. |
| data.content_index | integer | | Index of the content part within the item. |
| data.delta | string | | Text transcript fragment. |

Example data:

{
  "event_id": "event_4950",
  "type": "response.output_audio_transcript.delta",
  "response_id": "resp_001",
  "item_id": "msg_008",
  "delta": "Hello! I'm doing"
}

ResponseOutputAudioTranscriptDone

The audio transcript of the assistant response has finished generating. https://docs.x.ai/docs/guides/voice/agent#server-3

Event constant: VoiceAgentAPIEvents.ResponseOutputAudioTranscriptDone

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | The audio transcript for this assistant turn has finished generating. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.output_audio_transcript.done. |
| data.response_id | string | | ID of the response. |
| data.item_id | string | | ID of the output item. |
| data.output_index | integer | | Index of the output item in the response. |
| data.content_index | integer | | Index of the content part within the item. |
| data.transcript | string | | The complete transcript text. |

Example data:

{
  "event_id": "event_5152",
  "type": "response.output_audio_transcript.done",
  "response_id": "resp_001",
  "item_id": "msg_008"
}
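
The delta and done events can be combined to assemble the assistant transcript per output item. The sketch below concatenates delta fragments keyed by item_id and, because the example payload above omits transcript, falls back to the accumulated text when the done event carries no transcript field. collectTranscripts and onComplete are hypothetical names.

```javascript
// Hypothetical helper: assembles assistant transcripts from delta/done events.
// `VoiceAgentAPIEvents` is Grok.VoiceAgentAPIEvents.
function collectTranscripts(client, VoiceAgentAPIEvents, onComplete) {
  const partial = new Map(); // item_id -> text received so far

  client.addEventListener(VoiceAgentAPIEvents.ResponseOutputAudioTranscriptDelta, (event) => {
    const { item_id, delta } = event.data;
    partial.set(item_id, (partial.get(item_id) || "") + delta);
  });

  client.addEventListener(VoiceAgentAPIEvents.ResponseOutputAudioTranscriptDone, (event) => {
    const { item_id, transcript } = event.data;
    // Prefer the final transcript when present; otherwise use the accumulated deltas.
    const text = typeof transcript === "string" ? transcript : partial.get(item_id) || "";
    partial.delete(item_id);
    onComplete(item_id, text);
  });
}
```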

ResponseOutputAudioDone

Notifies the client that the audio for this turn has finished generating. https://docs.x.ai/docs/guides/voice/agent#server-3

Event constant: VoiceAgentAPIEvents.ResponseOutputAudioDone

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | Audio generation for this assistant turn has finished. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.output_audio.done. |
| data.response_id | string | | ID of the response. |
| data.item_id | string | | ID of the output item. |
| data.output_index | integer | | Index of the output item in the response. |
| data.content_index | integer | | Index of the content part within the item. |

Example data:

{
  "event_id": "event_5152",
  "type": "response.output_audio.done",
  "response_id": "resp_001",
  "item_id": "msg_008"
}

ResponseContentPartAdded

Notifies the client that a content part has been added.

Event constant: VoiceAgentAPIEvents.ResponseContentPartAdded

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | A content part starts within an output item. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.content_part.added. |
| data.response_id | string | | ID of the response. |
| data.item_id | string | | ID of the output item. |
| data.output_index | integer | | Index of the output item in the response. |
| data.content_index | integer | | Index of the content part within the item. |
| data.part | object | | The content part. |

Example data:

{
  "event_id": "event_3336",
  "type": "response.content_part.added",
  "response_id": "resp_001",
  "item_id": "msg_007",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "audio"
  }
}

ResponseContentPartDone

Notifies the client that a content part is done.

Event constant: VoiceAgentAPIEvents.ResponseContentPartDone

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | A content part finishes. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.content_part.done. |
| data.response_id | string | | ID of the response. |
| data.item_id | string | | ID of the output item. |
| data.output_index | integer | | Index of the output item in the response. |
| data.content_index | integer | | Index of the content part within the item. |
| data.part | object | | The completed content part. |

Example data:

{
  "event_id": "event_3337",
  "type": "response.content_part.done",
  "response_id": "resp_001",
  "item_id": "msg_007",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "audio"
  }
}

ResponseFunctionCallArgumentsDone

Function call triggered with complete arguments. https://docs.x.ai/docs/guides/voice/agent#handling-function-call-responses

Event constant: VoiceAgentAPIEvents.ResponseFunctionCallArgumentsDone

Payload

| Field | Type | Req. | Description |
| --- | --- | --- | --- |
| data | Object | | A function call has been triggered with complete arguments. Your code should execute the function and return results via conversation.item.create with type function_call_output. See the partner event documentation. |
| data.event_id | string | | Unique event identifier. |
| data.type | string | | Always response.function_call_arguments.done. |
| data.response_id | string | | ID of the response. |
| data.item_id | string | | ID of the function call item. |
| data.output_index | integer | | Index of the output item in the response. |
| data.call_id | string | | Unique ID for this function call. Pass this as call_id in the conversation.item.create event with type function_call_output. |
| data.name | string | | Name of the function to call. |
| data.arguments | string | | JSON string of the function arguments. |

Example data:

{
  "event_id": "event_fc01",
  "type": "response.function_call_arguments.done",
  "response_id": "resp_001",
  "item_id": "msg_009",
  "output_index": 0,
  "call_id": "call_001",
  "name": "get_weather",
  "arguments": "{\"location\": \"San Francisco\"}"
}
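
A function call round trip, as described in the payload table, could be wired up along these lines. The sketch parses data.arguments, runs a matching synchronous function from a caller-supplied map, and returns the result through conversationItemCreate with the same call_id. The helper handleFunctionCalls, the functions map, the output field carrying a JSON string, and the follow-up responseCreate are assumptions; check the partner documentation for the exact function_call_output shape.

```javascript
// Hypothetical helper: answers function calls requested by the agent.
// `functions` maps function names to synchronous implementations.
function handleFunctionCalls(client, VoiceAgentAPIEvents, functions) {
  client.addEventListener(VoiceAgentAPIEvents.ResponseFunctionCallArgumentsDone, (event) => {
    const { call_id, name, arguments: rawArgs } = event.data;
    let output;
    try {
      const fn = functions[name];
      if (typeof fn !== "function") throw new Error("Unknown function: " + name);
      output = fn(JSON.parse(rawArgs)); // arguments arrive as a JSON string
    } catch (error) {
      // Report failures to the agent instead of throwing inside the handler.
      output = { error: String(error.message || error) };
    }
    client.conversationItemCreate({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id,
        output: JSON.stringify(output), // assumption: output is sent as a JSON string
      },
    });
    // Assumption: a new response must be requested so the agent can use the result.
    client.responseCreate({ type: "response.create" });
  });
}
```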

WebSocketError

The WebSocket error response event.

Event constant: VoiceAgentAPIEvents.WebSocketError

Payload

No event-specific payload columns are listed here; this callback still receives the common client and data fields. For data, see the partner documentation for the exact JSON shape.

ConnectorInformation

Contains information about the connector.

Event constant: VoiceAgentAPIEvents.ConnectorInformation

Payload

No event-specific payload columns are listed here; this callback still receives the common client and data fields. For data, see the partner documentation for the exact JSON shape.