# Media over WebSockets

WebSocket is a full-duplex protocol for real-time communication between a client and a third-party service.
VoxEngine can create outgoing WebSocket connections, accept incoming WebSocket connections,
and exchange text or audio media over either connection type.

Audio over WebSockets is controlled by JSON messages with an `event` field.
A media stream starts with `start`, carries audio chunks in `media`, and ends with `stop`.
See [Media stream format](#media-stream-format) for details on the message schema.

## WebSocket Connectivity Overview

Voximplant supports both outgoing and incoming WebSocket connections and can send both text and media data.
The interface includes specialized handling for audio media streams for interoperability with calls, conferences, recorders, and other VoxEngine media units.

### Outgoing WebSocket connections

Create a [WebSocket](/api-reference/voxengine/websocket) object with [`VoxEngine.createWebSocket(...)`](/api-reference/voxengine/vox-engine#createwebsocket).
Voximplant Cloud performs the WebSocket handshake with the third-party service, then the scenario receives [`WebSocketEvents.OPEN`](/api-reference/voxengine/websocket-events).

After the connection is open:

* Send strings and stringified data with [`webSocket.send(...)`](/api-reference/voxengine/websocket#send).
* Send call audio to the WebSocket with [`call.sendMediaTo(webSocket, parameters)`](/api-reference/voxengine/call#sendmediato).
  Use [`SendMediaParameters`](/api-reference/voxengine/call#sendmediaparameters) to set `encoding`, `tag`, and `customParameters`.
  If `encoding` is omitted, [`WebSocketAudioEncoding`](/api-reference/voxengine/websocket-audio-encoding) defaults to `PCM16_8KHZ`.
* Stop media with `stopMediaTo(...)` or close the connection with [`webSocket.close()`](/api-reference/voxengine/websocket#close).

```mermaid
sequenceDiagram
  participant Initiator as Session initiator
  participant Scenario as VoxEngine scenario
  participant Cloud as Voximplant Cloud
  participant Service as Third-party service

  Initiator->>Scenario: Start scenario
  Scenario->>Cloud: VoxEngine.createWebSocket(url)
  Cloud->>Service: WebSocket handshake
  Service-->>Cloud: Handshake accepted
  Cloud-->>Scenario: WebSocketEvents.OPEN

  rect rgba(125, 90, 255, 0.14)
    Note over Scenario,Service: Bidirectional exchange
    Scenario->>Cloud: call.sendMediaTo(webSocket)
    Cloud->>Service: start / media / stop events
    Scenario->>Cloud: webSocket.send(data)
    Cloud->>Service: WebSocket data
    Service-->>Cloud: WebSocket data or media
    Cloud-->>Scenario: WebSocketEvents.MESSAGE
  end

  Scenario->>Cloud: webSocket.close()
  Cloud->>Service: Close WebSocket
  Cloud-->>Scenario: WebSocketEvents.CLOSE
```

The minimal setup for an outgoing connection looks like this:

```js
VoxEngine.addEventListener(AppEvents.CallAlerting, (event) => {
  const webSocket = VoxEngine.createWebSocket("wss://your-url.example.com");
  // Handle WebSocketEvents.OPEN, MESSAGE, ERROR, and CLOSE.
});
```

Existing WebSocket connections are not automatically destroyed when a call ends.
Make sure to close outgoing and incoming WebSocket connections with [`webSocket.close()`](/api-reference/voxengine/websocket#close) when the scenario no longer needs them.
A WebSocket can also be closed by Voximplant Cloud or by the third-party service, so handle [`WebSocketEvents.CLOSE`](/api-reference/voxengine/websocket-events#close) in every flow.
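
One way to avoid leaving a connection open is to tie its lifetime to the call. The sketch below is a minimal illustration: `closeWhenCallEnds` is a hypothetical helper, not part of VoxEngine, and it takes the event name as a parameter so it can be read outside a scenario.

```js
// Hypothetical helper: closes the WebSocket as soon as the call disconnects.
// In a scenario, pass CallEvents.Disconnected as `disconnectedEvent`.
function closeWhenCallEnds(call, webSocket, disconnectedEvent) {
  call.addEventListener(disconnectedEvent, () => webSocket.close());
}
```

In a scenario this would be called as `closeWhenCallEnds(call, webSocket, CallEvents.Disconnected)` right after the WebSocket is created.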

### Incoming WebSocket connections

Enable incoming WebSocket access with [`VoxEngine.allowWebSocketConnections()`](/api-reference/voxengine/vox-engine#allowwebsocketconnections), then subscribe to [`AppEvents.WebSocket`](/api-reference/voxengine/app-events#websocket). The third-party service connects to the session WebSocket URL, and VoxEngine gives the scenario a `websocket` object for that connection.

Get the session URL from the `StartScenarios` Management API response or from [`AppEvents.Started`](/api-reference/voxengine/app-events#started). Replace `https` with `wss` before giving the URL to the external service.

```mermaid
sequenceDiagram
  participant Service as Third-party service
  participant Cloud as Voximplant Cloud
  participant Scenario as VoxEngine scenario

  Service->>Cloud: WebSocket handshake to session URL
  Cloud-->>Service: Handshake accepted
  Cloud->>Scenario: AppEvents.WebSocket

  rect rgba(125, 90, 255, 0.14)
    Note over Service,Scenario: Bidirectional exchange
    Scenario->>Cloud: call.sendMediaTo(webSocket)
    Cloud->>Service: start / media / stop events
    Scenario->>Cloud: webSocket.send(data)
    Cloud->>Service: WebSocket data
    Service-->>Cloud: WebSocket data or media
    Cloud-->>Scenario: WebSocketEvents.MESSAGE
  end

  Scenario->>Cloud: webSocket.close()
  Cloud->>Service: Close WebSocket
  Cloud-->>Scenario: WebSocketEvents.CLOSE
```

The minimal setup for accepting an incoming WebSocket connection looks like this:

```js
VoxEngine.allowWebSocketConnections();

VoxEngine.addEventListener(AppEvents.WebSocket, (event) => {
  const webSocket = event.websocket;
  // Handle the incoming WebSocket connection here.
});
```

A session accepts at most as many incoming WebSocket connections as the number of calls in the session plus 3.
If a connection arrives beyond this limit, VoxEngine triggers [`AppEvents.NewWebSocketFailed`](/api-reference/voxengine/app-events#newwebsocketfailed).

### Sending text data

Use [`WebSocket.send(...)`](/api-reference/voxengine/websocket#send) to enqueue string data for transmission over the WebSocket connection.
For structured messages, serialize your payload before calling `send`.
Audio media itself is exchanged as JSON `start`, `media`, and `stop` events; see [Media stream format](#media-stream-format).

```js
VoxEngine.addEventListener(AppEvents.CallAlerting, ({ call }) => {
  call.answer();
  call.addEventListener(CallEvents.Disconnected, VoxEngine.terminate);

  const webSocket = VoxEngine.createWebSocket("wss://your-url.example.com");

  webSocket.addEventListener(WebSocketEvents.OPEN, () => {
    Logger.write("Sending message");
    webSocket.send("Some test message");
  });

  webSocket.addEventListener(WebSocketEvents.MESSAGE, (message) => {
    Logger.write(`Received message ${message.text}`);
  });

  webSocket.addEventListener(WebSocketEvents.CLOSE, () => {
    VoxEngine.terminate();
  });
});
```

The third-party service can be as simple as an echo server, shown here in Python with the `websockets` library:

```python
import asyncio
import websockets

async def echo(websocket):
    async for message in websocket:
        await websocket.send(message)

async def main():
    async with websockets.serve(echo, "localhost", 8765):
        await asyncio.Future()

asyncio.run(main())
```

<a id="media-stream-format" />

## Working with Audio Media

VoxEngine media streams use JSON messages to describe audio sent over WebSocket connections.
Use this format when a custom application receives call audio from Voximplant or sends audio back into a call, conference, recorder, or another media unit.

Media streams are started and stopped from a VoxEngine scenario with methods such as `sendMediaTo(...)` and `stopMediaTo(...)`.
Each stream starts with a `start` event, carries audio chunks in `media` events, and ends with a `stop` event.

The `event` field is reserved for the system media-stream events (`start`, `media`, `stop`) and is mandatory for them.
For your own application messages, use a `customEvent` field instead. Do not combine `event` and `customEvent` in the same message.
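
As a rough sketch of this rule, the helper below builds an application message with `customEvent` and refuses to mix it with the reserved `event` field. The helper name `buildCustomMessage` and the `transcript` message are hypothetical; only the `event` and `customEvent` field names come from the text above.

```js
// Hypothetical helper: builds an application-level message for webSocket.send(...).
// It sets `customEvent` and rejects the reserved `event` field.
function buildCustomMessage(customEvent, data = {}) {
  if (Object.prototype.hasOwnProperty.call(data, "event")) {
    throw new Error("`event` is reserved for start, media, and stop messages");
  }
  return JSON.stringify({ ...data, customEvent });
}

// Example: a made-up "transcript" message.
const message = buildCustomMessage("transcript", { text: "hello" });
```

In a scenario, the resulting string can be passed to `webSocket.send(message)`.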

### Sending call audio to a WebSocket

Use [`call.sendMediaTo(webSocket, parameters)`](/api-reference/voxengine/call#sendmediato) to stream call audio to a WebSocket.
The optional [`SendMediaParameters`](/api-reference/voxengine/call#sendmediaparameters) object lets you set:

* `encoding`: a [`WebSocketAudioEncoding`](/api-reference/voxengine/websocket-audio-encoding) value. The default is `PCM16_8KHZ`.
* `tag`: a label for matching `start`, `media`, and `stop` events that belong to the same stream.
* `customParameters`: application-specific metadata included in the stream `start` event.

```js
VoxEngine.addEventListener(AppEvents.CallAlerting, ({ call }) => {
  call.answer();
  call.addEventListener(CallEvents.Disconnected, VoxEngine.terminate);

  const webSocket = VoxEngine.createWebSocket("wss://your-url.example.com");

  webSocket.addEventListener(WebSocketEvents.OPEN, () => {
    call.sendMediaTo(webSocket, {
      encoding: WebSocketAudioEncoding.ALAW,
      tag: "call",
      customParameters: { source: "inbound-call" },
    });
  });

  webSocket.addEventListener(WebSocketEvents.MESSAGE, (message) => {
    Logger.write(JSON.stringify(message));
  });

  webSocket.addEventListener(WebSocketEvents.CLOSE, VoxEngine.terminate);
});
```

#### Receiving payload and parameter details

When Voximplant sends audio to your WebSocket service, the stream begins with a `StartEvent`, continues with `MediaInfo` events, and finishes with a `StopEvent`.

Events that belong to one stream share the same `tag` value. If several streams are sent over the same WebSocket at the same time, use the `tag` field on `StartEvent`, `MediaInfo`, and `StopEvent` to group chunks by stream.

##### StartEvent

`StartEvent` is generated by `sendMediaTo(...)`. For example, this sends call audio to a WebSocket:

```js
call.sendMediaTo(webSocket);
```

The WebSocket receives:

```js
{
  event: "start",
  sequenceNumber: 0,
  start: {
    mediaFormat: {
      encoding: "PCM16",
      sampleRate: 8000,
    },
  },
}
```

You can also set the stream `tag`, `customParameters`, and audio `encoding` from the scenario:

```js
call.sendMediaTo(webSocket, {
  tag: "stream2",
  encoding: WebSocketAudioEncoding.PCM16_8KHZ,
  customParameters: { test: "123" },
});
```

This produces a `StartEvent` with the chosen tag, media format, and custom parameters:

```js
{
  event: "start",
  sequenceNumber: 0,
  start: {
    tag: "stream2",
    mediaFormat: {
      encoding: "PCM16",
      sampleRate: 8000,
    },
    customParameters: "{\"test\":\"123\"}",
  },
}
```

##### MediaInfo

`MediaInfo` events deliver audio chunks. The audio data is stored in `media.payload`.

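This example shows a `media` event from a stream encoded as `PCM16` at `8000` Hz. The payload shown is abbreviated: a 20 ms chunk in this format would be 320 bytes before Base64 encoding.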

```js
{
  event: "media",
  sequenceNumber: 4,
  media: {
    timestamp: 18880,
    chunk: 3,
    payload: "AAAAAAAAAAAAAAAAAAAAAA==",
  },
}
```

`MediaInfo.payload` uses the codec specified in `StartEvent.mediaFormat`.

##### Audio duration

The audio duration inside `MediaInfo.payload` depends on the source: a call, player, WebSocket, or another media unit.

For call audio, the chunk length should correspond to the `a=ptime` and `a=maxptime` attributes in the call SDP. In most cases, `MediaInfo.payload` contains about 20 ms of audio.

After decoding the payload into `MediaCodec.Codec.PCM16` at the same sample rate, calculate the duration in milliseconds as:

```js
sizeInBytes(payloadInPcm16) / 2 * 1000 / StartEvent.mediaFormat.sampleRate
```
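
The formula can be sketched as a small Node.js function. This assumes the payload is Base64-encoded PCM16, as in the examples on this page; `pcm16DurationMs` is a hypothetical name, and `sampleRate` is taken from `StartEvent.mediaFormat.sampleRate`.

```js
// Duration in milliseconds of a Base64-encoded PCM16 payload (2 bytes per sample).
function pcm16DurationMs(payloadBase64, sampleRate) {
  const sizeInBytes = Buffer.from(payloadBase64, "base64").length;
  return ((sizeInBytes / 2) * 1000) / sampleRate;
}

// 320 bytes of PCM16 at 8000 Hz is 160 samples, which is 20 ms.
const twentyMs = pcm16DurationMs(Buffer.alloc(320).toString("base64"), 8000);
```

For other encodings, decode the payload to PCM16 first, as described above.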

##### chunk and timestamp

`MediaInfo.chunk` and `MediaInfo.timestamp` correspond to the sequence number and timestamp fields of the RTP header.

Keep these details in mind:

* Unlike RFC 3550, `chunk` and `timestamp` use `uint64` width.
* `chunk` values can have gaps. A gap means that the RTP packets with the missing sequence numbers were lost in the network.
* Your application should handle lost chunks, for example by using a Packet Loss Concealment (PLC) mechanism.
* Voximplant uses an adaptive jitter buffer to handle duplicates and reordered packets, but it does not guarantee that every chunk is delivered in the correct order. See [RFC 3550](https://datatracker.ietf.org/doc/html/rfc3550) for RTP sequence number and timestamp details.

Calculate the number of lost chunks as:

```js
Current.MediaInfo.chunk - LastReceived.MediaInfo.chunk - 1
```
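
A minimal sketch of this calculation, with the hypothetical helper name `countLostChunks`. It treats duplicated or reordered chunks as zero loss, which is an assumption of this sketch. Because `chunk` is `uint64`, plain JavaScript numbers are only exact up to `Number.MAX_SAFE_INTEGER`; use `BigInt` if your streams can exceed that.

```js
// Counts chunks missing between the last received media event and the current one.
// Returns 0 for consecutive chunks and for duplicated or reordered ones.
function countLostChunks(lastChunk, currentChunk) {
  const gap = currentChunk - lastChunk - 1;
  return gap > 0 ? gap : 0;
}

// Chunks 4 and 5 are missing between 3 and 6.
const lost = countLostChunks(3, 6);
```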

##### tag

Use `MediaInfo.tag` when more than one media stream is transmitted over the same WebSocket connection. Your application should demultiplex `MediaInfo.payload` into separate media streams by grouping chunks with the same tag.
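
A demultiplexer on the service side could look roughly like the sketch below. It is illustrative only: `createDemultiplexer` is a hypothetical name, the messages are assumed to be already parsed from JSON, and the tag is read from `media.tag` with a fallback to a top-level `tag`, which is an assumption about the exact field location.

```js
// Groups decoded media payloads by stream tag.
// Assumption: the tag is read from `media.tag`, falling back to a top-level `tag`;
// untagged streams are grouped under the empty string.
function createDemultiplexer() {
  const streams = new Map();
  return {
    streams,
    push(message) {
      if (message.event !== "media") return;
      const tag = message.media.tag ?? message.tag ?? "";
      if (!streams.has(tag)) streams.set(tag, []);
      streams.get(tag).push(Buffer.from(message.media.payload, "base64"));
    },
  };
}
```

Each entry in `streams` then holds the ordered payload buffers for one tag.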

##### StopEvent

`StopEvent` marks the end of `MediaInfo` events for a stream. After this event, the next stream starts with a new `StartEvent`.

`StopEvent.tag` identifies the stream, and `StopEvent.mediaInfo` contains statistics for the completed stream:

```js
{
  event: "stop",
  tag: "stream2",
  sequenceNumber: 1009,
  stop: {
    mediaInfo: {
      bytesSent: 645120,
      duration: 340800,
    },
  },
}
```
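
On the receiving service, the three event types can be routed with a small dispatcher. The sketch below assumes each WebSocket text message is one JSON document and that payloads are Base64-encoded; `dispatchStreamMessage` and the `handlers` object are hypothetical names, not part of any Voximplant API.

```js
// Parses one WebSocket text message and routes media-stream events to handlers.
// `handlers` is a hypothetical object with onStart, onMedia, onStop, and onOther callbacks.
function dispatchStreamMessage(rawText, handlers) {
  const message = JSON.parse(rawText);
  switch (message.event) {
    case "start":
      handlers.onStart(message.start.mediaFormat, message);
      break;
    case "media":
      handlers.onMedia(Buffer.from(message.media.payload, "base64"), message);
      break;
    case "stop":
      handlers.onStop(message.stop.mediaInfo, message);
      break;
    default:
      // Not a media-stream event, for example a message with `customEvent`.
      handlers.onOther(message);
  }
}
```

Each handler also receives the full message, so it can read `sequenceNumber` or the stream tag when needed.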

### Attaching WebSocket audio to a call

A WebSocket can also send audio back to Voximplant and route it into a call, conference, recorder, or another media unit.
This example accepts an incoming WebSocket connection and forwards audio from that WebSocket to the active call with [`websocket.sendMediaTo(inCall)`](/api-reference/voxengine/websocket#sendmediato):

```js
let inCall;
let webSocketUrl;

VoxEngine.addEventListener(AppEvents.Started, ({ accessSecureURL }) => {
  webSocketUrl = accessSecureURL.replace("https", "wss");
  VoxEngine.allowWebSocketConnections();
});

VoxEngine.addEventListener(AppEvents.CallAlerting, ({ call }) => {
  inCall = call;
  call.answer();

  call.sendMessage(`use websocket url ${webSocketUrl}`);

  call.addEventListener(CallEvents.Disconnected, VoxEngine.terminate);
  call.addEventListener(CallEvents.Failed, VoxEngine.terminate);
});

VoxEngine.addEventListener(AppEvents.WebSocket, ({ websocket }) => {
  websocket.addEventListener(WebSocketEvents.ERROR, () => {
    Logger.write("Incoming WebSocket error");
  });

  websocket.addEventListener(WebSocketEvents.CLOSE, ({ reason }) => {
    Logger.write(`Incoming WebSocket closed: ${reason}`);
  });

  websocket.addEventListener(WebSocketEvents.MESSAGE, ({ text }) => {
    Logger.write(`Incoming WebSocket message: ${text}`);
  });

  websocket.sendMediaTo(inCall);
});
```

After the external service receives the `wss` URL, it can connect and send audio stream events to the call:

```sh
node server-code.js wss://example-link
```

This example sends a raw 8 kHz mu-law audio file named `sample` into the WebSocket connection.

```js
const fs = require("node:fs");
const WebSocketClient = require("websocket").client;

if (process.argv.length !== 3) {
  console.error("Usage: node server-code.js <wss-url>");
  process.exit(1);
}

const wsUrl = process.argv[2];
const client = new WebSocketClient();

let startTime = null;

client.on("connectFailed", (error) => {
  console.log(`Connect error: ${error.toString()}`);
  process.exit(1);
});

client.on("connect", (connection) => {
  console.log("Connection established");

  connection.on("error", (error) => {
    console.log(`Connection error: ${error.toString()}`);
    process.exit(1);
  });

  connection.on("close", () => {
    console.log("Connection closed");
    process.exit(0);
  });

  const startEvent = {
    event: "start",
    sequenceNumber: 0,
    start: {
      mediaFormat: {
        encoding: "ULAW",
        sampleRate: 8000,
      },
    },
  };

  connection.send(JSON.stringify(startEvent));

  const CHUNK_DURATION_MS = 20;
  const CHUNK_SIZE = 160;
  const buffer = Buffer.alloc(CHUNK_SIZE);
  let sequenceNumber = 1;
  let mediaChunkIndex = 0;

  fs.open("sample", "r", (openError, fd) => {
    if (openError) {
      throw openError;
    }

    function readNextChunk() {
      fs.read(fd, buffer, 0, CHUNK_SIZE, null, (readError, bytesRead) => {
        if (readError) {
          throw readError;
        }

        if (bytesRead < CHUNK_SIZE) {
          const stopEvent = {
            event: "stop",
            sequenceNumber,
            stop: {},
          };

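          connection.send(JSON.stringify(stopEvent));
          // Close gracefully so the stop event is flushed before exit;
          // the "close" handler above ends the process.
          fs.close(fd, () => connection.close());
          return;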
        }

        const mediaEvent = {
          event: "media",
          sequenceNumber,
          media: {
            chunk: mediaChunkIndex,
            payload: buffer.toString("base64"),
            timestamp: mediaChunkIndex * CHUNK_SIZE,
          },
        };

        connection.send(JSON.stringify(mediaEvent));
        sequenceNumber += 1;
        mediaChunkIndex += 1;

        const now = Date.now();
        if (startTime === null) {
          startTime = now;
        }

        const nextChunkTime = startTime + CHUNK_DURATION_MS * mediaChunkIndex;
        setTimeout(readNextChunk, Math.max(nextChunkTime - now, 0));
      });
    }

    readNextChunk();
  });
});

console.log(`Connect to ${wsUrl}`);
client.connect(wsUrl);
```

Convert source audio to raw 8 kHz mu-law mono with FFmpeg:

```sh
ffmpeg -i ./record.mp3 -f mulaw -acodec pcm_mulaw -ac 1 -ar 8000 sample
```

#### Sending payload and parameter details

`webSocket.sendMediaTo(...)` routes audio received from the WebSocket to a call, conference, recorder, or another media unit. For example, to send it into a call:

```js
webSocket.sendMediaTo(call);
```

If you send several streams, assign a unique tag to each one:

```js
webSocket.sendMediaTo(call, {
  tag: "stream1",
});

webSocket.sendMediaTo(recorder, {
  tag: "stream2",
});
```

##### Send StartEvent

The first event in the stream must be a `StartEvent`. It specifies the media stream codec in `StartEvent.mediaFormat`.

If `StartEvent` is valid, VoxEngine triggers [`WebSocketEvents.MEDIA_STARTED`](/api-reference/voxengine/websocket-events#media_started) in the scenario. The `StartEvent.tag`, `StartEvent.customParameters`, and `StartEvent.mediaFormat` fields are reflected in the event as `tag`, `customParameters`, and `encoding`.

##### Send MediaInfo

Split the stream into media chunks with these recommendations:

* The duration of each `MediaInfo.payload` can be arbitrary, but a multiple of 20 ms is recommended.
* You do not need to send chunks in real time. For example, you can send all chunks at once; Voximplant stores them in the WebSocket buffer and forwards them to the target media unit in real time.
* The WebSocket buffer holds at most 10 seconds of audio. Once the buffer is full, additional audio chunks are discarded.

The media chunk must be stored in `MediaInfo.payload`, and its encoding must match `StartEvent.mediaFormat`.

Sending faster than real time is useful for file playback and generated audio. To interrupt playback that is already queued, call [`clearMediaBuffer(...)`](/api-reference/voxengine/websocket#clearmediabuffer).
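
Splitting audio into 20 ms chunks can be sketched as follows. `splitIntoChunks` is a hypothetical helper; it assumes raw, headerless audio in a Node.js `Buffer`, with 1 byte per sample for `ULAW` or `ALAW` and 2 bytes per sample for `PCM16`.

```js
// Splits raw audio into fixed-duration chunks; the last chunk may be shorter.
// bytesPerSample: 1 for ULAW/ALAW, 2 for PCM16.
function splitIntoChunks(audio, sampleRate, bytesPerSample, chunkMs = 20) {
  const chunkBytes = Math.floor((sampleRate * chunkMs) / 1000) * bytesPerSample;
  const chunks = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    chunks.push(audio.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// 400 bytes of 8 kHz mu-law audio: two full 160-byte chunks and one 80-byte remainder.
const parts = splitIntoChunks(Buffer.alloc(400), 8000, 1);
```

When sending everything at once, keep the total queued audio under the 10-second buffer limit described above.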

##### Send chunk and timestamp

`MediaInfo.chunk` and `MediaInfo.timestamp` correspond to the RTP sequence number and timestamp fields.

If the stream does not contain lost, duplicated, or reordered chunks:

* Increment `MediaInfo.chunk` by 1 for each next packet.
* Calculate `MediaInfo.timestamp` as the sum of samples in the previous chunks.

The number of samples in a PCM16 audio chunk is:

```js
sizeInBytes(payloadInPcm16) / 2
```

For a normal stream, keep `chunk` and `timestamp` monotonically increasing. If the stream does include lost, duplicated, or reordered chunks, set `MediaInfo.chunk` and `MediaInfo.timestamp` so that they accurately reflect those losses, duplications, and reorderings.
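
For a loss-free stream, the numbering rules can be sketched like this. `buildMediaEvents` is a hypothetical helper that takes an array of Node.js `Buffer` chunks; starting `sequenceNumber` at 1 assumes the `start` event used 0, as in the sender example earlier on this page.

```js
// Builds `media` events for a stream without lost, duplicated, or reordered chunks.
// chunk increments by 1; timestamp is the number of samples in all previous chunks.
function buildMediaEvents(chunks, bytesPerSample, firstSequenceNumber = 1) {
  let samplesSoFar = 0;
  return chunks.map((chunk, index) => {
    const event = {
      event: "media",
      sequenceNumber: firstSequenceNumber + index,
      media: {
        chunk: index,
        timestamp: samplesSoFar,
        payload: chunk.toString("base64"),
      },
    };
    samplesSoFar += chunk.length / bytesPerSample;
    return event;
  });
}

// Three mu-law chunks (1 byte per sample): timestamps 0, 160, and 320.
const events = buildMediaEvents([Buffer.alloc(160), Buffer.alloc(160), Buffer.alloc(80)], 1);
```

Each event can then be serialized with `JSON.stringify` and sent over the connection.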

##### Send StopEvent

When all media chunks are sent, send a `StopEvent`.

If the corresponding `StartEvent` included a `tag`, the `StopEvent.tag` value must match it exactly. A valid `StopEvent` triggers [`WebSocketEvents.MEDIA_ENDED`](/api-reference/voxengine/websocket-events#media_ended) in the scenario.

#### Changing the media codec

The codec declared in `StartEvent.mediaFormat` applies to every following `MediaInfo.payload` until the matching `StopEvent`, so do not change it within a stream.

To switch codecs, stop the current stream with a `StopEvent`, then start a new stream with another `StartEvent` and the new `mediaFormat`.