MCP Client

Connect VoxEngine voice agents to Model Context Protocol servers.

Overview

The VoxEngine MCP Client lets a Voice AI scenario connect to external Model Context Protocol (MCP) servers and call tools from the same VoxEngine code that manages the call.

MCP is an open protocol for exposing tools and context to AI applications. In a Voximplant scenario, the voice AI client can decide when a tool is needed, while VoxEngine keeps control over the MCP connection, validates the request, calls the tool, and returns the result to the model.

The result is a provider-neutral tool layer. You can use the same MCP server with any of Voximplant’s Voice AI Clients.

Architecture

MCP Client architecture demo

As shown in the image above, the typical MCP flow is as follows:

  1. Some user speech or user-related input triggers a tool/function call in the LLM.
  2. The VoxEngine scenario directs this tool call to the MCP client.
  3. The MCP Client sends a request to the MCP Server.
  4. The MCP server responds with some data or a confirmation.
  5. The VoxEngine scenario directs this response to the LLM as a tool response.
  6. The LLM incorporates this data into its new response.

In this pattern, the voice AI client handles the conversation and decides when a tool is needed. The MCP Client handles the MCP Server interaction from VoxEngine. Your scenario code remains the control point: it can choose which tool to call, shape arguments, add call context, and decide how to recover if the tool fails.
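
The control-point role described above can be sketched as a single handler. This is a minimal sketch under stated assumptions: `handleToolCall`, `ALLOWED_TOOLS`, the tool names, the `caller_number` argument, and the injected `callMcpTool` function are all hypothetical names chosen for illustration, not part of the VoxEngine API. In a real scenario, `callMcpTool` would wrap the MCP Client.

```javascript
// Hypothetical allowlist of MCP tools the voice agent may invoke.
const ALLOWED_TOOLS = new Set(["schedule_visit", "lookup_customer"]);

// Handle one tool call requested by the LLM.
// callMcpTool is an injected async function (an assumption of this sketch)
// that forwards the call to the MCP server and resolves with its result.
async function handleToolCall({ name, args }, callContext, callMcpTool) {
  // 1. Choose which tools may run at all.
  if (!ALLOWED_TOOLS.has(name)) {
    return { error: `Tool not allowed: ${name}` };
  }
  // 2. Shape arguments and add call context the model does not know.
  const shapedArgs = { ...args, caller_number: callContext.callerNumber };
  try {
    // 3. Call the tool and wrap the result for the model.
    return { result: await callMcpTool(name, shapedArgs) };
  } catch (err) {
    // 4. Decide how to recover: here, report the failure to the model.
    return { error: `Tool ${name} failed: ${err.message}` };
  }
}
```

Injecting `callMcpTool` keeps the policy (allowlist, argument shaping, recovery) separate from the transport, so the same handler can be exercised with a stub before it is wired to a live server.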

Benefits

  • Keep tool orchestration in VoxEngine.
  • Use the same MCP server across voice AI providers.
  • Add telephony context before tool execution.
  • Control validation, allowlists, retries, and error handling.
  • Keep logging and audit behavior close to the call logic.
  • Avoid building a separate MCP gateway service.

Use Cases

Does your LLM support MCP?

Many LLMs have MCP client support built in. If your voice AI provider supports attaching MCP servers directly, follow that provider’s session configuration requirements and pass the required parameter or configuration object to the Voximplant Voice AI client.

As discussed below, even if your LLM or Voice AI system includes an MCP client, you may still want to leverage a different MCP client, such as the one provided by Voximplant, for increased control and debugging help.

Typical Voximplant MCP Client uses

If your LLM does not support MCP capabilities natively, the Voximplant MCP Client can provide full MCP support. Even if your LLM supports MCP, the MCP Client offers several valuable use cases:

  • Custom Authentication - if your MCP server needs specific authentication, such as per-user authentication, you can use the MCP Client and Voximplant’s other API interaction options to allow authentication based on phone number, DTMF entry, or a spoken code.
  • Tool and Argument Management - many external servers expose extensive capabilities that exceed the requirements of a voice interface. By utilizing the MCP Client, you can intercept and refine communication to and from the MCP server, reducing potential errors and providing more deterministic behavior.
  • Data Compliance and Redaction - ensure strict adherence to privacy standards by intercepting data exchange. The MCP Client provides the necessary oversight to filter, modify, or mask sensitive details transmitted between the LLM and your external MCP services.
  • Observability - built-in MCP implementations in LLMs frequently lack granular reporting. By using Voximplant’s MCP Client, you gain comprehensive visibility and can manage every log entry within your scenario logic.
  • Debugging - ensuring the LLM interacts accurately with the MCP server often requires iterative prompt engineering. For initial troubleshooting, utilize a static mapping from the model to the MCP server to confirm execution and latency before optimizing the system prompt.
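
As an illustration of the Data Compliance and Redaction item above, the following sketch masks sensitive fields in a tool result before it reaches the LLM. The `redact` helper and the field names in `SENSITIVE_KEYS` are hypothetical; a real deployment would choose keys that match its own data model and compliance rules.

```javascript
// Hypothetical list of field names to mask; adjust to your own data model.
const SENSITIVE_KEYS = new Set(["ssn", "card_number", "email"]);

// Recursively copy a value, masking any sensitive field it contains.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const copy = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return copy;
  }
  // Primitives pass through unchanged.
  return value;
}
```

The same function could be applied in the other direction, to the arguments the model supplies, before they are sent to the MCP server.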

MCP Client Setup and Interactions

Create the MCP Client near the start of the application flow before the model needs to call external tools.

Client Creation

MCP.createClient(...) accepts an mcpServerConnectionConfig object with these common fields:

  • transport: the transport supported by your MCP server - http and sse are supported.
  • endpoint: the MCP server URL.
  • headers: optional headers required by the MCP server.
  • clientName and clientVersion: values that identify your VoxEngine MCP client during the MCP handshake.

Example:

const mcpClient = await MCP.createClient({
  mcpServerConnectionConfig: {
    transport: "http",
    endpoint: VoxEngine.getSecretValue("MCP_SERVER_URL"),
    headers: {
      Accept: "application/json, text/event-stream",
    },
    clientName: "voximplant-mcp-demo",
    clientVersion: "1.0.0",
  },
});

Use http as the transport when connecting to MCP servers that support the newer Streamable HTTP transport. Use sse only for compatibility with older MCP servers that implement the deprecated HTTP+SSE transport.
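
The transport choice can be isolated in a small configuration builder. This is a sketch only: `buildMcpClientOptions`, its `legacySse` and `authToken` parameters, and the bearer-token `Authorization` header are assumptions made for illustration; check what your MCP server actually requires.

```javascript
// Build the argument object for MCP.createClient(...).
// The function name and its parameters are hypothetical helpers for this sketch.
function buildMcpClientOptions({ endpoint, legacySse = false, authToken }) {
  const headers = { Accept: "application/json, text/event-stream" };
  // Assumption: the server expects a bearer token in the Authorization header.
  if (authToken) headers.Authorization = `Bearer ${authToken}`;
  return {
    mcpServerConnectionConfig: {
      // "http" for Streamable HTTP; "sse" only for the deprecated HTTP+SSE transport.
      transport: legacySse ? "sse" : "http",
      endpoint,
      headers,
      clientName: "voximplant-mcp-demo",
      clientVersion: "1.0.0",
    },
  };
}
```

Inside a scenario, the returned object would be passed straight to MCP.createClient(...), with the endpoint read from VoxEngine.getSecretValue("MCP_SERVER_URL") as in the example above.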

Then, you can use the ConnectorInformation event to verify the connection is established:

mcpClient.addEventListener(MCP.ServerEvents.ConnectorInformation, (event) => {
  Logger.write(`===MCP_CONNECTOR_INFORMATION===> ${JSON.stringify(event.data.payload)}`);
});

The event payload contains the Voximplant MCP Client application version, a unique connection ID, and the connector endpoint:

{
  "applicationVersion": "0.51.0",
  "id": "bf21a7a8a3ead25a2b9bc95db704108b",
  "endpoint": "/mcp/client"
}

Tool discovery

After the client connects, it is common to request the available tools using mcpClient.listTools({}).

Pass an empty object ({}) to listTools as its parameter.

Then listen for the MCP.ServerEvents.ToolsList event:

mcpClient.addEventListener(MCP.ServerEvents.ToolsList, (event) => {
  // Default to an empty array so the logging below never throws.
  const tools = event?.data?.payload?.tools || [];
  Logger.write(`===MCP_TOOLS_LIST===> ${tools.length} Tools Available:`);
  Logger.write(JSON.stringify(tools.map((tool) => `${tool.name}: ${tool.description}`)));
  Logger.write(JSON.stringify(tools)); // full tool object
});

The event payload carries a tools array. Each entry includes the tool's name and description, and the array length gives the total tool count.

You can filter this list and its contents before passing it to the LLM. The ToolsList is also useful for logging, diagnostics, and startup validation. The call flow does not need to expose this information to the caller.
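
Filtering and startup validation can be combined in one step. The sketch below assumes a hypothetical helper, `selectTools`, and an allowlist of tool names that you define yourself. It keeps only the allowlisted tools and reports any expected tool that the server did not offer.

```javascript
// Keep only allowlisted tools and report any that the server did not offer.
// serverTools is the tools array from the ToolsList payload;
// allowlist is an array of tool names chosen by the scenario author.
function selectTools(serverTools, allowlist) {
  const selected = serverTools.filter((tool) => allowlist.includes(tool.name));
  const found = new Set(selected.map((tool) => tool.name));
  const missing = allowlist.filter((name) => !found.has(name));
  return { selected, missing };
}
```

A scenario might log `missing` at startup and only pass `selected` on to the LLM, so the model never sees tools that a voice interface should not use.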

Tool calls

All of Voximplant’s Voice AI clients expose tools or function-call events. The LLM needs to be configured with these available tools as part of the session configuration that you pass to it. Typically, the LLM initiates a tool call in response to some input - for example, if a user says “I want to schedule a visit”.

When using MCP-based tools, the simplest approach is to reuse the MCP tool definitions and parameters from the MCP.ServerEvents.ToolsList response in your tool declaration. Each MCP tool object must be adapted to your LLM's tool schema.

For example, with OpenAI Realtime:

const mcpTools = tools.map((tool) => ({
  type: "function",
  name: tool.name,
  // Use the tool's description so the model knows when to call it.
  description: tool.description,
  parameters: tool.inputSchema,
}));

Then you can pass this tool array to the LLM in the session configuration.

// Add the MCP tools to the session configuration
const SESSION_CONFIG = {
  session: {
    type: "realtime",
    instructions: SYSTEM_PROMPT,
    voice: "marin",
    output_modalities: ["audio"],
    tools: [
      ...mcpTools,
      // Manually defined tool example
      {
        type: "function",
        name: "hangup_call",
        description: "Hang up the current call",
        parameters: {
          type: "object",
          properties: {},
          required: [],
        },
      },
    ],
    tool_choice: "auto",
  },
};

// Update the session with the new tools
voiceAIClient.sessionUpdate(SESSION_CONFIG);

How MCP tool definitions map to session tool definitions varies by LLM. See the Adapting MCP tools section below for strategies to adjust the tool definitions to better fit your LLM’s requirements.
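
A slightly more defensive adapter is sketched here for servers whose tool definitions are incomplete. The helper name `adaptMcpTool`, the fallback values, the 1024-character description cap, and the removal of the top-level `$schema` keyword are all assumptions of this sketch, not limits or requirements documented by any provider.

```javascript
// Convert one MCP tool definition into a function-style tool declaration.
// Fallbacks, the length cap, and $schema removal are assumptions of this sketch.
function adaptMcpTool(tool, maxDescriptionLength = 1024) {
  // Fall back to an empty object schema when the tool declares no input schema,
  // and drop the top-level $schema keyword in case the LLM API rejects it.
  const { $schema, ...parameters } = tool.inputSchema || { type: "object", properties: {} };
  // Fall back to the tool name when no description is provided.
  const description = (tool.description || tool.name).slice(0, maxDescriptionLength);
  return { type: "function", name: tool.name, description, parameters };
}
```

It can be used in place of the inline mapping, for example `tools.map((tool) => adaptMcpTool(tool))`.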

Later, when the LLM initiates a ToolCall event, we can compare the returned tool call names against the MCP tool names and use mcpClient.callTool({name, arguments: args}) to invoke the server.

openaiClient.addEventListener(OpenAI.RealtimeAPIEvents.ResponseOutputItemAdded, (event) => {
  const item = event?.data?.payload?.item || {};
  // OpenAI Realtime provides the function name and arguments in separate events,
  // so we store the function name in a temporary object until the arguments arrive.
  if (item.type === "function_call" && item.call_id && item.name)
    functionNameByCallId[item.call_id] = item.name;
});

openaiClient.addEventListener(OpenAI.RealtimeAPIEvents.ResponseFunctionCallArgumentsDone, (event) => {
  const {call_id: callId, arguments: rawArgs} = event.data.payload;
  const name = functionNameByCallId[callId];
  const args = JSON.parse(rawArgs || "{}");

  delete functionNameByCallId[callId];

  // Send MCP tool calls to the MCP Server.
  if (requiredMcpTools.includes(name)) {
    pendingOpenAIFunction = {callId, name};
    Logger.write("===ZAPIER_MCP_TOOL_CALL===");
    Logger.write(JSON.stringify({callId, tool: name, args}));

    mcpClient.callTool({name, arguments: args});
  }
  // You can still define your own local tools.
  else if (name === "hangup_call") {
    pendingHangup = true;

    sendOpenAIToolOutput(callId, {
      result: "Goodbye. Thank you for calling Voxy Plumbers.",
    });
  }
  else {
    sendOpenAIToolOutput(callId, {
      error: `Unhandled tool: ${name}`,
    });
  }
});
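
The two-event pattern used above (the function name arrives first, the arguments later) can be isolated into a small tracker that is easy to test outside a call. This is a sketch: `createCallTracker` and its method names are hypothetical, and it additionally guards against malformed argument JSON, which the handler above does not.

```javascript
// Track function names by call ID until the arguments arrive.
function createCallTracker() {
  const nameByCallId = new Map();
  return {
    // Call when an output item is added; remembers function-call names.
    onItemAdded(item) {
      if (item && item.type === "function_call" && item.call_id && item.name) {
        nameByCallId.set(item.call_id, item.name);
      }
    },
    // Call when the arguments are complete; returns the assembled call.
    // name is undefined if no matching item was seen for this call ID.
    onArgumentsDone({ call_id: callId, arguments: rawArgs }) {
      const name = nameByCallId.get(callId);
      nameByCallId.delete(callId);
      let args = {};
      let parseError = null;
      try {
        args = JSON.parse(rawArgs || "{}");
      } catch (err) {
        // Malformed JSON from the model: report it instead of throwing.
        parseError = err.message;
      }
      return { callId, name, args, parseError };
    },
  };
}
```

Returning `parseError` instead of throwing lets the scenario send an error result back to the model rather than aborting the event handler.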

We must respond to the ToolCall event with a result. In this case, we need to wait for the MCP.ServerEvents.ToolResult event before sending the tool response. We do that with OpenAI Realtime as follows:

mcpClient.addEventListener(MCP.ServerEvents.ToolResult, (event) => {
  // Match this MCP result to the OpenAI function call that triggered it.
  if (!pendingOpenAIFunction) return;

  const payload = event?.data?.payload || {};
  let mcpOutput = null;
  try {
    // The first content item is expected to carry JSON text.
    mcpOutput = JSON.parse(payload.content?.[0]?.text);
  } catch (err) {
    Logger.write(`===MCP_TOOL_RESULT_PARSE_ERROR===> ${err.message}`);
  }

  // Optional chaining avoids a TypeError when parsing produced no output.
  if (!mcpOutput || payload.isError || mcpOutput.error) {
    mcpOutput = {error: mcpOutput?.error || "MCP tool failed."};
  }

  // Add the response to the conversation
  openaiClient.conversationItemCreate({
    item: {
      type: "function_call_output",
      call_id: pendingOpenAIFunction.callId,
      output: JSON.stringify(mcpOutput),
    },
  });
  // Tell OpenAI to continue the turn
  openaiClient.responseCreate({});
});
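
MCP tool results are not always JSON: a content item may carry plain text, and a result may contain several items. The sketch below shows one possible way to normalize a result payload into an object that is always safe to stringify for the LLM. `normalizeToolResult` is a hypothetical helper, and the choice to join text items with newlines and ignore non-text items is an assumption of this sketch.

```javascript
// Turn an MCP tool result payload into an object that is safe to
// JSON.stringify and send back to the LLM.
function normalizeToolResult(payload) {
  const items = Array.isArray(payload?.content) ? payload.content : [];
  // Join the text of every content item that carries text; other items are ignored.
  const text = items
    .filter((item) => item && typeof item.text === "string")
    .map((item) => item.text)
    .join("\n");

  let data = null;
  try {
    data = JSON.parse(text);
  } catch (err) {
    // Not JSON: keep the raw text, if there is any.
    data = text ? { text } : null;
  }

  if (payload?.isError || data === null || (typeof data === "object" && data.error)) {
    const message = (data && (data.error || data.text)) || "MCP tool failed.";
    return { error: message };
  }
  // Wrap JSON primitives so the result is always an object or array.
  return typeof data === "object" ? data : { value: data };
}
```

With a helper like this, the ToolResult listener only needs to stringify the returned object and pass it as the function call output.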

The above flow assumes an agentic approach where the LLM interprets the MCP Server’s tools and invokes them with the appropriate parameters. It is also possible to hardcode tools and parameters or make adjustments to them. We will demonstrate that in the Gemini-Zapier-MCP Example.
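
As a rough illustration of hardcoding parameters, the sketch below merges the arguments supplied by the model with fixed values defined in the scenario. The tool name, the argument names, and the `applyStaticArgs` helper are hypothetical and are not taken from the Gemini-Zapier-MCP Example.

```javascript
// Hypothetical static overrides, keyed by MCP tool name.
const STATIC_TOOL_ARGS = {
  create_calendar_event: { calendar_id: "primary", duration_minutes: 30 },
};

// Merge model-supplied arguments with fixed values; the fixed values win.
function applyStaticArgs(name, modelArgs) {
  return { ...modelArgs, ...(STATIC_TOOL_ARGS[name] || {}) };
}
```

The merged object would then be passed as the arguments value of mcpClient.callTool({name, arguments: args}). Letting the fixed values win means the model cannot override parameters that the scenario is meant to control.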