Example: Function calling | Voximplant Voice AI

This example answers an inbound Voximplant call, connects it to Inworld Realtime, and handles function calls inside VoxEngine.

Jump to the Full VoxEngine scenario.

Prerequisites

Set up an inbound entrypoint for the caller:
Create a routing rule that directs inbound calls to this scenario.
Store your Inworld API key in Voximplant Secrets under INWORLD_API_KEY.

Tool setup

Function tools are included in the Inworld session.update payload. The example exposes two client-side tools:

get_weather: returns demo weather data to the model.
hang_up: returns a tool result, lets Inworld speak a brief goodbye, and hangs up after VoxEngine finishes playing the goodbye audio.

Tool definition

tools: [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location.",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" },
      },
      required: ["location"],
    },
  },
  {
    type: "function",
    name: "hang_up",
    description: "Hang up the current call.",
    parameters: {
      type: "object",
      properties: {},
      required: [],
    },
  },
],
tool_choice: "auto",
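
The `required` list in a tool definition can also be checked locally before a handler runs. The following is a minimal sketch, assuming a hypothetical helper named `missingRequiredArgs` that is not part of the Inworld module or VoxEngine. The full scenario takes a simpler route and falls back to a default location instead.

```javascript
// Hypothetical helper (not an Inworld or VoxEngine API).
// Returns the names of required parameters that are absent or empty in `args`.
function missingRequiredArgs(toolDefinition, args) {
  const required = toolDefinition?.parameters?.required ?? [];
  const provided = args && typeof args === "object" ? args : {};
  return required.filter((name) => {
    const value = provided[name];
    return value === undefined || value === null || value === "";
  });
}

// Same shape as the get_weather definition above.
const weatherTool = {
  type: "function",
  name: "get_weather",
  description: "Get current weather for a location.",
  parameters: {
    type: "object",
    properties: { location: { type: "string" } },
    required: ["location"],
  },
};

const missing = missingRequiredArgs(weatherTool, {});

// An error-shaped tool result gives the model a chance to ask the caller
// for the missing value instead of guessing.
const result = missing.length
  ? { error: `Missing required arguments: ${missing.join(", ")}` }
  : { ok: true };
```

Whether to return an error or substitute a default is a design choice; an error result costs an extra conversational turn but avoids answering about the wrong city.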

The rest of the session config stays focused on the tool flow: model, prompt, audio input/output models, and concise TTS delivery settings. The system prompt tells the model not to speak before a tool result is available, so the caller does not hear a filler phrase such as “let me check” before the function result is ready. Unlike the richer inbound demo, this example does not force segmenter_strategy: "full_turn"; that keeps the tool-result answer more responsive once Inworld starts generating the final spoken response. For a broader walkthrough of Inworld voice behavior tuning, see Answering an incoming call.

Handle tool calls

Inworld emits the function name and call_id on the function-call output item, then emits the final JSON arguments with ResponseFunctionCallArgumentsDone. The example stores the function-call metadata by item id, reads the matching arguments, executes the local function, sends a function_call_output conversation item, and creates the follow-up response. For hang_up, the tool result tells the model to speak the goodbye only after the tool call is complete:

Track the function call

functionCallsByItemId[item.id] = {
  name: item.name,
  call_id: item.call_id,
};
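
The example keeps every tracked entry for the life of the call, which is fine for a short demo. For longer calls, one option is to remove an entry once its arguments arrive. The sketch below assumes a hypothetical `createFunctionCallRegistry` helper; the names `track` and `take` are illustrative and do not come from the Inworld module.

```javascript
// Hypothetical registry that mirrors functionCallsByItemId but removes
// an entry when it is read, so the map does not grow during a long call.
function createFunctionCallRegistry() {
  const callsByItemId = new Map();
  return {
    // Call from the ResponseOutputItemAdded handler with payload.item.
    track(item) {
      if (item?.type !== "function_call") return false;
      callsByItemId.set(item.id, { name: item.name, call_id: item.call_id });
      return true;
    },
    // Call from the ResponseFunctionCallArgumentsDone handler with payload.item_id.
    take(itemId) {
      const functionCall = callsByItemId.get(itemId);
      callsByItemId.delete(itemId);
      return functionCall;
    },
    size() {
      return callsByItemId.size;
    },
  };
}

const registry = createFunctionCallRegistry();
registry.track({ type: "function_call", id: "item_1", name: "get_weather", call_id: "call_1" });
const tracked = registry.take("item_1");
```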

Read the function arguments

const { item_id: itemId, arguments: rawArgs } = payload;
const functionCall = functionCallsByItemId[itemId];
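
The raw arguments arrive as a JSON string that may be empty for tools without parameters. The parsing logic from the full scenario can be isolated as a small function; `parseToolArgs` is a hypothetical name used here for illustration, and the extra guard against non-object JSON values is an addition that the original scenario does not include.

```javascript
// Hypothetical helper: normalizes tool arguments into a plain object.
// Empty strings, invalid JSON, and non-object values all become {}.
function parseToolArgs(rawArgs) {
  if (typeof rawArgs === "string") {
    if (!rawArgs.trim()) return {};
    try {
      const parsed = JSON.parse(rawArgs);
      const isPlainObject = parsed && typeof parsed === "object" && !Array.isArray(parsed);
      return isPlainObject ? parsed : {};
    } catch (error) {
      // Invalid JSON: fall back to no arguments rather than failing the call.
      return {};
    }
  }
  // Some payloads may already carry a parsed object.
  return rawArgs && typeof rawArgs === "object" ? rawArgs : {};
}

const weatherArgs = parseToolArgs('{"location":"Paris"}');
const hangUpArgs = parseToolArgs("");
```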

Respond to a tool call

voiceAIClient.conversationItemCreate({
  item: {
    type: "function_call_output",
    call_id: toolCallId,
    output: JSON.stringify(result),
  },
});
voiceAIClient.responseCreate({
  response: {
    output_modalities: ["audio", "text"],
  },
});
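
The full scenario repeats this pair of calls for each tool. One way to factor it is a handler table plus a single send helper. This is a sketch under stated assumptions: `toolHandlers`, `runTool`, and `sendToolResult` are hypothetical names, and `client` stands for any object exposing the `conversationItemCreate` and `responseCreate` methods used above.

```javascript
// Hypothetical handler table: tool name -> function returning a JSON-serializable result.
const toolHandlers = {
  get_weather: (args) => ({
    location: args.location || "San Francisco",
    temperature_f: 72,
    condition: "sunny",
  }),
  hang_up: () => ({
    status: "ready_to_end_call",
    instruction: "Say a brief goodbye now.",
  }),
};

// Runs the matching handler, or returns an error result for unknown tools.
function runTool(toolName, args) {
  const known = Object.prototype.hasOwnProperty.call(toolHandlers, toolName);
  return known ? toolHandlers[toolName](args) : { error: `Unhandled tool: ${toolName}` };
}

// Sends the tool output with the original call_id, then requests the spoken follow-up.
function sendToolResult(client, callId, result) {
  client.conversationItemCreate({
    item: {
      type: "function_call_output",
      call_id: callId,
      output: JSON.stringify(result),
    },
  });
  client.responseCreate({
    response: { output_modalities: ["audio", "text"] },
  });
}
```

In the scenario this would be called as `sendToolResult(voiceAIClient, toolCallId, runTool(toolName, args))`, with the `pendingHangup` flag still set separately for `hang_up`.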

Hang up after playback

voiceAIClient.addEventListener(Inworld.Events.WebSocketMediaEnded, () => {
  if (!pendingHangup) return;
  pendingHangup = false;
  call.hangup();
});
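
The `pendingHangup` flag acts as a one-shot gate: media-ended events are ignored until the `hang_up` tool arms it, and it disarms itself after firing. The same idea can be written as a small closure. `createHangupGate` is a hypothetical helper for illustration, and `hangUp` is whatever callback ends the call (in the scenario, `() => call.hangup()`).

```javascript
// Hypothetical one-shot gate around the hang-up action.
function createHangupGate(hangUp) {
  let pending = false;
  return {
    // Call when the hang_up tool result has been sent.
    arm() {
      pending = true;
    },
    // Call from the WebSocketMediaEnded listener.
    // Returns true only when the call was actually ended.
    onMediaEnded() {
      if (!pending) return false;
      pending = false;
      hangUp();
      return true;
    },
  };
}

let hangUpCount = 0;
const gate = createHangupGate(() => {
  hangUpCount += 1;
});

const firedBeforeArm = gate.onMediaEnded(); // greeting or weather audio ended
gate.arm();                                 // hang_up tool was handled
const firedAfterArm = gate.onMediaEnded();  // goodbye audio ended
const firedAgain = gate.onMediaEnded();     // any later event is ignored
```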

Connect call audio

The call setup is otherwise the same as the inbound example. After SessionCreated, send the session config. After SessionUpdated, bridge media and seed an initial greeting with conversationItemCreate followed by responseCreate. The caller’s spoken turns go through the bridged audio stream; Inworld STT, semantic VAD, and create_response: true handle turn completion and response creation.
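
The greeting seed is an ordinary user message item. A minimal sketch of building that payload follows; `buildUserTextItem` is a hypothetical helper, while the payload shape matches the one used in the full scenario.

```javascript
// Hypothetical helper: wraps plain text as a user message item
// in the shape that conversationItemCreate receives in this example.
function buildUserTextItem(text) {
  return {
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  };
}

const greeting = buildUserTextItem(
  "The phone call just connected. Say only: Hi, this is Voxi. I can check the weather.",
);
```

In the scenario, `greeting` would be passed to `voiceAIClient.conversationItemCreate`, followed by `voiceAIClient.responseCreate` to make the model speak.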

Barge-in

The function-calling example keeps the same interruption behavior as the other Inworld examples:

Barge-in

voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.InputAudioBufferSpeechStarted, () => {
  voiceAIClient.outputAudioBufferClear({});
});

Notes

sessionKey can be any unique string to maintain context for the Inworld session.
Keep tool handlers fast in VoxEngine. For slow work, return an acknowledgement and move the long-running workflow outside the call path.
Tool responses must use the same call_id that Inworld provided.
Empty tool argument strings are treated as {}; tools without parameters do not need placeholder JSON.
Tool-call turns are silent until the function output is sent back and the follow-up response is created.
ResponseOutputAudioDone means Inworld finished generating the audio. For hang_up, this example waits for WebSocketMediaEnded so VoxEngine can finish playing the goodbye to the caller before disconnecting.
For the full provider event schema, see the Inworld module reference.
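
The advice about slow work can be sketched as an acknowledge-and-enqueue handler: the tool returns at once with a job reference, and the real work happens elsewhere. Everything here is an assumption for illustration: `createDeferredTool` is a hypothetical helper, and the in-memory array stands in for whatever hand-off a real deployment would use (for example, a request to an external service).

```javascript
// Hypothetical factory for a tool handler that never blocks the call path.
// `queue` is any object with push(); here it is a plain array stand-in.
function createDeferredTool(queue) {
  let nextJobId = 1;
  return function handle(args) {
    const jobId = `job-${nextJobId}`;
    nextJobId += 1;
    // Record the work for processing outside the call path.
    queue.push({ jobId, args });
    // Return immediately so the model can tell the caller the request was accepted.
    return { status: "accepted", job_id: jobId };
  };
}

const pendingJobs = [];
const handleSlowTool = createDeferredTool(pendingJobs);
const acknowledgement = handleSlowTool({ order: "42" });
```

The acknowledgement is then sent back as the `function_call_output`, using the same `call_id` that Inworld provided.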

Full VoxEngine scenario

voxeengine-inworld-function-calling.js

/**
 * Voximplant + Inworld Realtime API demo
 * Scenario: answer an incoming call and handle Inworld function calls.
 *
 * Configure this in the Voximplant application:
 * - Secret `INWORLD_API_KEY` (Voximplant Secrets)
 */

require(Modules.Inworld);

const WEATHER_TOOL = "get_weather";
const HANGUP_TOOL = "hang_up";

const SYSTEM_PROMPT = `
You are Voxi, a Voximplant developer advocate on a live phone call.
Voximplant is pronounced VOX-im-plant.
Keep answers short and natural.

If the caller asks about weather, call get_weather.
If the caller wants to end the call, call hang_up without speaking first.
After the hang_up tool result is returned, say a brief goodbye.
When you call a tool, do not speak before the tool result is available.

Voice style:
- Sound like an expressive product expert, not a flat IVR.
- Use short, human turns.
- Use at most one TTS-2 non-verbal tag per turn, and often none: [laugh], [breathe], [sigh], [clear throat].
- Use at most one [speak ...] steering tag per turn. If used, it must be first.
`;

const SESSION_CONFIG = {
    session: {
        type: "realtime",
        model: "claude-sonnet-4-6",
        instructions: SYSTEM_PROMPT,
        output_modalities: ["audio", "text"],
        audio: {
            input: {
                transcription: {
                    model: "inworld/inworld-stt-1",
                    prompt: "Important terms: Voximplant, VoxEngine, Inworld, weather, San Francisco.",
                },
                turn_detection: {
                    type: "semantic_vad",
                    eagerness: "high",
                    create_response: true,
                    interrupt_response: true,
                },
            },
            output: {
                voice: "Ashley",
                model: "inworld-tts-2",
            },
        },
        providerData: {
            tts: {
                delivery_mode: "BALANCED",
            },
        },
        tools: [
            {
                type: "function",
                name: WEATHER_TOOL,
                description: "Get current weather for a location.",
                parameters: {
                    type: "object",
                    properties: {
                        location: {
                            type: "string",
                            description: "City name, for example San Francisco.",
                        },
                    },
                    required: ["location"],
                },
            },
            {
                type: "function",
                name: HANGUP_TOOL,
                description: "Hang up the current call.",
                parameters: {
                    type: "object",
                    properties: {},
                    required: [],
                },
            },
        ],
        tool_choice: "auto",
    },
};

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
    let voiceAIClient;
    let pendingHangup = false;
    const functionCallsByItemId = {};

    // Helper to clean up the call when done.
    const terminate = (event) => {
        if (event) Logger.write(JSON.stringify(event));
        voiceAIClient?.close();
        VoxEngine.terminate();
    };

    // Termination handlers.
    call.addEventListener(CallEvents.Disconnected, terminate);
    call.addEventListener(CallEvents.Failed, terminate);

    try {
        call.answer();

        voiceAIClient = await Inworld.createRealtimeAPIClient({
            apiKey: VoxEngine.getSecretValue("INWORLD_API_KEY"),
            sessionKey: `inworld-tools-${Date.now()}`,
            onWebSocketClose: terminate,
        });

        voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.SessionCreated, () => {
            Logger.write("===Inworld.SessionCreated===");
            voiceAIClient.sessionUpdate(SESSION_CONFIG);
        });

        // Once the session is configured, bridge call media and trigger the greeting.
        voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.SessionUpdated, () => {
            Logger.write("===Inworld.SessionUpdated===");
            // Bridge media between the call and Inworld Realtime.
            VoxEngine.sendMediaBetween(call, voiceAIClient);
            voiceAIClient.conversationItemCreate({
                item: {
                    type: "message",
                    role: "user",
                    content: [
                        {
                            type: "input_text",
                            text: "The phone call just connected. Say only: Hi, this is Voxi. I can check the weather.",
                        },
                    ],
                },
            });
            voiceAIClient.responseCreate({
                response: {
                    output_modalities: ["audio", "text"],
                },
            });
        });

        voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.InputAudioBufferSpeechStarted, () => {
            Logger.write("===BARGE-IN: Inworld.InputAudioBufferSpeechStarted===");
            voiceAIClient.outputAudioBufferClear({});
        });

        voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.ResponseOutputItemAdded, (event) => {
            const payload = event?.data?.payload || event?.data || {};
            const item = payload.item;
            if (item?.type !== "function_call") return;

            functionCallsByItemId[item.id] = {
                name: item.name,
                call_id: item.call_id,
            };
        });

        voiceAIClient.addEventListener(
            Inworld.RealtimeAPIEvents.ResponseFunctionCallArgumentsDone,
            (event) => {
                const payload = event?.data?.payload || event?.data || {};
                const {item_id: itemId, arguments: rawArgs} = payload;
                const functionCall = functionCallsByItemId[itemId];
                const toolName = functionCall?.name;
                const toolCallId = functionCall?.call_id;

                if (!toolName || !toolCallId) {
                    Logger.write("===TOOL_CALL_MISSING_FIELDS===");
                    Logger.write(JSON.stringify({payload, functionCall}));
                    return;
                }

                let args = {};
                if (typeof rawArgs === "string") {
                    if (rawArgs.trim()) {
                        try {
                            args = JSON.parse(rawArgs);
                        } catch (error) {
                            Logger.write("===TOOL_ARGS_PARSE_ERROR===");
                            Logger.write(rawArgs);
                            Logger.write(error);
                        }
                    }
                } else if (rawArgs && typeof rawArgs === "object") {
                    args = rawArgs;
                }

                Logger.write("===TOOL_CALL_RECEIVED===");
                Logger.write(JSON.stringify({toolName, args}));

                if (toolName === WEATHER_TOOL) {
                    const location = args.location || "San Francisco";
                    const result = {
                        location,
                        temperature_f: 72,
                        condition: "sunny",
                    };
                    voiceAIClient.conversationItemCreate({
                        item: {
                            type: "function_call_output",
                            call_id: toolCallId,
                            output: JSON.stringify(result),
                        },
                    });
                    Logger.write("===TOOL_RESPONSE_SENT===");
                    Logger.write(JSON.stringify(result));
                    voiceAIClient.responseCreate({
                        response: {
                            output_modalities: ["audio", "text"],
                        },
                    });
                    return;
                }

                if (toolName === HANGUP_TOOL) {
                    pendingHangup = true;
                    const result = {
                        status: "ready_to_end_call",
                        instruction: "Say a brief goodbye now.",
                    };
                    voiceAIClient.conversationItemCreate({
                        item: {
                            type: "function_call_output",
                            call_id: toolCallId,
                            output: JSON.stringify(result),
                        },
                    });
                    Logger.write("===TOOL_RESPONSE_SENT===");
                    Logger.write(JSON.stringify(result));
                    voiceAIClient.responseCreate({
                        response: {
                            output_modalities: ["audio", "text"],
                        },
                    });
                    return;
                }

                const result = {error: `Unhandled tool: ${toolName}`};
                voiceAIClient.conversationItemCreate({
                    item: {
                        type: "function_call_output",
                        call_id: toolCallId,
                        output: JSON.stringify(result),
                    },
                });
                Logger.write("===TOOL_RESPONSE_SENT===");
                Logger.write(JSON.stringify(result));
                voiceAIClient.responseCreate({
                    response: {
                        output_modalities: ["audio", "text"],
                    },
                });
            },
        );

        // Let VoxEngine finish playing the goodbye audio before hanging up.
        voiceAIClient.addEventListener(Inworld.Events.WebSocketMediaEnded, () => {
            if (!pendingHangup) return;
            Logger.write("===HANGUP_AFTER_MEDIA_ENDED===");
            pendingHangup = false;
            call.hangup();
        });

        // Consolidated log-only handlers for lifecycle, audio, and error debugging.
        [
            Inworld.RealtimeAPIEvents.ConversationItemInputAudioTranscriptionDelta,
            Inworld.RealtimeAPIEvents.ConversationItemInputAudioTranscriptionCompleted,
            Inworld.RealtimeAPIEvents.ResponseCreated,
            Inworld.RealtimeAPIEvents.ResponseDone,
            Inworld.RealtimeAPIEvents.ResponseFunctionCallArgumentsDelta,
            Inworld.RealtimeAPIEvents.ResponseOutputAudioDone,
            Inworld.RealtimeAPIEvents.ResponseOutputAudioTranscriptDone,
            Inworld.RealtimeAPIEvents.InputAudioBufferSpeechStopped,
            Inworld.RealtimeAPIEvents.InputAudioBufferCommitted,
            Inworld.RealtimeAPIEvents.InputAudioBufferCleared,
            Inworld.RealtimeAPIEvents.OutputAudioBufferStarted,
            Inworld.RealtimeAPIEvents.OutputAudioBufferStopped,
            Inworld.RealtimeAPIEvents.OutputAudioBufferCleared,
            Inworld.RealtimeAPIEvents.ConnectorInformation,
            Inworld.RealtimeAPIEvents.HTTPResponse,
            Inworld.RealtimeAPIEvents.Error,
            Inworld.RealtimeAPIEvents.WebSocketError,
            Inworld.RealtimeAPIEvents.Unknown,
            Inworld.Events.WebSocketMediaStarted,
            Inworld.Events.WebSocketMediaEnded,
        ].forEach((eventName) => {
            voiceAIClient.addEventListener(eventName, (event) => {
                Logger.write(`===${event.name}===`);
                if (event?.data) Logger.write(JSON.stringify(event.data));
            });
        });

    } catch (error) {
        Logger.write("===UNHANDLED_ERROR===");
        terminate(error instanceof Error ? {message: error.message, stack: error.stack} : {error: String(error)});
    }
});