Example: Function calling

This example answers an inbound Voximplant call, connects it to Inworld Realtime, and handles function calls inside VoxEngine.

Jump to the Full VoxEngine scenario.

Prerequisites

Tool setup

Function tools are included in the Inworld session.update payload. The example exposes two client-side tools:

  • get_weather: returns demo weather data to the model.
  • hang_up: returns a tool result, lets Inworld speak a brief goodbye when audio is produced, and hangs up after the response completes.

Tool definition
tools: [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location.",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" },
      },
      required: ["location"],
    },
  },
  {
    type: "function",
    name: "hang_up",
    description: "Hang up the current call.",
    parameters: {
      type: "object",
      properties: {},
      required: [],
    },
  },
],
tool_choice: "auto",
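
Because the model generates the arguments, a handler may want to confirm that every field listed under required is present before using it. The following is a minimal sketch in plain JavaScript and is not part of the Voximplant or Inworld APIs: TOOLS repeats the definition above, and missingRequiredArgs is a hypothetical helper name introduced only for illustration.

```javascript
// Tool definitions copied from the example above.
const TOOLS = [
  {
    type: "function",
    name: "get_weather",
    description: "Get current weather for a location.",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } },
      required: ["location"],
    },
  },
  {
    type: "function",
    name: "hang_up",
    description: "Hang up the current call.",
    parameters: { type: "object", properties: {}, required: [] },
  },
];

// Hypothetical helper (not a VoxEngine API): returns the required fields that
// are absent from args, or null when no tool with that name is defined.
function missingRequiredArgs(tools, toolName, args) {
  const tool = tools.find((t) => t.name === toolName);
  if (!tool) return null;
  const required = tool.parameters?.required || [];
  return required.filter((field) => args?.[field] === undefined);
}
```

The full scenario takes a simpler route for get_weather and falls back to "San Francisco" when location is missing.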

The rest of the session config stays focused on the tool flow: model, prompt, audio input/output models, and concise TTS delivery settings. The system prompt tells the model not to speak before a tool result is available, so the caller does not hear a filler phrase such as “let me check” before the function result is ready. Unlike the richer inbound demo, this example does not force segmenter_strategy: "full_turn"; that keeps the tool-result answer more responsive once Inworld starts generating the final spoken response. For a broader walkthrough of Inworld voice behavior tuning, see Answering an incoming call.

Handle tool calls

Inworld emits the function name and call_id on the function-call output item, then emits the final JSON arguments with ResponseFunctionCallArgumentsDone. The example stores the function-call metadata by item id, reads the matching arguments, executes the local function, sends a function_call_output conversation item, and creates the follow-up response:

Track the function call
functionCallsByItemId[item.id] = {
  name: item.name,
  call_id: item.call_id,
};
Read the function arguments
const { item_id: itemId, arguments: rawArgs } = payload;
const functionCall = functionCallsByItemId[itemId];
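
The arguments value normally arrives as a JSON string and can be empty for a tool without parameters, so it helps to normalize it before dispatching. The sketch below follows the parsing logic of the full scenario; parseToolArguments is a hypothetical helper name, and discarding non-object JSON values is an extra safeguard that the original code does not include.

```javascript
// Hypothetical helper: normalizes the raw arguments value into a plain object.
function parseToolArguments(rawArgs) {
  if (typeof rawArgs === "string") {
    // Empty or whitespace-only string: treat as "no arguments".
    if (!rawArgs.trim()) return {};
    try {
      const parsed = JSON.parse(rawArgs);
      // Assumption: JSON that is not an object (a number, null) is discarded.
      return parsed && typeof parsed === "object" ? parsed : {};
    } catch (error) {
      // Malformed JSON: fall back to an empty argument object.
      return {};
    }
  }
  // Already-parsed objects are passed through unchanged.
  if (rawArgs && typeof rawArgs === "object") return rawArgs;
  return {};
}
```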
Respond to a tool call
voiceAIClient.conversationItemCreate({
  item: {
    type: "function_call_output",
    call_id: toolCallId,
    output: JSON.stringify(result),
  },
});
voiceAIClient.responseCreate({
  response: {
    output_modalities: ["audio", "text"],
  },
});
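
The three steps can also be combined into a single dispatcher driven by a table of handlers. This is a sketch under stated assumptions rather than the scenario's actual structure: createToolDispatcher, TOOL_HANDLERS, and fakeClient are hypothetical names, and fakeClient stands in for voiceAIClient so the flow can run outside VoxEngine. Only conversationItemCreate, responseCreate, and the payload fields come from the example.

```javascript
// Hypothetical handler table: tool name -> function returning a JSON-serializable result.
const TOOL_HANDLERS = {
  get_weather: (args) => ({
    location: args.location || "San Francisco",
    temperature_f: 72, // demo data, as in the example
    condition: "sunny",
  }),
  hang_up: () => ({ status: "hangup_pending" }),
};

// `client` only needs conversationItemCreate() and responseCreate().
function createToolDispatcher(client, handlers) {
  const functionCallsByItemId = {};

  return {
    // Intended for ResponseOutputItemAdded payloads.
    onOutputItemAdded(payload) {
      const item = payload.item;
      if (item?.type !== "function_call") return;
      functionCallsByItemId[item.id] = { name: item.name, call_id: item.call_id };
    },

    // Intended for ResponseFunctionCallArgumentsDone payloads.
    onArgumentsDone(payload) {
      const functionCall = functionCallsByItemId[payload.item_id];
      if (!functionCall) return; // no metadata was recorded for this item
      // Assumption: the entry is no longer needed once the call is answered.
      delete functionCallsByItemId[payload.item_id];

      let args = {};
      if (typeof payload.arguments === "string" && payload.arguments.trim()) {
        try {
          const parsed = JSON.parse(payload.arguments);
          if (parsed && typeof parsed === "object") args = parsed;
        } catch (error) {
          // Malformed JSON: keep the empty object.
        }
      }

      const known = Object.prototype.hasOwnProperty.call(handlers, functionCall.name);
      const result = known
        ? handlers[functionCall.name](args)
        : { error: `Unhandled tool: ${functionCall.name}` };

      // Reply with the same call_id, then ask for the follow-up response.
      client.conversationItemCreate({
        item: {
          type: "function_call_output",
          call_id: functionCall.call_id,
          output: JSON.stringify(result),
        },
      });
      client.responseCreate({ response: { output_modalities: ["audio", "text"] } });
    },
  };
}

// Stand-in client that records calls instead of talking to Inworld.
const sent = [];
const fakeClient = {
  conversationItemCreate: (message) => sent.push({ method: "conversationItemCreate", message }),
  responseCreate: (message) => sent.push({ method: "responseCreate", message }),
};

const dispatcher = createToolDispatcher(fakeClient, TOOL_HANDLERS);
dispatcher.onOutputItemAdded({
  item: { id: "item_1", type: "function_call", name: "get_weather", call_id: "call_1" },
});
dispatcher.onArgumentsDone({ item_id: "item_1", arguments: '{"location":"Berlin"}' });
```

A table keeps the reply logic in one place, so adding a tool means adding one handler instead of repeating the conversationItemCreate and responseCreate calls.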

Connect call audio

The call setup is otherwise the same as the inbound example. After SessionCreated, send the session config. After SessionUpdated, bridge media and seed an initial greeting with conversationItemCreate followed by responseCreate. The caller’s spoken turns go through the bridged audio stream; Inworld STT, semantic VAD, and create_response: true handle turn completion and response creation.
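
The ordering described above can be illustrated with a stub client that runs outside VoxEngine. Everything in this sketch is an assumption made for illustration: createFakeClient and wireSession are hypothetical names, the string event names stand in for the Inworld.RealtimeAPIEvents members, and bridgeMedia is a callback that takes the place of VoxEngine.sendMediaBetween(call, voiceAIClient).

```javascript
// Minimal stand-in for the realtime client: records calls and lets us fire events.
function createFakeClient() {
  const listeners = {};
  const calls = [];
  return {
    calls,
    addEventListener(name, fn) {
      if (!listeners[name]) listeners[name] = [];
      listeners[name].push(fn);
    },
    emit(name, data) {
      (listeners[name] || []).forEach((fn) => fn({ name, data }));
    },
    sessionUpdate: (config) => calls.push(["sessionUpdate", config]),
    conversationItemCreate: (message) => calls.push(["conversationItemCreate", message]),
    responseCreate: (message) => calls.push(["responseCreate", message]),
  };
}

// Hypothetical wiring function that mirrors the order used in the scenario.
function wireSession(client, sessionConfig, bridgeMedia) {
  // Step 1: the session exists, so send the configuration.
  client.addEventListener("SessionCreated", () => {
    client.sessionUpdate(sessionConfig);
  });

  // Step 2: the configuration is applied, so bridge audio and seed the greeting.
  client.addEventListener("SessionUpdated", () => {
    bridgeMedia();
    client.conversationItemCreate({
      item: {
        type: "message",
        role: "user",
        content: [
          {
            type: "input_text",
            text: "The phone call just connected. Say only: Hi, this is Voxi. I can check the weather.",
          },
        ],
      },
    });
    client.responseCreate({ response: { output_modalities: ["audio", "text"] } });
  });
}

const client = createFakeClient();
wireSession(client, { session: { type: "realtime" } }, () => client.calls.push(["bridgeMedia"]));
client.emit("SessionCreated");
client.emit("SessionUpdated");
const order = client.calls.map(([name]) => name);
```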

Barge-in

The function-calling example keeps the same interruption behavior as the other Inworld examples:

Barge-in
voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.InputAudioBufferSpeechStarted, () => {
  voiceAIClient.outputAudioBufferClear({});
});

Notes

  • sessionKey can be any unique string to maintain context for the Inworld session.
  • Keep tool handlers fast in VoxEngine. For slow work, return an acknowledgement and move the long-running workflow outside the call path.
  • Tool responses must use the same call_id that Inworld provided.
  • Empty tool argument strings are treated as {}; tools without parameters do not need placeholder JSON.
  • Tool-call turns are silent until the function output is sent back and the follow-up response is created.
  • The hang_up tool waits for ResponseOutputAudioDone when Inworld produces audio, with a completed ResponseDone fallback for final responses that complete without audio output.
  • For the full provider event schema, see the Inworld module reference.
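
The hang_up note describes a small piece of state: a pending flag that is set by the tool call and consumed by whichever completion event arrives first. A minimal sketch of that logic in plain JavaScript follows; createHangupGate is a hypothetical helper name, and the hangup callback stands in for call.hangup().

```javascript
// Hypothetical helper: triggers hangup at most once per hang_up request.
function createHangupGate(hangup) {
  let pending = false;

  const fire = () => {
    pending = false; // clear first so a later event cannot hang up twice
    hangup();
  };

  return {
    // Call when the hang_up tool is invoked.
    request() {
      pending = true;
    },
    // Call from the ResponseOutputAudioDone listener.
    onAudioDone() {
      if (pending) fire();
    },
    // Call from the ResponseDone listener: fallback for responses without audio.
    onResponseDone(payload) {
      if (pending && payload?.response?.status === "completed") fire();
    },
  };
}

let hangups = 0;
const gate = createHangupGate(() => {
  hangups += 1;
});

gate.onAudioDone(); // ignored: nothing is pending yet
gate.request(); // hang_up tool was called
gate.onResponseDone({ response: { status: "failed" } }); // ignored: status is not "completed"
gate.onAudioDone(); // goodbye audio finished: hang up
gate.onResponseDone({ response: { status: "completed" } }); // ignored: already handled
```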

Full VoxEngine scenario

voxengine-inworld-function-calling.js
/**
 * Voximplant + Inworld Realtime API demo
 * Scenario: answer an incoming call and handle Inworld function calls.
 *
 * Configure this in the Voximplant application:
 * - Secret `INWORLD_API_KEY` (Voximplant Secrets)
 */

require(Modules.Inworld);

const WEATHER_TOOL = "get_weather";
const HANGUP_TOOL = "hang_up";

const SYSTEM_PROMPT = `
You are Voxi, a Voximplant developer advocate on a live phone call.
Voximplant is pronounced VOX-im-plant.
Keep answers short and natural.

If the caller asks about weather, call get_weather.
If the caller wants to end the call, say a brief goodbye and call hang_up.
When you call a tool, do not speak before the tool result is available.

Voice style:
- Sound like an expressive product expert, not a flat IVR.
- Use short, human turns.
- Use at most one TTS-2 non-verbal tag per turn, and often none: [laugh], [breathe], [sigh], [clear throat].
- Use at most one [speak ...] steering tag per turn. If used, it must be first.
`;

const SESSION_CONFIG = {
  session: {
    type: "realtime",
    model: "claude-sonnet-4-6",
    instructions: SYSTEM_PROMPT,
    output_modalities: ["audio", "text"],
    audio: {
      input: {
        transcription: {
          model: "inworld/inworld-stt-1",
          prompt: "Important terms: Voximplant, VoxEngine, Inworld, weather, San Francisco.",
        },
        turn_detection: {
          type: "semantic_vad",
          eagerness: "high",
          create_response: true,
          interrupt_response: true,
        },
      },
      output: {
        voice: "Ashley",
        model: "inworld-tts-2",
      },
    },
    providerData: {
      tts: {
        delivery_mode: "BALANCED",
      },
    },
    tools: [
      {
        type: "function",
        name: WEATHER_TOOL,
        description: "Get current weather for a location.",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "City name, for example San Francisco.",
            },
          },
          required: ["location"],
        },
      },
      {
        type: "function",
        name: HANGUP_TOOL,
        description: "Hang up the current call.",
        parameters: {
          type: "object",
          properties: {},
          required: [],
        },
      },
    ],
    tool_choice: "auto",
  },
};

VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
  let voiceAIClient;
  let pendingHangup = false;
  const functionCallsByItemId = {};

  // Helper to clean-up the call when done
  const terminate = (event) => {
    if (event) Logger.write(JSON.stringify(event));
    voiceAIClient?.close();
    VoxEngine.terminate();
  };

  // Termination handlers.
  call.addEventListener(CallEvents.Disconnected, terminate);
  call.addEventListener(CallEvents.Failed, terminate);

  try {
    call.answer();

    voiceAIClient = await Inworld.createRealtimeAPIClient({
      apiKey: VoxEngine.getSecretValue("INWORLD_API_KEY"),
      sessionKey: `inworld-tools-${Date.now()}`,
      onWebSocketClose: terminate,
    });

    voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.SessionCreated, () => {
      Logger.write("===Inworld.SessionCreated===");
      voiceAIClient.sessionUpdate(SESSION_CONFIG);
    });

    // Once the session is configured, bridge call media and trigger the greeting.
    voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.SessionUpdated, () => {
      Logger.write("===Inworld.SessionUpdated===");
      // Bridge media between the call and Inworld Realtime.
      VoxEngine.sendMediaBetween(call, voiceAIClient);
      voiceAIClient.conversationItemCreate({
        item: {
          type: "message",
          role: "user",
          content: [
            {
              type: "input_text",
              text: "The phone call just connected. Say only: Hi, this is Voxi. I can check the weather.",
            },
          ],
        },
      });
      voiceAIClient.responseCreate({
        response: {
          output_modalities: ["audio", "text"],
        },
      });
    });

    voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.InputAudioBufferSpeechStarted, () => {
      Logger.write("===BARGE-IN: Inworld.InputAudioBufferSpeechStarted===");
      voiceAIClient.outputAudioBufferClear({});
    });

    voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.ResponseOutputItemAdded, (event) => {
      const payload = event?.data?.payload || event?.data || {};
      const item = payload.item;
      if (item?.type !== "function_call") return;

      functionCallsByItemId[item.id] = {
        name: item.name,
        call_id: item.call_id,
      };
    });

    voiceAIClient.addEventListener(
      Inworld.RealtimeAPIEvents.ResponseFunctionCallArgumentsDone,
      (event) => {
        const payload = event?.data?.payload || event?.data || {};
        const {item_id: itemId, arguments: rawArgs} = payload;
        const functionCall = functionCallsByItemId[itemId];
        const toolName = functionCall?.name;
        const toolCallId = functionCall?.call_id;

        if (!toolName || !toolCallId) {
          Logger.write("===TOOL_CALL_MISSING_FIELDS===");
          Logger.write(JSON.stringify({payload, functionCall}));
          return;
        }

        let args = {};
        if (typeof rawArgs === "string") {
          if (rawArgs.trim()) {
            try {
              args = JSON.parse(rawArgs);
            } catch (error) {
              Logger.write("===TOOL_ARGS_PARSE_ERROR===");
              Logger.write(rawArgs);
              Logger.write(error);
            }
          }
        } else if (rawArgs && typeof rawArgs === "object") {
          args = rawArgs;
        }

        Logger.write("===TOOL_CALL_RECEIVED===");
        Logger.write(JSON.stringify({toolName, args}));

        if (toolName === WEATHER_TOOL) {
          const location = args.location || "San Francisco";
          const result = {
            location,
            temperature_f: 72,
            condition: "sunny",
          };
          voiceAIClient.conversationItemCreate({
            item: {
              type: "function_call_output",
              call_id: toolCallId,
              output: JSON.stringify(result),
            },
          });
          Logger.write("===TOOL_RESPONSE_SENT===");
          Logger.write(JSON.stringify(result));
          voiceAIClient.responseCreate({
            response: {
              output_modalities: ["audio", "text"],
            },
          });
          return;
        }

        if (toolName === HANGUP_TOOL) {
          pendingHangup = true;
          const result = {status: "hangup_pending"};
          voiceAIClient.conversationItemCreate({
            item: {
              type: "function_call_output",
              call_id: toolCallId,
              output: JSON.stringify(result),
            },
          });
          Logger.write("===TOOL_RESPONSE_SENT===");
          Logger.write(JSON.stringify(result));
          voiceAIClient.responseCreate({
            response: {
              output_modalities: ["audio", "text"],
            },
          });
          return;
        }

        const result = {error: `Unhandled tool: ${toolName}`};
        voiceAIClient.conversationItemCreate({
          item: {
            type: "function_call_output",
            call_id: toolCallId,
            output: JSON.stringify(result),
          },
        });
        Logger.write("===TOOL_RESPONSE_SENT===");
        Logger.write(JSON.stringify(result));
        voiceAIClient.responseCreate({
          response: {
            output_modalities: ["audio", "text"],
          },
        });
      },
    );

    // Let the agent finish speaking the goodbye before hanging up when possible.
    voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.ResponseOutputAudioDone, () => {
      if (!pendingHangup) return;
      Logger.write("===HANGUP_AFTER_AGENT_AUDIO===");
      pendingHangup = false;
      call.hangup();
    });

    // Fallback handler in case the response is missing audio and not caught above
    voiceAIClient.addEventListener(Inworld.RealtimeAPIEvents.ResponseDone, (event) => {
      const payload = event?.data?.payload || event?.data || {};
      if (pendingHangup && payload?.response?.status === "completed") {
        Logger.write("===HANGUP_AFTER_RESPONSE_DONE===");
        pendingHangup = false;
        call.hangup();
      }
      Logger.write(`===${event.name}===`);
      if (event?.data) Logger.write(JSON.stringify(event.data));
    });

    // Consolidated log-only handlers for lifecycle, audio, and error debugging.
    [
      Inworld.RealtimeAPIEvents.ConversationItemInputAudioTranscriptionDelta,
      Inworld.RealtimeAPIEvents.ConversationItemInputAudioTranscriptionCompleted,
      Inworld.RealtimeAPIEvents.ResponseCreated,
      Inworld.RealtimeAPIEvents.ResponseFunctionCallArgumentsDelta,
      Inworld.RealtimeAPIEvents.ResponseOutputAudioDone,
      Inworld.RealtimeAPIEvents.ResponseOutputAudioTranscriptDone,
      Inworld.RealtimeAPIEvents.InputAudioBufferSpeechStopped,
      Inworld.RealtimeAPIEvents.InputAudioBufferCommitted,
      Inworld.RealtimeAPIEvents.InputAudioBufferCleared,
      Inworld.RealtimeAPIEvents.OutputAudioBufferStarted,
      Inworld.RealtimeAPIEvents.OutputAudioBufferStopped,
      Inworld.RealtimeAPIEvents.OutputAudioBufferCleared,
      Inworld.RealtimeAPIEvents.ConnectorInformation,
      Inworld.RealtimeAPIEvents.HTTPResponse,
      Inworld.RealtimeAPIEvents.Error,
      Inworld.RealtimeAPIEvents.WebSocketError,
      Inworld.RealtimeAPIEvents.Unknown,
      Inworld.Events.WebSocketMediaStarted,
      Inworld.Events.WebSocketMediaEnded,
    ].forEach((eventName) => {
      voiceAIClient.addEventListener(eventName, (event) => {
        Logger.write(`===${event.name}===`);
        if (event?.data) Logger.write(JSON.stringify(event.data));
      });
    });

  } catch (error) {
    Logger.write("===UNHANDLED_ERROR===");
    terminate(error instanceof Error ? {message: error.message, stack: error.stack} : {error: String(error)});
  }
});