Getting Started
Voximplant is both a Voice AI Orchestration platform (VAOIP) and a powerful Communications Platform as a Service (CPaaS). Its telephony and messaging capabilities support phone, Web, WhatsApp, and native mobile apps. This guide focuses on the core capabilities most commonly used in Voice AI applications.
Before you start, you should have an idea of the following:
- What telephony networks are you using - phone numbers, WhatsApp, SIP, WebRTC?
- What LLM and Speech engine do you want to use?
- Do you need inbound or outbound calling?
See our Voice AI Frequently Asked Questions (FAQ) for more background on Voximplant and Voice AI.
Before we start, let’s review what VoxEngine is and how it works.
VoxEngine Overview
VoxEngine is Voximplant’s serverless environment that executes code to control the telephony experience and interact with external services. It is designed to be flexible for a wide variety of call flows and use cases.
Below are some quick highlights of the Voximplant environment. We dive deeper into these topics in the Voximplant Configuration section after this.
- Applications: top-level container for environments and resources.
- Scenarios: JavaScript call-control programs executed by VoxEngine.
- Routing rules: destination matching and scenario selection.
- Phone numbers: PSTN entry points for inbound and outbound calling.
- Users: app/SIP-style endpoints for web and native calling.
- Application Storage: key-value store for API keys and configuration.
Applications
Applications are the top-level container for your project. Everything discussed below needs to be associated with an application.
Scenarios
Scenarios are JavaScript programs and sub-programs that you write. They run automatically whenever there is an incoming call or when you choose to start them.
To get started, we recommend you copy the scenario code from one of the Voice AI Connector guides, or just use a basic hello world example like the one below.
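As a starting point, here is a minimal hello world sketch of an inbound scenario. It assumes the globals that the VoxEngine runtime provides (`VoxEngine`, `AppEvents`, `CallEvents`) and that `say()` falls back to a default voice when no options are passed; check the VoxEngine API reference before relying on either. The function name `handleIncomingCall` is our own choice for this example.

```javascript
// Minimal "hello world" scenario sketch for an inbound call.
// VoxEngine, AppEvents, and CallEvents are globals supplied by the VoxEngine runtime.

function handleIncomingCall(e) {
  const call = e.call;

  // Once the call is connected, speak a greeting.
  call.addEventListener(CallEvents.Connected, () => {
    // Assumption: with no options, say() uses the platform's default voice.
    call.say("Hello world! This is a Voximplant test scenario.");
  });

  // Hang up when the greeting has finished playing.
  call.addEventListener(CallEvents.PlaybackFinished, () => call.hangup());

  // End the scenario session when the call is over.
  call.addEventListener(CallEvents.Disconnected, () => VoxEngine.terminate());

  // Listeners are in place, so it is safe to answer now.
  call.answer();
}

// Register the handler only when running inside VoxEngine, so the file
// can also be loaded elsewhere (for example in a local test) without errors.
if (typeof VoxEngine !== "undefined") {
  VoxEngine.addEventListener(AppEvents.CallAlerting, handleIncomingCall);
}
```

The `typeof VoxEngine` guard is only there so the file can be exercised outside the platform; inside a scenario you can register the handler directly.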
Routing rules
Routing rules execute one or more scenarios. For incoming calls, you define a pattern (based on regular expressions) that matches the phone number, SIP address, or user name. You can also trigger a routing rule via API.
To start, you just need to create one rule with the universal pattern .* which matches any destination.
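To build intuition for how patterns select a rule, the following plain JavaScript sketch mimics the matching step. It is not Voximplant code: the rule names and patterns are hypothetical, and it assumes that a pattern must match the whole destination and that rules are checked in order with the first match winning.

```javascript
// Hypothetical rule list: names and patterns are illustrative only.
const rules = [
  { name: "support-line", pattern: "1800555\\d{4}" },
  { name: "catch-all", pattern: ".*" },
];

// Assumption: a pattern must match the entire destination string,
// so the sketch anchors it with ^(?:...)$.
function matchesRule(rule, destination) {
  return new RegExp(`^(?:${rule.pattern})$`).test(destination);
}

// Assumption: rules are evaluated top to bottom and the first match wins.
function selectRule(ruleList, destination) {
  return ruleList.find((rule) => matchesRule(rule, destination)) ?? null;
}

console.log(selectRule(rules, "18005550123").name); // support-line
console.log(selectRule(rules, "alice").name);       // catch-all
```

Because `.*` matches every destination, a catch-all rule placed last acts as a default for anything the more specific patterns miss.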
Phone numbers
The VoxEngine portal and APIs let you purchase and assign phone numbers to applications. If you want to give your application a phone number or dial out to a phone number, you will need to set one of these up.
Setting up a phone number is completely optional, but most applications include one.
Users
In addition to phone numbers, VoxEngine also allows you to place and receive calls from users that you define on the platform. If you are just using phone numbers then users are optional. Users are needed for some SIP interactions, web calling, and using our native client SDKs.
Application Storage
Voximplant provides a key-value storage system that you can use for environment-style variables and temporary storage.
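The sketch below shows one way to read a required setting, such as an API key, from a key-value store and fail early when it is missing. It runs against an in-memory stand-in rather than the real service. The store shape (an async `get(key)` that resolves to an object with a `value` field, or `null`), the helper names, and the key name `VOICE_AI_API_KEY` are all assumptions made for this example.

```javascript
// Reads a required setting from a key-value store.
// Assumption: storage.get(key) is async and resolves to { value } or null.
async function getRequiredSetting(storage, key) {
  const record = await storage.get(key);
  if (!record || !record.value) {
    throw new Error(`Missing setting "${key}" in Application Storage`);
  }
  return record.value;
}

// In-memory stand-in so the sketch runs outside VoxEngine.
function createMemoryStorage(initial = {}) {
  const data = new Map(Object.entries(initial));
  return {
    async get(key) {
      return data.has(key) ? { key, value: data.get(key) } : null;
    },
    async put(key, value) {
      data.set(key, value);
      return { key, value };
    },
  };
}

// Usage with the stand-in; "VOICE_AI_API_KEY" is a hypothetical key name.
const storage = createMemoryStorage({ VOICE_AI_API_KEY: "sk-example" });
getRequiredSetting(storage, "VOICE_AI_API_KEY").then((value) => {
  console.log(value.length > 0 ? "API key loaded" : "API key empty");
});
```

Inside a scenario you would pass the platform's storage module instead of the stand-in, provided its interface matches the shape assumed here. Failing fast on a missing key surfaces configuration mistakes before the call reaches the Voice AI connector.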
What LLM and Speech engine do you want to use?
Voximplant offers a number of built-in Voice AI connectors that connect with popular Voice AI platforms - Cartesia, Deepgram, ElevenLabs, Google Gemini, Grok, OpenAI, Ultravox, etc. These connectors use a WebSocket interface to integrate directly with these environments as a client. There are even mechanisms to build your own connector.
You will need an API key from the Voice AI platform, which means you need an existing account with developer access set up.
All of these options offer speech-to-speech capabilities. Some platforms handle speech natively (OpenAI, Gemini, Grok, and Ultravox for input), while others let you select the speech provider(s) from within their environments (Cartesia, Deepgram, ElevenLabs, and Ultravox for output). Alternatively, you can implement half-cascade architectures where you use Voximplant-controlled Text-to-Speech (TTS). Full-cascade architectures, where you manage Speech-to-Text (STT), the LLM, and TTS yourself, are also possible.
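One way to keep the three pipeline styles straight is to write down who runs each stage. The table-like object below is only our reading of the description above, in particular the assumption that in a half-cascade the Voice AI platform still handles speech input and the LLM. The labels are descriptive strings for this sketch, not platform settings.

```javascript
// Who runs each stage in the three pipeline styles.
// Labels are descriptive only; they are not Voximplant configuration values.
const PIPELINES = {
  "speech-to-speech": { stt: "voice-ai-platform", llm: "voice-ai-platform", tts: "voice-ai-platform" },
  "half-cascade":     { stt: "voice-ai-platform", llm: "voice-ai-platform", tts: "voximplant" },
  "full-cascade":     { stt: "self-managed",      llm: "self-managed",      tts: "self-managed" },
};

// Lists the stages that are NOT handled by the Voice AI platform itself,
// meaning the stages you have to wire up separately.
function stagesOutsidePlatform(pipelineName) {
  const pipeline = PIPELINES[pipelineName];
  if (!pipeline) {
    throw new Error(`Unknown pipeline: ${pipelineName}`);
  }
  return Object.keys(pipeline).filter((stage) => pipeline[stage] !== "voice-ai-platform");
}

console.log(stagesOutsidePlatform("speech-to-speech")); // []
console.log(stagesOutsidePlatform("half-cascade"));     // [ 'tts' ]
console.log(stagesOutsidePlatform("full-cascade"));     // [ 'stt', 'llm', 'tts' ]
```

The fewer stages that sit outside the platform, the less you have to integrate yourself, which is why a speech-to-speech pipeline is the simplest place to start.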
We recommend starting with one of our Voice AI connectors in a speech-to-speech pipeline.
What telephony networks are you using?
Voximplant supports calling over the following networks and technologies:
- Public Switched Telephone Network (PSTN) - outbound calling to more than 230 countries and territories and phone numbers from more than 100 countries
- Session Initiation Protocol (SIP) - calling with existing SIP infrastructure and SIP trunking providers
- WhatsApp - calling from an existing WhatsApp Business account
- Web - use the Web SDK to call using WebRTC
- Native mobile apps - connect to users directly on mobile and other platforms using one of our many media client SDKs
We recommend starting with phone numbers and inbound PSTN calling.
Using these Guides
To get started, see the Configure Voximplant and Inbound vs. Outbound calling sections that follow. Once you have chosen a Voice AI connector and a call flow direction, see our Voice AI Connector guides:
➡️ https://docs.voximplant.ai/voice-ai-connectors/
Copy the full example code in each guide. Remember you will also need to populate Application Storage with any API keys mentioned in the prerequisites (or add them to the code).