Reference
Voice AI Glossary
69+ voice AI terms in plain English. Covers ASR, TTS, latency, barge-in, RAG, carrier intelligence, NZ + AU compliance, and the te reo place names that catch out generic AI agents.
A
- ACMA
- Australian Communications and Media Authority. The regulator that enforces the Spam Act 2003, the Do Not Call Register Act 2006, and the Telecommunications (Telemarketing and Research Calls) Industry Standard 2017, which sets permitted calling hours. The body you do not want investigating your campaign.
- AI Voice Agent
- Software that answers or makes phone calls using a Large Language Model and a synthetic voice. Modern agents qualify, book, transfer, and capture data. Waboom AI builds and operates these in production for NZ and AU businesses.
- ASR
- Automatic Speech Recognition. The component that turns the caller's audio into text the LLM can read. Whisper, Deepgram, and Google STT are the common engines. Quality of the ASR is most of the conversation quality.
- Audio Bitrate
- How many bits per second of audio you send over the line. PSTN audio is mono, sampled at 8 kHz with 8 bits per sample (G.711), which works out to 64 kbit/s. Modern voice AI captures at a 16 to 24 kHz sample rate for better consonant capture, then downsamples for the carrier.
- AusPost Receptionist
- Common search term. A human or AI agent who triages package and delivery enquiries. Increasingly replaced by AI for the first 30 seconds (delivery slot, address, missed package) before transfer to a human.
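The arithmetic behind the Audio Bitrate entry above can be sketched in a few lines. This is a minimal illustration for uncompressed audio only; the function name is hypothetical, and the 16-bit sample depth used for the wideband rows is an assumption, not a figure from this glossary.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed bitrate in kbit/s: samples per second x bits per sample x channels."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Narrowband telephony (G.711): 8 kHz, 8 bits per sample, mono.
print(pcm_bitrate_kbps(8_000, 8))    # 64.0

# Wideband capture, assuming 16-bit linear PCM (an assumption for illustration).
print(pcm_bitrate_kbps(16_000, 16))  # 256.0
print(pcm_bitrate_kbps(24_000, 16))  # 384.0
```

The gap between the wideband figures and 64 kbit/s is why audio is downsampled, or compressed with a codec, before it reaches the carrier.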
B
- Barge-in
- The caller speaking over the agent and the agent stopping mid-sentence to listen. Without barge-in the conversation feels like talking to a kiosk. Waboom agents stop speaking within 200 ms of detecting a new utterance.
- Bullhorn
- Recruitment CRM widely used in AU and NZ. Waboom voice agents push candidate screening summaries directly into Bullhorn within 60 seconds of the call ending.
C
- Carrier Intelligence
- The discipline of knowing how a phone carrier rates and rotates your numbers. Includes Spam Likely scoring, warmup patterns, and rest cycles. The reason your dialler numbers do or do not stay alive.
- Cliniko
- Healthcare practice management system used by NZ + AU clinics. Waboom integrates for booking, recall, and patient record lookup.
- Compliance Recording Disclosure
- The legal requirement to tell a caller they are being recorded before recording starts. Required under NZ Crimes Act s216A-216C and Australian Privacy Principle 5 (APP 5). Built into every Waboom agent by default.
- Connect Rate
- Percentage of dialled numbers that result in a live conversation. Industry baseline is 8 to 12 percent. Waboom typically holds 47 to 65 percent on warm lists.
- CSAT
- Customer Satisfaction Score. Usually a 1 to 5 rating captured at end of call. AI-handled calls now match or exceed human-handled CSAT in our deployments, especially for booking and intake calls.
D
- Deepgram
- ASR provider known for low-latency English transcription. One of the engines we use depending on language and latency budget.
- Diarization
- The process of identifying who said what in a recording (also spelled diarisation). Critical for two-party calls. Waboom separates speakers per channel for clean attribution.
- DNC Register
- Do Not Call Register. NZ has the Marketing Association DNC list. AU has the federal Do Not Call Register, established under the Do Not Call Register Act 2006. Waboom honours both automatically and refreshes the list weekly.
- DTMF
- Dual-Tone Multi-Frequency. The keypad tones (press 1 for sales). AI agents often replace IVR menus with natural language but still capture DTMF when an integration requires it.
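To make the "dual-tone" in the DTMF entry concrete: each key is the sum of one row frequency and one column frequency from the standard keypad grid. The sketch below looks up the pair and synthesises the tone as raw samples. The function names, the 100 ms duration, and the 8 kHz sample rate are illustrative assumptions.

```python
import math

# Standard DTMF grid (Hz): every key combines one row tone with one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (row_hz, col_hz) pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1, sample_rate: int = 8000) -> list[float]:
    """Synthesise the two summed sine waves as floats in the range -1.0 to 1.0."""
    low, high = dtmf_frequencies(key)
    n = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]


print(dtmf_frequencies("1"))   # (697, 1209)
print(len(dtmf_samples("1")))  # 800
```

Detecting a key press is the reverse operation: measure which row and column frequencies carry the energy in the incoming audio.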
E
- ElevenLabs
- Voice cloning and TTS provider. Strong for English and 30+ other languages. Used by Waboom for premium voice deployments where voice quality matters more than per-minute cost.
- EOFY
- End of Financial Year. 30 June in Australia, 31 March in New Zealand. The week your accountants and tax agents experience an 8x call surge. Common AI voice agent deployment trigger.
- ezyVet
- Veterinary practice management system widely used in NZ + AU. Waboom integrates for emergency triage, booking, and prescription refills.
F
- FAQ Schema
- JSON-LD structured data marking up question-answer pairs on a page. Helps the page show up as a rich result in Google and as a citation in LLMs (ChatGPT, Perplexity, Claude). Every Waboom industry page ships with FAQ schema.
- First Token Latency
- Time between the user finishing speaking and the agent starting to respond. Industry leaders sit around 800 ms. Anything over 1.5 seconds and the caller starts repeating themselves.
- Function Calling
- The mechanism by which an LLM triggers an external tool (book a meeting, look up a record, send an SMS). The reason a voice agent can actually do things instead of just talking.
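A rough sketch of the function-calling loop described above: the model emits a structured request naming a tool and its arguments, and the application routes it to real code. The tool names, the JSON shape, and the return values here are hypothetical; each LLM provider defines its own format.

```python
import json


def book_meeting(name: str, time: str) -> dict:
    # Hypothetical tool: a real one would call a calendar or CRM API.
    return {"status": "booked", "name": name, "time": time}


def send_sms(to: str, body: str) -> dict:
    # Hypothetical tool: a real one would call an SMS gateway.
    return {"status": "queued", "to": to, "chars": len(body)}


# Registry of the tools the agent is allowed to trigger.
TOOLS = {"book_meeting": book_meeting, "send_sms": send_sms}


def dispatch(tool_call_json: str) -> dict:
    """Parse a tool call emitted by the model and run the matching function."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool: {call['name']}"}
    return fn(**call["arguments"])


result = dispatch('{"name": "book_meeting", "arguments": {"name": "Aroha", "time": "2026-03-02T10:00"}}')
print(result)  # {'status': 'booked', 'name': 'Aroha', 'time': '2026-03-02T10:00'}
```

The result is normally passed back to the model so it can confirm the outcome to the caller in natural language.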
G
- GoHighLevel
- All-in-one CRM and marketing platform popular with agencies. Waboom integrates voice agents directly into the GHL pipeline so booked calls become opportunities.
- GDPR
- General Data Protection Regulation. The EU framework that the NZ Privacy Act 2020 and the Australian Privacy Principles broadly align with. Relevant if you have any EU-resident customers.
H
- Hangup Recovery
- The agent detecting a sudden silence (caller hung up unexpectedly) and re-dialling within 30 seconds with a recovery script. Cuts effective drop-off rate by 15 to 25 percent on outbound campaigns.
- HVAC
- Heating, Ventilation, Air Conditioning. The trade with the most pronounced seasonal call surge (summer heatwaves, winter cold snaps). Frequent AI voice agent deployment industry.
I
- IPP
- Information Privacy Principle. One of the 13 principles in the NZ Privacy Act 2020. We covered each one in our IPP-by-IPP compliance guide.
- Intent Detection
- The classification step where the agent decides what the caller wants (book, complain, ask a question, transfer to human). Determines which workflow runs next.
- IVR
- Interactive Voice Response. The press-1-for-sales menu. Modern AI voice agents replace IVRs with natural language for better conversion and lower abandonment.
J
- Jitter
- Variation in packet delay over the network. High jitter causes choppy audio and dropped words. Voice AI providers measure jitter end to end and trigger fallbacks when it exceeds 30 ms.
- JobAdder
- Recruitment CRM dominant in AU and NZ. Waboom integrates for candidate screening intake and client-side role briefing.
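For the Jitter entry above, one common way to put a number on "variation in packet delay" is the running interarrival jitter estimator from RFC 3550 (the RTP specification). The sketch below assumes you already have send and receive timestamps in milliseconds for each packet; the function name is hypothetical, and the 30 ms threshold is the figure quoted in the entry.

```python
def interarrival_jitter_ms(send_ms: list[float], recv_ms: list[float]) -> float:
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: how much the gap between two packets changed in transit.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


# Packets sent every 20 ms; the third one arrives 10 ms later than expected.
send = [0, 20, 40]
recv = [5, 25, 55]
jitter = interarrival_jitter_ms(send, recv)
print(jitter)       # 0.625
print(jitter > 30)  # False: below the fallback threshold quoted above
```

The divide-by-16 smoothing means a single late packet barely moves the estimate; sustained variation is what pushes it toward a fallback threshold.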
K
- Karbon
- Practice management software for accounting firms. Waboom integrates for EOFY appointment booking and client intake.
- Knowledge Base
- The corpus of company-specific information the agent draws on (services, prices, policies, FAQs). Updated by self-service or by sync. The reason your agent sounds like your business and not generic.
L
- Latency
- End-to-end time from caller speaking to agent responding. Sum of ASR, LLM thinking, function execution, TTS generation, and network transit. The most-watched metric in voice AI.
- LLM
- Large Language Model. The brain that decides what the agent says. Anthropic Claude, OpenAI GPT-4, and Google Gemini are the dominant production engines. Waboom picks per-deployment based on language, latency, and cost.
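The Latency entry above describes a sum of stages, which is easy to model as a per-turn budget. The stage timings below are invented for illustration, not measurements; the class and field names are hypothetical. The 1.5 second ceiling is the figure from the First Token Latency entry.

```python
from dataclasses import dataclass


@dataclass
class TurnTimings:
    """Per-stage timings for one conversational turn, in milliseconds."""
    asr_ms: int
    llm_ms: int
    function_ms: int
    tts_ms: int
    network_ms: int

    def total_ms(self) -> int:
        # End-to-end latency is the sum of every stage on the critical path.
        return self.asr_ms + self.llm_ms + self.function_ms + self.tts_ms + self.network_ms


# Illustrative numbers only (assumptions, not benchmarks).
turn = TurnTimings(asr_ms=150, llm_ms=350, function_ms=0, tts_ms=200, network_ms=100)
print(turn.total_ms())          # 800
print(turn.total_ms() <= 1500)  # True: under the point where callers start repeating themselves
```

Breaking the total down per stage shows where a slow turn went wrong, for example a function call that added a full second on its own.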
M
- MCP
- Model Context Protocol. Anthropic-led standard for connecting LLMs to external tools and data sources. Used in Waboom Claude Code workshops and increasingly in production agents.
- Multilingual Voice
- An agent that detects the caller's language in the first 5 seconds and continues in it. Te Reo Māori, Mandarin, Hindi, Filipino, Spanish, and 30+ more. Critical for tourist towns and trades recruiting.
N
- Neural TTS
- Text-to-speech using deep neural networks instead of concatenative or formant synthesis. Sounds dramatically more natural. The standard since 2020.
- Noise Cancellation
- Removing background noise from the caller side audio so ASR works on a clean signal. Critical for trades calls (job site noise) and hospitality (restaurant noise).
O
- OAIC
- Office of the Australian Information Commissioner. The regulator for the Australian Privacy Principles. Investigates breaches, can fine for serious or repeated APP contraventions (now up to $50M AUD per breach).
- Outbound Dialler
- Software that initiates calls (vs accepting them). AI outbound diallers handle pacing algorithms, list rotation, and DNC compliance. The discipline that separates an effective campaign from a burned phone number.
P
- Privacy Act 2020
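- The NZ data protection law in force from 1 December 2020. Its 13 IPPs cover collection, storage, use, disclosure, access, correction, and retention, and the Act also introduced mandatory privacy breach notification. New IPP 3A, which covers notifying individuals when their personal information is collected indirectly (from a source other than the individual), takes effect on 1 May 2026.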
- Pronunciation Dictionary
- Lookup table for words the TTS engine would otherwise mispronounce. Essential for NZ place names (Whangārei, Rotorua, Tauranga), te reo, and unusual surnames. Covered in our pronunciation blog.
- Prosody
- The rhythm, stress, and intonation of speech. Modern neural TTS gets prosody right most of the time. Bad prosody is the most common reason a voice still sounds robotic.
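The Pronunciation Dictionary entry above boils down to a lookup applied to the text before it reaches the TTS engine. Here is a minimal sketch using plain text respelling. The only respelling included is the one this glossary gives for Whangārei; the function name and the respelling approach are assumptions, and production engines often take phonetic markup or a provider-specific dictionary format instead.

```python
import re

# Word -> respelling the TTS engine reads more reliably.
# Both the macronised and plain spellings are listed, since callers' data may use either.
PRONUNCIATIONS = {
    "Whangārei": "Fong-AH-ray",
    "Whangarei": "Fong-AH-ray",
}

# One case-insensitive pattern matching any dictionary word on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in PRONUNCIATIONS) + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {word.lower(): respelling for word, respelling in PRONUNCIATIONS.items()}


def apply_pronunciations(text: str) -> str:
    """Swap known words for their respellings before sending text to TTS."""
    return _PATTERN.sub(lambda m: _LOOKUP[m.group(0).lower()], text)


print(apply_pronunciations("Your technician is in Whangārei today."))
# Your technician is in Fong-AH-ray today.
```

Words that are not in the dictionary pass through unchanged, so the table can grow one place name or surname at a time.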
R
- RAG
- Retrieval Augmented Generation. The architecture where the agent looks up relevant information from your knowledge base before responding. The reason an agent can quote your current pricing instead of training-era pricing.
- Realtime
- Sub-second response latency for full duplex audio. The OpenAI Realtime API and the Anthropic Claude voice mode are the two leading realtime engines as of 2026.
- Recording Disclosure
- The notice given at the start of a call that the conversation is being recorded. Required under NZ Crimes Act s216A-216C and Australian Privacy Principle 5. Default in every Waboom agent.
- Retell AI
- Voice agent infrastructure provider. One of the conversational engines Waboom uses underneath the platform layer. We covered the architecture in our RAG-powered voice agent blog.
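The RAG entry above has two steps: retrieve the most relevant snippet from the knowledge base, then hand it to the LLM alongside the caller's question. The toy sketch below ranks snippets by word overlap so it stays self-contained; real systems typically use embedding similarity instead. The knowledge base contents, function names, and prompt wording are all made up for illustration.

```python
import re

# Made-up knowledge base snippets for illustration.
KNOWLEDGE_BASE = [
    "Standard callout fee is $120 including GST.",
    "We are open Monday to Friday, 8am to 5pm.",
    "Emergency after-hours callouts are available in Auckland.",
]


def _tokens(text: str) -> set[str]:
    """Lowercased word set, used as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k snippets sharing the most words with the question."""
    q = _tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & _tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Put the retrieved context in front of the question for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nCaller question: {question}"


print(retrieve("What is your callout fee?", KNOWLEDGE_BASE))
# ['Standard callout fee is $120 including GST.']
```

Because the price lives in the knowledge base rather than in the model's weights, updating the snippet updates what the agent quotes on the next call.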
S
- Sentiment Analysis
- Realtime classification of caller emotion (positive, neutral, negative, frustrated). Used to fire alerts when a call is going wrong, before the caller hangs up.
- Smart Booster
- Waboom feature that warms a fresh phone number gradually so the carrier reputation stays clean. Without it, a brand-new outbound number gets flagged Spam Likely within 100 calls.
- Spam Act 2003
- AU federal law governing commercial electronic messages. Carves voice calls out (s5(3)), so it does not apply to phone-based outbound. Often confused with the Do Not Call Register Act 2006, which is the law that does cover telemarketing calls.
- Spam Likely
- The carrier label that appears on the recipient's caller ID when your number has been flagged. Once labelled, your connect rate drops 60 to 80 percent. Recovery requires number rest, reputation rebuild, or a new number.
- STT
- Speech to Text. Synonym for ASR. Used interchangeably.
T
- Te Reo Māori
- The indigenous language of Aotearoa / New Zealand. Spoken by 4.6 percent of the population (2021 Stats NZ). Modern voice agents pronounce te reo place names and basic phrases correctly when configured with a pronunciation dictionary.
- TTFB
- Time To First Byte. In voice AI, the time between the user finishing their sentence and the first audio packet being emitted by the agent. Best-in-class is around 800 ms.
- TTS
- Text To Speech. The final stage that converts the LLM's response text into spoken audio. ElevenLabs, OpenAI, Cartesia, and Coqui XTTS are the dominant engines.
- Turn Taking
- The model that decides when the caller is done speaking and the agent should respond. Bad turn-taking causes interruptions or awkward silences. The most under-appreciated voice quality factor.
- Twilio
- Telephony infrastructure provider. Most voice agent platforms (including Waboom) use Twilio for the carrier connection layer.
U
- Utterance
- A single contiguous spoken segment from the user, ending in a pause. The unit that ASR transcribes and the LLM responds to. Long utterances (more than 15 seconds) typically need mid-utterance acknowledgement.
V
- VAD
- Voice Activity Detection. The component that decides whether incoming audio contains speech or just background noise. Bad VAD wastes ASR cycles on hold music or makes the agent miss soft-spoken callers.
- Vapi
- Voice agent infrastructure provider. Competitor and complement to Retell AI. Some Waboom deployments run on Vapi for specific feature requirements.
- VetLink
- Veterinary practice management system used by NZ vet clinics. Waboom integrates for emergency triage and booking.
- Voice Cloning
- Synthesising a new voice that sounds like a specific person from a sample of their speech. ElevenLabs and Cartesia are the leading providers. Used for branded agent voices and accessibility.
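The simplest form of the VAD entry above is an energy gate: compute the loudness of each short audio frame and compare it to a threshold. The sketch below does exactly that. The threshold value and function names are assumptions, and production VAD usually relies on trained models rather than raw energy, precisely because of the hold music and soft-speaker problems the entry mentions.

```python
import math


def frame_rms(samples: list[float]) -> float:
    """Root-mean-square level of one frame of samples in the range -1.0 to 1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: list[float], threshold: float = 0.02) -> bool:
    """Energy gate: treat the frame as speech when its level exceeds the threshold."""
    return frame_rms(samples) > threshold


# One 20 ms frame at 8 kHz is 160 samples.
silence = [0.0] * 160
loud_tone = [0.5 * math.sin(2 * math.pi * 200 * t / 8000) for t in range(160)]

print(is_speech(silence))    # False
print(is_speech(loud_tone))  # True
```

Note the weakness: a loud tone passes the gate even though it is not speech, which is how an energy-only VAD ends up sending hold music to the ASR.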
W
- Whangārei
- Northland city. Pronounced 'Fong-AH-ray', not 'Wong-uh-RAY'. Common AI voice agent failure mode without a pronunciation dictionary.
- Whisper
- OpenAI's open-source ASR model. Strong multilingual coverage. Used by Waboom for high-accuracy non-English transcription.
- Webhook
- HTTP callback used to integrate the voice agent with external systems. Waboom posts call summaries, tags, and recordings to client webhooks within seconds of call end.
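As a sketch of what the Webhook entry above looks like on the wire, the snippet below builds an HTTP POST carrying a call summary, tags, and a recording link as JSON. The field names, URLs, and function name are hypothetical and do not describe Waboom's actual payload format; the request is built but not sent, so the example runs offline.

```python
import json
import urllib.request


def build_webhook_request(url: str, summary: str, tags: list[str], recording_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST describing a finished call."""
    payload = {"summary": summary, "tags": tags, "recording_url": recording_url}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request(
    "https://example.com/hooks/calls",       # hypothetical client endpoint
    summary="Caller booked a Tuesday 10am appointment.",
    tags=["booking", "new-customer"],
    recording_url="https://example.com/recordings/123.wav",  # hypothetical
)
print(req.get_method())             # POST
print(json.loads(req.data)["tags"])  # ['booking', 'new-customer']

# urllib.request.urlopen(req, timeout=5) would deliver it to the client endpoint.
```

On the receiving side, the client endpoint parses the JSON body and writes the summary and tags into its CRM or ticketing system.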
X
- XTTS
- Coqui's cross-lingual text-to-speech model. Open source. Good for languages where commercial TTS support is thin. Used in some Waboom deployments for niche languages.
Z
- Zero Retention
- A data architecture where call recordings, transcripts, and PII are deleted immediately after processing. Default for sensitive sectors (legal, health) and configurable per Waboom deployment. Covered in our zero-retention blog.
See These Terms in Production
Theory is one thing. Watching ASR, TTS, RAG, call tags, and Spam Likely all working on a real call is another. Bring your call mix and we will run a live demo on it.