Waboom AI
09 888 0402

Voice AI Glossary

69+ voice AI terms in plain English. Covers ASR, TTS, latency, barge-in, RAG, carrier intelligence, NZ + AU compliance, and the te reo place names that catch out generic AI agents.

A

ACMA
Australian Communications and Media Authority. The regulator that enforces the Spam Act 2003, the Do Not Call Register Act 2006, and the Telecommunications (Telemarketing and Research Calls) Industry Standard 2017, which sets permitted calling hours. The body you do not want investigating your campaign.
AI Voice Agent
Software that answers or makes phone calls using a Large Language Model and a synthetic voice. Modern agents qualify, book, transfer, and capture data. Waboom AI builds and operates these in production for NZ and AU businesses.
ASR
Automatic Speech Recognition. The component that turns the caller's audio into text the LLM can read. Whisper, Deepgram, and Google STT are the common engines. Quality of the ASR is most of the conversation quality.
Audio Bitrate
How many bits per second of audio you send over the line. PSTN audio is mono, sampled at 8 kHz with 8 bits per sample, which works out to 64 kbit/s. Modern voice AI captures at 16 to 24 kHz for better consonant capture, then downsamples for the carrier.
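The arithmetic behind those figures is simple enough to sketch. The function name below is illustrative, and the wideband example assumes 16-bit linear PCM, which the entry does not specify.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw (uncompressed) audio bitrate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Narrowband telephony: 8 kHz, 8 bits per sample, mono.
pstn_kbps = pcm_bitrate_kbps(8_000, 8)

# Assumed wideband capture ahead of ASR: 16 kHz, 16-bit linear PCM, mono.
wideband_kbps = pcm_bitrate_kbps(16_000, 16)

print(pstn_kbps, wideband_kbps)
```

The wideband stream carries four times the raw data of the PSTN leg, which is why it is downsampled before it reaches the carrier.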
AusPost Receptionist
Common search term. A human or AI agent who triages package and delivery enquiries. Increasingly replaced by AI for the first 30 seconds (delivery slot, address, missed package) before transfer to human.

B

Barge-in
The caller speaking over the agent and the agent stopping mid-sentence to listen. Without barge-in the conversation feels like talking to a kiosk. Waboom agents stop speaking within 200 ms of detecting a new caller utterance.
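A minimal sketch of the barge-in rule, assuming a hypothetical Playback object that stands in for the agent's audio output. Real platforms expose their own interruption hooks, so every name here is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Playback:
    """Hypothetical stand-in for the agent's TTS audio output."""
    speaking: bool = False
    events: list = field(default_factory=list)

    def start(self, text: str) -> None:
        self.speaking = True
        self.events.append(("start", text))

    def stop(self) -> None:
        self.speaking = False
        self.events.append(("stop", None))


def on_caller_speech(playback: Playback) -> bool:
    """Called when the caller starts talking. Returns True if the agent was interrupted."""
    if not playback.speaking:
        return False  # agent was already silent, nothing to cut off
    playback.stop()
    return True
```

Usage: call `on_caller_speech` from whatever component signals that the caller has started speaking. The time between that signal and `stop()` taking effect is what the 200 ms figure measures.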
Bullhorn
Recruitment CRM widely used in AU and NZ. Waboom voice agents push candidate screening summaries directly into Bullhorn within 60 seconds of the call ending.

C

Call Tags
Structured labels the agent attaches to a call (HOT_LEAD, COMPLAINT, BOOKED, NOT_INTERESTED). Used to route post-call workflows, alerts, and campaign segmentation. Covered in our blog on call tagging.
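As a rough illustration of how tags can drive post-call routing, the sketch below maps each tag to a workflow. The tag names come from the entry above; the workflow names are hypothetical.

```python
from enum import Enum


class CallTag(Enum):
    HOT_LEAD = "HOT_LEAD"
    COMPLAINT = "COMPLAINT"
    BOOKED = "BOOKED"
    NOT_INTERESTED = "NOT_INTERESTED"


# Hypothetical workflow names, one per tag.
WORKFLOWS = {
    CallTag.HOT_LEAD: "notify_sales",
    CallTag.COMPLAINT: "alert_manager",
    CallTag.BOOKED: "send_confirmation_sms",
    CallTag.NOT_INTERESTED: "suppress_from_campaign",
}


def workflows_for(tags: list[CallTag]) -> list[str]:
    """Return the de-duplicated, sorted workflows to run for a tagged call."""
    return sorted({WORKFLOWS[tag] for tag in tags})
```

A call can carry more than one tag, so the function returns a list rather than a single workflow.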
Carrier Intelligence
The discipline of knowing how a phone carrier rates and rotates your numbers. Includes Spam Likely scoring, warmup patterns, and rest cycles. The reason your dialler numbers do or do not stay alive.
Cliniko
Healthcare practice management system used by NZ + AU clinics. Waboom integrates for booking, recall, and patient record lookup.
Compliance Recording Disclosure
The legal requirement to tell a caller they are being recorded before recording starts. Required under NZ Crimes Act s216A-216C and Australian Privacy Principle (APP) 5. Built into every Waboom agent by default.
Connect Rate
Percentage of dialled numbers that result in a live conversation. Industry baseline is 8 to 12 percent. Waboom typically holds 47 to 65 percent on warm lists.
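The metric itself is a simple ratio. A small sketch, with an illustrative function name and made-up campaign numbers:

```python
def connect_rate_percent(live_conversations: int, dialled: int) -> float:
    """Share of dialled numbers that reached a live conversation, as a percentage."""
    if dialled <= 0:
        raise ValueError("dialled must be a positive count")
    return 100 * live_conversations / dialled


# Illustrative campaign: 1,000 numbers dialled, 120 live conversations.
rate = connect_rate_percent(120, 1000)
print(f"{rate:.1f}%")
```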
CSAT
Customer Satisfaction Score. Usually a 1 to 5 rating captured at end of call. AI-handled calls now match or exceed human-handled CSAT in our deployments, especially for booking and intake calls.

D

Deepgram
ASR provider known for low-latency English transcription. One of the engines we use depending on language and latency budget.
Diarization
The process of identifying who said what in a recording (also spelled diarisation). Critical for two-party calls. Waboom diarisation is per-channel, with caller and agent recorded on separate audio channels for clean separation.
DNC Register
Do Not Call Register. NZ has the Marketing Association DNC list. AU has the federal Do Not Call Register, established under the Do Not Call Register Act 2006. Waboom honours both automatically and refreshes the lists weekly.
DTMF
Dual-Tone Multi-Frequency. The keypad tones (press 1 for sales). AI agents often replace IVR menus with natural language but still capture DTMF when an integration requires it.
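"Dual-tone" means each key is signalled as one low-group and one high-group frequency played together. The sketch below looks up that pair using the standard DTMF keypad grid; the function name is illustrative.

```python
ROW_HZ = (697, 770, 852, 941)       # low-group frequencies, one per keypad row
COL_HZ = (1209, 1336, 1477, 1633)   # high-group frequencies, one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair in Hz for a single keypad key."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key.upper())
            if col_index != -1:
                return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")
```

A detector works in the opposite direction: it measures which row and column frequencies are present in the audio and maps them back to a key.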

E

ElevenLabs
Voice cloning and TTS provider. Strong for English and 30+ other languages. Used by Waboom for premium voice deployments where voice quality matters more than per-minute cost.
EOFY
End of Financial Year. 30 June in Australia, 31 March in New Zealand. The week your accountants and tax agents experience an 8x call surge. Common AI voice agent deployment trigger.
ezyVet
Veterinary practice management system widely used in NZ + AU. Waboom integrates for emergency triage, booking, and prescription refills.

F

FAQ Schema
JSON-LD structured data marking up question-answer pairs on a page. Helps the page show up as a rich result in Google and as a citation in LLMs (ChatGPT, Perplexity, Claude). Every Waboom industry page ships with FAQ schema.
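For reference, a minimal sketch of generating FAQ markup with the schema.org FAQPage, Question, and Answer types. The helper name and the sample question are illustrative.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


markup = faq_jsonld([("What is ASR?", "Automatic Speech Recognition.")])
print(json.dumps(markup, indent=2))
```

The printed JSON goes inside a `<script type="application/ld+json">` tag on the page.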
First Token Latency
Time between the user finishing speaking and the agent starting to respond. Industry leaders sit around 800 ms. Anything over 1.5 seconds and the caller starts repeating themselves.
Function Calling
The mechanism by which an LLM triggers an external tool (book a meeting, lookup a record, send an SMS). The reason a voice agent can actually do things instead of just talking.
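A simplified sketch of the dispatch step: the model emits a tool name plus JSON arguments, and the platform runs the matching function. The tool, its arguments, and the JSON shape are hypothetical; each LLM vendor defines its own format.

```python
import json


def book_meeting(name: str, slot: str) -> dict:
    """Hypothetical tool. A real one would call a calendar or CRM API."""
    return {"status": "booked", "name": name, "slot": slot}


TOOLS = {"book_meeting": book_meeting}


def dispatch(tool_call_json: str) -> dict:
    """Run the tool named in an LLM tool call and return its result."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return {"error": f"unknown tool: {call['name']}"}
    return tool(**call["arguments"])


result = dispatch('{"name": "book_meeting", "arguments": {"name": "Aroha", "slot": "Tue 10:00"}}')
print(result)
```

The result is fed back to the model so it can confirm the booking to the caller in natural language.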

G

GoHighLevel
All-in-one CRM and marketing platform popular with agencies. Waboom integrates voice agents directly into the GHL pipeline so booked calls become opportunities.
GDPR
General Data Protection Regulation. The EU framework that the NZ Privacy Act 2020 and the Australian Privacy Principles broadly align with. Relevant if you have any EU-resident customers.

H

Hangup Recovery
The agent detecting a sudden silence (caller hung up unexpectedly) and re-dialling within 30 seconds with a recovery script. Cuts effective drop-off rate by 15 to 25 percent on outbound campaigns.
HVAC
Heating, Ventilation, Air Conditioning. The trade with the most pronounced seasonal call surge (summer heatwaves, winter cold snaps). Frequent AI voice agent deployment industry.

I

IPP
Information Privacy Principle. One of the 13 principles in the NZ Privacy Act 2020. We covered each one in our IPP-by-IPP compliance guide.
Intent Detection
The classification step where the agent decides what the caller wants (book, complain, ask a question, transfer to human). Determines which workflow runs next.
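To make the classification step concrete, here is a deliberately simple keyword-based sketch. Production agents typically use an LLM or a trained classifier instead; the keyword lists and intent labels below are assumptions for illustration only.

```python
import re

# Hypothetical keyword lists. Checked in order, first match wins.
INTENT_KEYWORDS = {
    "book": {"book", "appointment", "schedule"},
    "complain": {"complaint", "unhappy", "refund"},
    "transfer": {"human", "person", "operator"},
}


def detect_intent(utterance: str) -> str:
    """Return the first intent whose keywords appear in the utterance, else 'question'."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return "question"
```

The returned label is what selects the next workflow, for example the booking flow or a transfer to a human.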
IVR
Interactive Voice Response. The press-1-for-sales menu. Modern AI voice agents replace IVRs with natural language for better conversion and lower abandonment.

J

Jitter
Variation in packet delay over the network. High jitter causes choppy audio and dropped words. Voice AI providers measure jitter end to end and trigger fallbacks when it exceeds 30 ms.
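One common way to quantify jitter is the running interarrival estimator from RFC 3550 (the RTP specification). The sketch below applies it to send and receive timestamps in milliseconds. The 30 ms threshold is the figure from the entry above; the function names are illustrative, and whether any given provider uses this exact estimator is an assumption.

```python
JITTER_FALLBACK_MS = 30.0  # threshold quoted in the entry above


def interarrival_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """RFC 3550 style running jitter estimate over paired packet timestamps."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # Difference between receive spacing and send spacing for adjacent packets.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def should_fall_back(send_ms: list[float], recv_ms: list[float]) -> bool:
    """True when estimated jitter exceeds the fallback threshold."""
    return interarrival_jitter(send_ms, recv_ms) > JITTER_FALLBACK_MS
```

Packets that arrive with exactly the spacing they were sent with produce zero jitter, regardless of how long the trip took.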
JobAdder
Recruitment CRM dominant in AU and NZ. Waboom integrates for candidate screening intake and client-side role briefing.

K

Karbon
Practice management software for accounting firms. Waboom integrates for EOFY appointment booking and client intake.
Knowledge Base
The corpus of company-specific information the agent draws on (services, prices, policies, FAQs). Updated by self-service or by sync. The reason your agent sounds like your business and not generic.

L

Latency
End-to-end time from caller speaking to agent responding. Sum of ASR, LLM thinking, function execution, TTS generation, and network transit. The most-watched metric in voice AI.
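Because latency is a sum of stages, it helps to keep an explicit budget per stage. The numbers below are made up for illustration and are not measurements from any deployment.

```python
# Hypothetical per-stage budget in milliseconds.
latency_budget_ms = {
    "asr": 150,
    "llm": 350,
    "function_execution": 100,
    "tts": 120,
    "network": 80,
}


def total_latency_ms(budget: dict[str, int]) -> int:
    """End-to-end latency as the sum of every stage."""
    return sum(budget.values())


def slowest_stage(budget: dict[str, int]) -> str:
    """Name of the stage that consumes the most time."""
    return max(budget, key=budget.get)


print(total_latency_ms(latency_budget_ms), slowest_stage(latency_budget_ms))
```

Tracking the slowest stage shows where optimisation effort pays off first.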
LLM
Large Language Model. The brain that decides what the agent says. Anthropic Claude, OpenAI GPT-4, and Google Gemini are the dominant production engines. Waboom picks per-deployment based on language, latency, and cost.

M

MCP
Model Context Protocol. Anthropic-led standard for connecting LLMs to external tools and data sources. Used in Waboom Claude Code workshops and increasingly in production agents.
Multilingual Voice
An agent that detects the caller's language in the first 5 seconds and continues in it. Te Reo Māori, Mandarin, Hindi, Filipino, Spanish, and 30+ more. Critical for tourist towns and trades recruiting.

N

Neural TTS
Text-to-speech using deep neural networks instead of concatenative or formant synthesis. Sounds dramatically more natural. The standard since 2020.
Noise Cancellation
Removing background noise from the caller side audio so ASR works on a clean signal. Critical for trades calls (job site noise) and hospitality (restaurant noise).

O

OAIC
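Office of the Australian Information Commissioner. The regulator for the Australian Privacy Principles. Investigates breaches and can seek civil penalties for serious or repeated APP contraventions (for companies, the maximum is now the greater of AUD 50 million, three times the benefit obtained, or 30 percent of turnover).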
Outbound Dialler
Software that initiates calls (vs accepting them). AI outbound diallers run pacing algorithms, list rotation, and DNC checks. The discipline that separates an effective campaign from a burned phone number.

P

Privacy Act 2020
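The NZ data protection law in force from 1 December 2020. Its 13 IPPs cover collection, storage, use, disclosure, access, correction, and retention of personal information, and the Act adds mandatory privacy breach notification. New IPP 3A, which requires notifying people when their personal information is collected indirectly (from a source other than the person), takes effect 1 May 2026.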
Pronunciation Dictionary
Lookup table for words the TTS engine would otherwise mispronounce. Essential for NZ place names (Whangārei, Rotorua, Tauranga), te reo, and unusual surnames. Covered in our pronunciation blog.
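A minimal sketch of the idea: swap known words for a phonetic respelling before the text reaches the TTS engine. The single entry uses the respelling given under Whangārei in this glossary. The function name is illustrative, and many TTS engines accept phoneme markup instead of respellings.

```python
import re

# Word to respelling. The one entry here comes from this glossary.
PRONUNCIATIONS = {"Whangārei": "Fong-AH-ray"}


def apply_pronunciations(text: str, table: dict[str, str] = PRONUNCIATIONS) -> str:
    """Replace whole-word matches with their respelling before sending text to TTS."""
    if not table:
        return text
    # Longest entries first so longer names win over shorter ones they contain.
    words = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: table[match.group(0)], text)


print(apply_pronunciations("Your technician is in Whangārei today."))
```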
Prosody
The rhythm, stress, and intonation of speech. Modern neural TTS gets prosody right most of the time. Bad prosody is the most common reason a voice still sounds robotic.

R

RAG
Retrieval Augmented Generation. The architecture where the agent looks up relevant information from your knowledge base before responding. The reason an agent can quote your current pricing instead of training-era pricing.
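The retrieve-then-respond flow can be sketched in a few lines. This toy version ranks knowledge base entries by word overlap with the question. Real systems normally use embedding search, and the sample documents and prompt wording here are invented for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    question_words = tokens(question)
    ranked = sorted(documents, key=lambda doc: len(question_words & tokens(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str]) -> str:
    """Put the retrieved context in front of the question for the LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


knowledge_base = [
    "Callout fee is $95 plus GST.",
    "We are open 8am to 5pm on weekdays.",
]
print(build_prompt("What is your callout fee?", knowledge_base))
```

Because the price comes from the knowledge base at call time, updating the document updates what the agent quotes.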
Realtime
Sub-second response latency for full duplex audio. The OpenAI Realtime API and the Anthropic Claude voice mode are the two leading realtime engines as of 2026.
Recording Disclosure
The notice given at the start of a call that the conversation is being recorded. Required under NZ Crimes Act s216A-216C and Australian Privacy Principle 5. Default in every Waboom agent.
Retell AI
Voice agent infrastructure provider. One of the conversational engines Waboom uses underneath the platform layer. We covered the architecture in our RAG-powered voice agent blog.

S

Sentiment Analysis
Realtime classification of caller emotion (positive, neutral, negative, frustrated). Used to fire alerts when a call is going wrong, before the caller hangs up.
Smart Booster
Waboom feature that warms a fresh phone number gradually so the carrier reputation stays clean. Without it, a brand-new outbound number gets flagged Spam Likely within 100 calls.
Spam Act 2003
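AU federal law governing commercial electronic messages such as email and SMS. Voice calls are excluded from its definition of an electronic message (section 5), so it does not apply to phone-based outbound. Often confused with the Do Not Call Register Act 2006 and the telemarketing Industry Standard, which are the rules that do cover outbound voice calls.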
Spam Likely
The carrier label that appears on the recipient's caller ID when your number has been flagged. Once labelled, your connect rate drops 60 to 80 percent. Recovery requires number rest, reputation rebuild, or a new number.
STT
Speech to Text. Synonym for ASR; the two terms are used interchangeably.

T

Te Reo Māori
The indigenous language of Aotearoa / New Zealand. Spoken by 4.6 percent of the population (2021 Stats NZ). Modern voice agents pronounce te reo place names and basic phrases correctly when configured with a pronunciation dictionary.
TTFB
Time To First Byte. In voice AI, the time between the user finishing their sentence and the first audio packet being emitted by the agent. Best-in-class is around 800 ms.
TTS
Text To Speech. The final stage that converts the LLM's response text into spoken audio. ElevenLabs, OpenAI, Cartesia, and Coqui XTTS are the dominant engines.
Turn Taking
The model that decides when the caller is done speaking and the agent should respond. Bad turn-taking causes interruptions or awkward silences. The most under-appreciated voice quality factor.
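The simplest turn-taking rule is a silence timeout: once the caller has spoken, a long enough pause ends their turn. The sketch below assumes 20 ms frames and a 500 ms pause, both illustrative values. Production systems usually combine this with semantic cues.

```python
from typing import Optional


def end_of_turn_index(speech_flags: list[bool], frame_ms: int = 20, silence_ms: int = 500) -> Optional[int]:
    """Index of the frame where the caller's turn is judged complete, or None.

    speech_flags holds one boolean per audio frame: True for speech, False for silence.
    """
    frames_needed = silence_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return index
    return None
```

A shorter timeout makes the agent feel snappier but risks cutting in while the caller is pausing mid-thought. A longer one causes the awkward silences mentioned above.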
Twilio
Telephony infrastructure provider. Most voice agent platforms (including Waboom) use Twilio for the carrier connection layer.

U

Utterance
A single contiguous spoken segment from the user, ending in a pause. The unit that ASR transcribes and the LLM responds to. Long utterances (more than 15 seconds) typically need mid-utterance acknowledgement.

V

VAD
Voice Activity Detection. The component that decides whether incoming audio contains speech or just background noise. Bad VAD wastes ASR cycles on hold-music or makes the agent miss soft-spoken callers.
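A basic energy-based detector illustrates the trade-off described above. It treats a frame as speech when its RMS level crosses a threshold. The threshold value is an arbitrary assumption for 16-bit samples, and modern VADs are usually small neural models rather than a fixed energy gate.

```python
import math


def rms(frame: list[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def is_speech(frame: list[int], threshold: float = 500.0) -> bool:
    """Crude VAD: a frame counts as speech when its RMS meets the threshold."""
    return rms(frame) >= threshold
```

Set the threshold too low and hold music counts as speech. Set it too high and soft-spoken callers are missed.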
Vapi
Voice agent infrastructure provider. Competitor and complement to Retell AI. Some Waboom deployments run on Vapi for specific feature requirements.
VetLink
Veterinary practice management system used by NZ vet clinics. Waboom integrates for emergency triage and booking.
Voice Cloning
Synthesising a new voice that sounds like a specific person from a sample of their speech. ElevenLabs and Cartesia are the leading providers. Used for branded agent voices and accessibility.

W

Whangārei
Northland city. Pronounced 'Fong-AH-ray', not 'Wong-uh-RAY'. Common AI voice agent failure mode without a pronunciation dictionary.
Whisper
OpenAI's open-source ASR model. Strong multilingual coverage. Used by Waboom for high-accuracy non-English transcription.
Webhook
HTTP callback used to integrate the voice agent with external systems. Waboom posts call summaries, tags, and recordings to client webhooks within seconds of call end.
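On the receiving side, a webhook endpoint usually needs to confirm that a request really came from the sender. A common pattern is an HMAC signature over the request body, sketched below with the Python standard library. This glossary does not say whether Waboom signs its webhooks, so the signing scheme, the field names, and the shared secret are all assumptions.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, secret: bytes) -> tuple[bytes, str]:
    """Serialise a payload and compute its HMAC-SHA256 signature (sender side)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_signature(body: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the signature on the receiver and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


# Hypothetical post-call payload.
body, signature = sign_payload({"call_id": "abc123", "tags": ["BOOKED"]}, b"shared-secret")
print(verify_signature(body, signature, b"shared-secret"))
```

The receiver should verify the raw request bytes before parsing them, since re-serialising the JSON can change the byte sequence and break the comparison.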

X

XTTS
Coqui's cross-lingual text-to-speech model. Open source. Good for languages where commercial TTS support is thin. Used in some Waboom deployments for niche languages.

Z

Zero Retention
A data architecture where call recordings, transcripts, and PII are deleted immediately after processing. Default for sensitive sectors (legal, health) and configurable per Waboom deployment. Covered in our zero-retention blog.

See These Terms in Production

Theory is one thing. Watching ASR, TTS, RAG, call tags, and Spam Likely all working on a real call is another. Bring your call mix and we will run a live demo on it.

Last updated 1 May 2026 · by Leonardo Garcia-Curtis

Waboom AI

Empowering New Zealand and Australian businesses with AI voice agents and automation that deliver real, measurable value.

hello@waboom.ai · +64 9 888 0402
Level 8, 139 Quay Street
Auckland CBD, New Zealand

© 2026 Waboom.ai. All rights reserved.