We launched a voice agent for a property management company in Christchurch last year. 14 units across 3 commercial buildings. The agent handled tenant maintenance requests — pipe burst at 2am, heating failures, lockouts.
Week one: it booked a plumber for a "leaking tap." The tenant had said "leaking roof." One word. $4,200 callout to the wrong trade.
That mistake taught us something you've probably learned the hard way too. You can't test a voice agent by talking to it 50 times and hoping you've covered everything.
The Problem with "Just Call It and See"
Here's how most teams test voice agents. A developer builds the agent. They call it 10, maybe 20 times. Then they ship it.
What they didn't test: everything else.
Every untested scenario is a live grenade. And your client's brand is holding it.
The Numbers That Made Us Change
If you're still testing manually, these numbers will sting: three weeks of testing per agent, and we'd still find bugs in production. That's not a process. That's a prayer.
Sound familiar?

Manual testing: 60+ hours per agent. Batch testing: under 3 days.
Why Manual Testing Breaks Down
Three reasons. All human.
Inconsistency. Your tester at 9am is sharp. Your tester at 4pm on Friday is checking the clock. Same scenario, different results depending on when.
Conversation fatigue. After 50 calls, your brain starts skipping. You stop catching the subtle failures — the misleading response, the robotic redirect.
Scope paralysis. "What if they say this?" The edge cases multiply. You can't test them all manually. So you pick the ones you think matter and cross your fingers.
Hope isn't a testing strategy. You know this.
How Retell's Batch Simulation Testing Works
Retell built simulation testing into their platform. Here's what you're actually working with:
Step 1: Define Your Test Personas
You create simulated callers with specific traits. Not generic "angry customer" labels, but actual characters with a mood, a speech pattern, and a goal.
You're building the callers your agent will face in production. The Christchurch landlord. The elderly tenant. The midnight emergency.
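Conceptually, each persona is a bundle of traits plus a goal. Here's a minimal sketch of that idea in Python; the field names and `TestPersona` structure are illustrative, not Retell's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TestPersona:
    """One simulated caller. Field names are illustrative, not Retell's API."""
    name: str
    traits: list[str]   # speech style, mood, constraints
    goal: str           # what the caller is trying to achieve
    opening_line: str   # how the conversation starts

# The callers this agent will face in production:
personas = [
    TestPersona(
        name="midnight_emergency",
        traits=["stressed", "speaks quickly", "interrupts"],
        goal="get an urgent roofer dispatched for a leaking roof",
        opening_line="There's water coming through my ceiling right now!",
    ),
    TestPersona(
        name="elderly_tenant",
        traits=["soft-spoken", "needs repetition", "long pauses"],
        goal="report a heating failure without being rushed",
        opening_line="Hello? Is this the maintenance line?",
    ),
]
```

The point is specificity: "stressed, interrupts, leaking roof at midnight" catches failures that a generic "angry customer" label never will.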
Step 2: Set Success Criteria
For each test case, you define what "success" looks like:
Binary pass/fail against measurable criteria. Your agent either nails it or it doesn't.
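In practice, each criterion is a binary check over the finished transcript. A sketch of that pattern, using the article's own "leaking roof" failure as the example (the check functions here are hypothetical, not Retell built-ins):

```python
# Binary pass/fail: each criterion is a predicate over the full transcript.
# These checks are illustrative examples, not Retell's built-in criteria.

def booked_correct_trade(transcript: str) -> bool:
    """For the 'leaking roof' scenario: dispatch a roofer, never a plumber."""
    t = transcript.lower()
    return "roofer" in t and "plumber" not in t

def identified_as_agent(transcript: str) -> bool:
    """Compliance: the agent must disclose that it is an AI assistant."""
    return "ai assistant" in transcript.lower()

def run_criteria(transcript, criteria):
    """Return overall pass/fail plus a per-criterion breakdown."""
    results = {fn.__name__: fn(transcript) for fn in criteria}
    return all(results.values()), results

passed, detail = run_criteria(
    "Hi, this is the AI assistant for the property manager. "
    "I've booked a roofer for tomorrow at 8am.",
    [booked_correct_trade, identified_as_agent],
)
# passed is True: correct trade booked, agent identified itself
```

Measurable predicates like these are what turn "it sounded fine to me" into a pass/fail you can rerun after every change.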
Step 3: Run the Batch
You queue 20+ scenarios simultaneously. Each one simulates a full conversation — greeting to close.
One click. 20 conversations. About 3 minutes. Compare that to your developer spending 2 hours on 20 phone calls.
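Under the hood, a batch run is fan-out concurrency: every scenario starts at once instead of queuing behind a human dialling one call at a time. A minimal sketch of the idea (`simulate_conversation` is a stand-in stub, not Retell's SDK):

```python
import concurrent.futures
import time

def simulate_conversation(persona_name: str) -> dict:
    """Stand-in for one simulated call; a real run would drive the agent
    through a full scripted conversation, greeting to close."""
    time.sleep(0.01)  # placeholder for the simulated dialogue
    return {"persona": persona_name, "passed": True}

def run_batch(personas: list[str], max_workers: int = 20) -> list[dict]:
    # Queue every scenario simultaneously rather than calling one by one.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_conversation, personas))

results = run_batch([f"scenario_{i}" for i in range(20)])
failures = [r for r in results if not r["passed"]]
```

With 20 workers, total wall time is roughly the length of the slowest conversation, not the sum of all of them, which is why 20 scenarios finish in minutes instead of hours.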
Step 4: Debug and Iterate
Failed a test? Retell shows you exactly where the conversation broke, then gives you specific fixes: fine-tune the response, split a complex node, adjust temperature, or regenerate with 10 variations.
Fix the failure. Re-run the batch. Verify you didn't break something else. Regression testing in minutes.

One day. Not three weeks.
What We Test Before Every Launch
Here's our standard test suite. Every agent goes through this before your customers hear a word:
Happy path scenarios (5-8 tests) — Caller follows the expected flow. Books an appointment. Gets their answer. Hangs up satisfied.
Objection handling (4-6 tests) — "Not interested." "How'd you get my number?" "Is this a scam?" Your agent needs natural redirects, not canned scripts.
Edge cases (6-10 tests) — Mumbled speech. Long pauses. Unexpected questions. If your agent handles the weird stuff, the normal stuff takes care of itself.
Compliance scenarios (3-5 tests) — Does the agent identify itself? Does it honour opt-out requests? Does it respect calling hour restrictions? One compliance failure can cost more than the entire campaign.
Transfer and escalation (2-4 tests) — When should the agent hand off to a human? Does it pass context correctly? See our guide on agent-to-agent transfer with full context.
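The category minimums above are easy to enforce mechanically before sign-off. A small illustrative coverage check (the category names and `suite_gaps` helper are ours, not a Retell feature):

```python
# Minimum scenario counts per category, taken from the suite above.
MINIMUMS = {
    "happy_path": 5,
    "objection_handling": 4,
    "edge_case": 6,
    "compliance": 3,
    "transfer_escalation": 2,
}

def suite_gaps(suite: dict[str, int]) -> dict[str, int]:
    """Return the categories that fall short of the minimum, and by how much."""
    return {
        cat: minimum - suite.get(cat, 0)
        for cat, minimum in MINIMUMS.items()
        if suite.get(cat, 0) < minimum
    }

gaps = suite_gaps({
    "happy_path": 6,
    "objection_handling": 4,
    "edge_case": 8,
    "compliance": 2,          # one short of the minimum
    "transfer_escalation": 3,
})
# gaps == {"compliance": 1}
```

A launch gate this simple is enough to stop an agent shipping with a hole in its compliance coverage.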
The Before and After
Before batch testing (our first 6 months): three weeks of manual testing per agent, and bugs still surfacing in production.
After batch testing (last 12 months): the test cycle done in a single day, and launches with zero critical issues.
Your developers actually enjoy the build-test-fix loop now. Imagine that.
That Christchurch property management agent? We rebuilt it with batch testing. 27 test scenarios covering every maintenance category and tenant persona.
Launched with zero critical issues. The "leaking roof" scenario? Test case #14. Passes every time now.
What Batch Testing Can't Do (Yet)
Being honest about the limitations you'll hit:
Conversation flow agents only. If your agent uses function calling for external APIs, batch simulation can't test those. You'll still need live testing for webhook-dependent flows.
No audio environment simulation. Background noise, poor signal, echoey rooms — the simulation doesn't replicate these. Something to keep in mind for your deployment.
Upfront investment. Writing 20+ detailed test personas takes time. Budget 4-6 hours for your first test suite. After that, you're adapting — about 30 minutes per new agent.
Still worth it? 4-6 hours upfront versus 60+ hours of manual testing. You do the maths.
How This Fits Our Build Process
Every agent we deploy at Waboom AI follows this cycle:
1. Build — Design your conversation flows with intelligent pathing
2. Batch test — Run your 20+ scenarios against success criteria
3. Fix — Debug failures, adjust your prompts and nodes
4. Re-test — Verify your fixes didn't break other scenarios
5. Live test — 50 real calls with your internal team
6. Deploy — Ship with confidence to your customers
Steps 2-4 happen in a single day. That's what separates you from teams still testing by phone.
Want voice agents that work on day one?
Frequently Asked Questions
What's the right number of test scenarios for a voice agent?
We run 20-30 scenarios per agent as a minimum. That covers 5-8 happy paths, 4-6 objection handlers, 6-10 edge cases, 3-5 compliance checks, and 2-4 transfer scenarios.
Your first test suite takes 4-6 hours to build. After that, you adapt and reuse — roughly 30 minutes per new agent.
Can batch testing replace manual testing entirely?
Not entirely. Batch simulation tests conversation logic and response quality. It can't simulate audio quality issues or webhook integrations.
We still do 50 live calls internally before every deployment. But batch testing eliminates about 95% of your manual burden.
What happens when a batch test fails?
Retell shows you exactly where the conversation broke. You get debug options: fine-tune responses, split complex nodes, adjust temperature, or regenerate with 10 variations.
Fix the issue, re-run the batch, verify you didn't break other scenarios. The whole cycle takes minutes.
Does batch testing work with all Retell agent types?
Currently, batch simulation supports conversation flow agents. Function-calling agents that rely on external APIs can't be fully tested in simulation.
For those, you still need live testing for the integration layer. The conversation logic itself can still be batch-tested.
Leonardo Garcia-Curtis
Founder & CEO at Waboom AI. Building voice AI agents that convert.
Ready to Build Your AI Voice Agent?
Let's discuss how Waboom AI can help automate your customer conversations.
Book a Free Demo