It's day 14. Your shiny new outbound voice agent is live on a 1,400-contact list. You scheduled a campaign run for 9am.
By 11am, the agent has called 480 contacts. Your phone reputation is in pieces.
Three things broke. None of them showed up in the demo.
Your CRM webhook has a silent rate limit at 60 requests per minute. The agent burned through it by call 90. The next 390 calls dispositioned to "unknown" because nothing wrote back.
Your concurrent call cap was 10. The agent queued 48 simultaneously. 38 of them dropped on busy tones. Your number is now flagged "high spam risk" by two carriers.
Your script branch for "customer says they're driving and can't talk" wasn't there. The agent kept pushing through. Three customers complained to the marketing manager.
By lunchtime your team's pulled the campaign. Your phone reputation will take three weeks to recover. Maybe.
This is the failure mode every outbound voice agent operator hits the first time. Almost all of it would have been caught by a 60-call pilot.
You skipped the pilot because the demo looked great.
The pattern: pilot, soak, scale
This is Playbook 5 from the voice agent column we have been running.
Five playbooks beyond the AI receptionist. This one's the boring one. The one most operators in your space skip.
It's also the one that separates the operators who ship voice agents from the ones who blow up their phone reputation in week 2.
Three sizes you walk through, in order:
60 calls. Controlled list you've hand-picked, low-stakes.
600 calls. Real list, real customers, real outcomes.
6,000 calls. Full deployment.
Most operators jump straight from "demo with my mate" to "1,400-contact campaign". The middle two stages are where your failures live.
Same agent on your line. Same minute pricing. Same 800ms first-token reply. New discipline.
Why your demo looked great and production blew up
Me: "Why didn't the demo catch it?"
Builder: "Demo runs 10 calls. Production runs 600 a day."
Me: "And the orchestration layer?"
Builder: "Falls over at 80 concurrent."
A real conversation with a real builder. He'd just finished the rebuild after his first production run torched his client's number reputation.
Three things only break under load.
Rate limits. Every API your agent talks to has one: the CRM webhook, the transactional email provider, the SMS gateway, Stripe, the phone provider's outbound trunk.
Concurrency caps. Most NZ phone providers cap free tiers at 10 to 20 simultaneous calls. The moment you push past the cap, calls drop.
Edge-case branches. The customer who's driving. The customer in a noisy room. The customer who hangs up after "Hello". The customer who answers in te reo. Each one needs a branch in your script. Demo runs miss them.
Find these for the first time in production and you pay for them. Find them in the pilot and you fix them before they cost you anything.
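Here is a minimal sketch of pacing writes under a known rate limit, assuming the 60-requests-per-minute figure from the CRM webhook example. `MinuteRateLimiter` and its parameters are hypothetical names, not part of any provider's SDK. The clock and sleep functions are injectable so the limiter can be tested without real waiting.

```python
import time


class MinuteRateLimiter:
    """Spaces calls evenly so consecutive calls are at least 60/per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # e.g. 60 per minute -> 1 second apart
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may go out

    def wait(self):
        """Block until the next call is allowed. Returns the seconds waited."""
        now = self.clock()
        if self.next_allowed is None or now >= self.next_allowed:
            # No backlog: go immediately and reserve the following slot.
            self.next_allowed = now + self.interval
            return 0.0
        # Too early: sleep until our slot, then reserve the one after it.
        delay = self.next_allowed - now
        self.sleep(delay)
        self.next_allowed += self.interval
        return delay
```

Call `limiter.wait()` before each webhook write. Even spacing is the conservative choice: it gives up bursts, but it should stay under a per-minute limit provided the number you documented is accurate.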
What a 60-call pilot looks like
A 60-call pilot isn't a soft launch. It's a structured test.
Pick a controlled list. 60 contacts you've selected because recovery is low-stakes if it goes wrong.
For an outbound campaign that means dormant leads from 18 months ago. People who won't be surprised by a call. People who won't complain to your marketing manager if the agent fumbles.
For an inbound deployment that means staff phones first. Your team rings the agent. Tries every weird scenario you can think of.
Run them through. Watch the outcomes. Pull the transcripts.
Score each call on five dimensions.
Did the agent get the intent right? Did the function fire successfully? Did the disposition land in the right system?
Did the customer feel heard? Did the agent recover gracefully when something weird happened?
If more than 6 of the 60 fail on any one dimension, that dimension is under 90% and you've found a real bug. Fix it. Run another 60. Repeat until two consecutive batches of 60 land at 90%+ on all five dimensions.
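The gate above can be sketched in a few lines. This assumes each call has been hand-scored pass/fail on the five dimensions. The dimension keys and function names are my own labels for illustration, not a fixed schema.

```python
# Hypothetical keys for the five dimensions described above.
DIMENSIONS = (
    "intent",              # did the agent get the intent right?
    "function_fired",      # did the function fire successfully?
    "disposition_landed",  # did the disposition land in the right system?
    "felt_heard",          # did the customer feel heard?
    "recovered",           # did the agent recover gracefully?
)


def pass_rates(calls):
    """calls: list of dicts mapping each dimension to True (pass) or False (fail)."""
    if not calls:
        raise ValueError("cannot score an empty batch")
    total = len(calls)
    return {d: sum(1 for c in calls if c[d]) / total for d in DIMENSIONS}


def batch_passes(calls, threshold=0.90):
    """A batch passes only if every dimension clears the threshold."""
    return all(rate >= threshold for rate in pass_rates(calls).values())


def ready_for_soak(batches, threshold=0.90):
    """True when the two most recent batches both pass."""
    return len(batches) >= 2 and all(batch_passes(b, threshold) for b in batches[-2:])
```

Scoring per dimension rather than per call is deliberate: a batch where 95% of calls are fine overall can still hide one dimension that fails every time.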
Now you're ready for 600.
What a 600-call soak looks like
The 60-call pilot tests the script. The 600-call soak tests the load.
Pick a real list. Real customers. Real stakes. But cap concurrency low enough that the run spreads across about two days at a moderate pace.
This is where rate limits show up. Where concurrency caps show up. Where the integrations you wired up six weeks ago break under sustained pressure.
Run the soak. Watch your dashboards.
If the rate limit on your CRM webhook trips, fix it. If the concurrency cap on your outbound trunk drops a call, fix it. If the email provider rejects more than 1% of mid-call sends, fix it.
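One way to hold a concurrency cap is a semaphore around the dialler. This is a sketch, assuming an async client. `place_call` is a hypothetical coroutine standing in for whatever your phone provider exposes, and the default of 10 is the cap from the opening example, not a recommendation.

```python
import asyncio


async def run_capped(contacts, place_call, max_concurrent=10):
    """Run place_call(contact) for every contact, never more than max_concurrent at once."""
    gate = asyncio.Semaphore(max_concurrent)

    async def one(contact):
        # Calls beyond the cap wait here instead of hitting the trunk.
        async with gate:
            return await place_call(contact)

    # gather preserves input order in its results.
    return await asyncio.gather(*(one(c) for c in contacts))
```

With this in place, 48 queued contacts become 10 live calls and 38 waiting their turn, rather than 38 busy tones.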
Then you're ready for 6,000.
(We've shipped this discipline as a feature. Read batch testing voice agents for the regression-test mechanic that catches the same bugs without burning real customers.)
What a 6,000-call deployment looks like
By the time you're at 6,000, the agent is boring. It runs. It dispositions. It hands off to humans on the right edge cases.
Your team's job is to spot-check the worst 5% of transcripts each morning and queue any new edge cases for the next pilot batch.
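Pulling that morning review list can be a one-liner once each call carries some quality score. The sketch below assumes a numeric `score` field where lower is worse. How you compute that score is up to you, and the field name is hypothetical.

```python
def worst_slice(calls, percent=5, key=lambda c: c["score"]):
    """Return the lowest-scoring `percent` of calls, worst first.

    Rounds up, so a non-empty day always yields at least one transcript.
    """
    if not calls:
        return []
    # Integer ceiling of len(calls) * percent / 100, avoiding float rounding.
    n = max(1, (len(calls) * percent + 99) // 100)
    return sorted(calls, key=key)[:n]
```

On a 600-call day this returns 30 transcripts, which is about what one person can read over a coffee.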
(The boring part is what makes the dramatic part go right. Most operators want the dramatic part. The ones who win in your space want the boring part.)
Honest moment: when the pilot itself goes wrong
Me: "What's the worst pilot run you've shipped?"
Builder: "Tested with 60 dormant leads. All 60 had been opted out of marketing contact two years prior. Three of them complained. One reported us to the Privacy Commissioner."
Me: "How did you fix it?"
Builder: "Added a DNC list cross-check before the campaign fires. Should have been there from day one."
The pattern: your pilot list needs to pass every compliance check your production list would pass. Don't pilot on contacts you wouldn't ship to.
(Most operators treat the pilot as "well it's only 60 calls, who'll notice". The Privacy Commissioner notices. So does your AR team when one of those 60 was a high-value account who got rung at 7:47am on a Saturday.)
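The DNC cross-check the builder describes could look something like this. It is a sketch, not his implementation: the `phone` and `opted_out` field names are assumptions, and the number normalisation is deliberately naive.

```python
import re


def normalize(number):
    """Keep digits only so '+64 21 555 0101' and '64215550101' compare equal.

    Naive on purpose: it does not reconcile local and international formats.
    """
    return re.sub(r"\D", "", number)


def split_by_dnc(contacts, dnc_numbers):
    """Return (callable, blocked). Blocked means on the DNC list or flagged opted out."""
    blocked_numbers = {normalize(n) for n in dnc_numbers}
    callable_, blocked = [], []
    for contact in contacts:
        on_dnc = normalize(contact["phone"]) in blocked_numbers
        if on_dnc or contact.get("opted_out", False):
            blocked.append(contact)
        else:
            callable_.append(contact)
    return callable_, blocked
```

Run the same function over the pilot list and the production list. Keeping the blocked list, rather than silently dropping it, gives you something to show if anyone asks why a contact was or wasn't called.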
Your pilot-first checklist
Five questions before you pilot:
Then two more:
If you can answer all five, you're ready to pilot.
If you can't yet, you're ready to fail in production. Pick a different week.
Pick the discipline before you pick the scale
Pilot. Soak. Scale.
Same agent at every stage. Same minute pricing. Same 800ms first-token reply. Different blast radius.
Your competitors who ship voice agents that work for years run this discipline. The ones who blow up in week 2 skipped it.
Pick the discipline. Run 60 calls before 600. Run 600 before 6,000.
Then watch your phone reputation hold while your campaign delivers.
Far out, that's a long article. Pick one pilot. Ship one experiment.
Want to run your first 60-call pilot without burning your phone reputation? Spend 25 minutes with me. We'll pick the controlled list, name the five dimensions, and document the rate limits before you fire a single call.
Book a slot here and check our pricing first.
Leonardo Garcia-Curtis
Founder & CEO at Waboom AI. Building voice AI agents that convert.