
Voice AI Agents Still Need Humans

Leonardo Garcia-Curtis · 06/08/2025

We deployed a voice agent for a logistics company in Palmerston North. 200+ calls a day, booking courier pickups. Launch day went perfectly. Week one, no issues.

Week three, their pickup completion rate dropped from 78% to 61%. Nobody complained. The dashboard looked fine.

Average call duration, normal. Transfer rate, normal.

The problem? A prompt change we'd made to handle a new service zone introduced a subtle redirect. Callers asking about rural deliveries got looped back to the main menu.

They weren't hanging up angry. They were just quietly giving up.

We only caught it because a human reviewed the transcripts. No automated metric flagged it. Your agent won't tell you it's broken. You have to watch.

Why "Set and Forget" Fails

You've heard this pitch: "Deploy your AI agent and let it run." It's tempting. And it works — until it doesn't.

Voice agents degrade in ways no dashboard catches. A new competitor name confuses your intent detection. A seasonal product change makes your knowledge base answers wrong. A caller accent your agent handled fine in testing fails at scale.

Scale doesn't forgive sloppiness. It amplifies it. A small issue repeated across 500 calls a day becomes your brand's reputation problem by Friday.

[Image: agent monitoring. Caption: Smarter every week — if humans watch.]

What Your Dashboard Shows (and What It Misses)

Retell's analytics give you the basics. These matter. Track them from day one:

  • Call duration trends — Spikes signal confusion. Drops signal abandoned conversations.
  • Transfer rates — Rising transfers mean your agent is hitting limits it shouldn't.
  • Sentiment analysis — Caller frustration patterns at key moments in your flow.
  • First-call resolution — Did your caller get what they needed without calling back?

But dashboards show you averages. And averages lie.

Your P50 call duration looks great at 2 minutes. Your P90 is 8 minutes. 10% of your callers are trapped in loops, repeating themselves, getting nowhere. The dashboard says "all clear." The callers say something very different.
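That tail is easy to surface once you stop averaging. A minimal sketch in Python (the duration figures are hypothetical):

```python
from statistics import quantiles

def percentile_report(durations_s: list[float]) -> dict[str, float]:
    """Report P50 and P90 call duration instead of a single average."""
    # quantiles(n=10) returns the 9 cut points between deciles:
    # index 4 is the median (P50), index 8 is P90.
    cuts = quantiles(durations_s, n=10)
    return {"p50": cuts[4], "p90": cuts[8]}

# A hypothetical week: 90 clean two-minute calls, 10 callers stuck in loops.
durations = [120.0] * 90 + [480.0] * 10
print(percentile_report(durations))
# P50 stays at 120s while P90 blows out past 400s; the mean (156s) looks fine.
```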

The Metrics That Actually Predict Failure

After managing 50+ live voice agents, here are the signals we watch that most teams ignore:

Repeat callers within 24 hours. If the same number calls back within a day, your agent failed on the first attempt. Track this. We've seen agents with 85% resolution rates that were actually 70% — because 15% of callers just tried again.
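If your platform exports a call log with caller numbers and timestamps, this check is a few lines. A sketch, assuming each record carries a number and a started_at datetime (match the field names to your own export):

```python
from datetime import datetime, timedelta

def repeat_caller_rate(calls: list[dict], window: timedelta = timedelta(hours=24)) -> float:
    """Fraction of calls that the same number retried within the window."""
    # Group call start times by caller number.
    by_number: dict[str, list[datetime]] = {}
    for call in calls:
        by_number.setdefault(call["number"], []).append(call["started_at"])

    retried = 0
    for times in by_number.values():
        times.sort()
        # Any call followed by another from the same number inside the
        # window counts as a failed first attempt.
        retried += sum(1 for a, b in zip(times, times[1:]) if b - a <= window)

    return retried / len(calls) if calls else 0.0
```

Subtract this from your headline resolution rate and you get the honest number.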

Conversation depth vs outcome. A caller who reaches turn 12 of your conversation flow and then transfers to a human didn't have a good experience. Long conversations with negative outcomes are your worst-case scenario.
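Operationally, this is a filter over the same call log. A sketch, with the turns and outcome field names assumed:

```python
def worst_case_calls(calls: list[dict], max_turns: int = 12) -> list[dict]:
    """Flag long conversations that still ended badly: the caller invested
    a dozen turns of effort and got a transfer or a hang-up for it."""
    return [
        c for c in calls
        if c["turns"] >= max_turns and c["outcome"] in ("transfer", "abandoned")
    ]
```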

Silence gaps exceeding 3 seconds. Your caller said something your agent didn't understand. It paused. Your caller repeated themselves.

These gaps kill trust — and they don't show up in latency metrics because your LLM responded. It just responded with confusion.
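If your transcript export includes per-turn timestamps, you can scan for these gaps directly. A sketch, assuming each turn records a role plus start and end times in seconds:

```python
def silence_gaps(turns: list[dict], threshold_s: float = 3.0) -> list[tuple[int, float]]:
    """Return (turn index, gap length) wherever the agent left the caller hanging."""
    gaps = []
    for i in range(1, len(turns)):
        gap = turns[i]["start"] - turns[i - 1]["end"]
        # Only gaps before an agent turn count: that's the agent stalling,
        # not the caller pausing to think.
        if turns[i]["role"] == "agent" and gap > threshold_s:
            gaps.append((i, round(gap, 1)))
    return gaps
```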

Knowledge base miss rate. How often does your agent retrieve irrelevant chunks from your knowledge base? A 20% miss rate means 1 in 5 answers is wrong. Your callers notice before you do.
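Measuring this needs some signal of retrieval quality. A sketch, assuming you log a top_score similarity value per lookup; if your stack doesn't expose scores, sampled human labels slot into the same calculation:

```python
def kb_miss_rate(retrievals: list[dict], min_score: float = 0.7) -> float:
    """Share of knowledge base lookups whose best chunk scored below threshold."""
    if not retrievals:
        return 0.0
    misses = sum(1 for r in retrievals if r["top_score"] < min_score)
    return misses / len(retrievals)
```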

The Weekly Optimisation Loop

Every agent we manage at Waboom AI goes through a weekly cycle:

Monday: Data Review. Pull the week's numbers. Flag anomalies — duration spikes, transfer rate changes, sentiment drops, repeat callers. Compare against your baseline from the previous 4 weeks.
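The Monday pass is worth scripting. A minimal sketch of the baseline comparison (the metric names and the 15% tolerance are placeholders to tune):

```python
def flag_anomalies(current: dict, baseline_weeks: list[dict], tolerance: float = 0.15) -> dict:
    """Flag any metric that moved more than `tolerance` from its trailing average.

    `current` and each baseline week map metric names to values,
    e.g. {"transfer_rate": 0.12, "avg_duration_s": 140}.
    """
    flags = {}
    for metric, value in current.items():
        history = [week[metric] for week in baseline_weeks if metric in week]
        if not history:
            continue  # no baseline yet for a newly tracked metric
        baseline = sum(history) / len(history)
        if baseline and abs(value - baseline) / baseline > tolerance:
            flags[metric] = {"baseline": round(baseline, 3), "current": value}
    return flags
```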

Tuesday-Wednesday: Transcript Review. Read the worst 10% of calls. Not the summaries — the actual transcripts. Find the exact turn where your conversation broke.

Was it a prompt issue? A knowledge base gap? A missing conversation path?

Thursday: Fix and Test. Make surgical changes. Not rewrites. Adjust the specific node, prompt, or condition that caused the failure. Run batch simulation tests to verify your fix doesn't break other paths.
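The shape of that regression gate, sketched below. The run_simulation helper is hypothetical, a stand-in for whatever batch-testing endpoint your voice platform exposes; the scripts and expected outcomes are illustrative:

```python
REGRESSION_SUITE = [
    {"script": "book a pickup for tomorrow", "expected": "booking_confirmed"},
    {"script": "ask about rural delivery zones", "expected": "zone_answered"},
    {"script": "request a small refund", "expected": "refund_handled"},  # must NOT transfer
]

def run_simulation(agent_version: str, script: str) -> str:
    """Hypothetical stand-in: play a scripted caller against the agent build
    and return the outcome label. Wire this to your platform's test API."""
    raise NotImplementedError

def gate_release(agent_version: str) -> bool:
    """Only clear Thursday's fix for Friday deploy if every path still passes."""
    failures = []
    for case in REGRESSION_SUITE:
        outcome = run_simulation(agent_version, case["script"])
        if outcome != case["expected"]:
            failures.append((case["script"], outcome))
    for script, outcome in failures:
        print(f"REGRESSION: '{script}' -> {outcome}")
    return not failures
```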

Friday: Deploy and Report. Push the changes to production. Send your client a report showing what changed, why, and the expected impact.

This loop runs every week. Not monthly. Not quarterly. Weekly.

Your agent's resolution rate improves 15% per quarter when you maintain this rhythm.

Surgical Optimisation: The Examples

Here's what a typical fix looks like. Your agent's greeting was:

Before: "Thank you for calling. I understand you may have a question about your account. I'd be happy to help you with that today. Could you please tell me what you're calling about so I can direct you to the right information?"

After: "Hi, this is the account team. What can I help with?"

Same intent. 80% fewer tokens. Your caller gets to the point 4 seconds faster.

Response time drops because your prompt is shorter. Completion rate goes up because your caller isn't bored before they've asked their question.

Another example. Your agent kept transferring callers who asked about refunds — even when the answer was in your knowledge base.

The issue? The transfer trigger was too broad. "Caller mentions money" caught refund queries alongside legitimate payment questions.

Fix: narrow the transfer condition to "caller requests a refund AND the refund amount exceeds 500." Everything under that threshold, your agent handles.
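Expressed as a guard condition, the change is one line. A sketch (the intent labels are illustrative, not your platform's actual schema):

```python
REFUND_TRANSFER_THRESHOLD = 500

# Before (too broad): any money-related intent triggered a transfer.
def should_transfer_old(intent: str, amount: float | None) -> bool:
    return intent in ("refund", "payment", "billing")

# After (surgical): only refunds above the threshold leave the agent.
def should_transfer(intent: str, amount: float | None) -> bool:
    return intent == "refund" and amount is not None and amount > REFUND_TRANSFER_THRESHOLD
```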

Transfer rate dropped 23% in one week.

When to Escalate to Humans

Your agent should handle the routine. Humans should handle the exceptions. Here's where we draw the line:

Your agent handles:

  • Standard enquiries with clear answers
  • Booking and scheduling
  • Account lookups and status checks

Your humans handle:

  • Complaints that require empathy
  • Complex disputes spanning multiple interactions
  • High-value decisions your business isn't comfortable automating

The key is transferring with full context. When your agent hands off to a human, that human should know everything the caller already said.

No repetition. No "can you start from the beginning?"

The Cost of Not Watching

We track the impact across our client base. The numbers tell you everything:

Your agents with weekly optimisation:

  • 15% quarterly improvement in resolution rate
  • 87% client satisfaction score
  • Issues caught within 24 hours

Your agents left unmanaged:

  • 8% quarterly degradation in resolution rate
  • Compliance drift that goes unnoticed for weeks
  • Issues caught when customers complain on social media

Your agent doesn't get better on its own. It drifts. Slowly, invisibly, until something breaks publicly.

Your AI agent needs a human. That's us.

Book a Strategy Call | See the Platform

Frequently Asked Questions

How often should voice agents be reviewed and optimised?

Weekly. We run a Monday-to-Friday cycle: data review, transcript analysis, targeted fixes, batch testing, and deployment. Agents that get weekly attention improve 15% per quarter.

Agents left unmanaged degrade at roughly 8% per quarter. The difference compounds fast.

What metrics matter most for voice agent performance?

Beyond the basics (call duration, transfer rate, sentiment), track repeat callers within 24 hours and silence gaps over 3 seconds. Monitor conversation depth vs outcome and knowledge base miss rate too.

These secondary metrics predict failures before your standard dashboard catches them.

Can automated monitoring replace human review?

No. Automated alerts catch the obvious — duration spikes, transfer rate jumps, sentiment drops.

But subtle failures like conversational loops, incorrect knowledge base answers, and prompt drift require a human reading actual transcripts. We use automation for detection and humans for diagnosis.

What does a typical optimisation fix look like?

Most fixes are surgical, not wholesale. Shortening a greeting from 40 words to 10. Narrowing a transfer trigger that's too broad. Adding a missing path for a common edge case.

Each fix is tested with batch simulation before deployment. The goal is small, targeted changes every week — not big rewrites every quarter.


Leonardo Garcia-Curtis

Founder & CEO at Waboom AI. Building voice AI agents that convert.
