6 min read · Operator notes from the call floor · Last updated 9 May 2026
OpenAI shipped GPT Realtime 2 yesterday.
It is the first voice model OpenAI have released with GPT-5 class reasoning. Four times the context window of the model we ran on Tuesday. Smarter under interruption. Calls a tool mid-conversation without freezing the line on you.
If your voice agent has ever stalled on a CRM lookup mid-pitch, that bit is over.
We have been running GPT Realtime 1.5 across the Waboom AI call floor for months. As of today, most of your outbound campaigns are running on 2.
Here is what actually changes on your cold call.
In this article
- 1. What did OpenAI ship on 8 May 2026?
- 2. How much smarter is 2 than 1.5?
- 3. Why a 128k context window matters on a sales call
- 4. Async function calling. The line stops freezing.
- 5. Five reasoning levels. We tune per call.
- 6. What this does to a real Waboom AI campaign
- 7. The bottom line for an operator in May 2026
- 8. Frequently Asked Questions
The Release
What did OpenAI ship on 8 May 2026?
Three new voice models in one drop.
GPT Realtime 2 is the flagship for voice agents. GPT Realtime Translate handles live translation across 70 input languages and 13 output languages. GPT Realtime Whisper is the live transcription model.
The flagship is the one that matters for your outbound sales. It is not a 1.5 patch. The benchmarks tell you that.
The Numbers
How much smarter is 2 than 1.5?
96.6% on Big Bench Audio. Up from 81.4%.
A 15.2-point jump on the headline audio reasoning test in one release. GPT Realtime 2 also scores 48.5% on Audio MultiChallenge (up from 34.7%) and 66.5% on ComplexFuncBench (up from 49.7%).
What that maps to on your call: the agent stays on script when prospects throw curveballs. It answers the question that was actually asked, not the closest one in the prompt. It picks the right tool the first time.
Honest caveat for you. Those benchmarks were run at the top two reasoning settings (high and xhigh). Most production calls run at low, the default, for latency reasons.
Past 800ms a caller wonders if the line dropped. We covered that cliff in our LLM by job type breakdown. Even at low, GPT Realtime 2 carries a meaningful lift over 1.5.
This is also why the question is no longer "which model" but "which reasoning level for which intent". Same model, different brains for different jobs. We covered that meta shift in why voice agents get smarter every night.
The Context Window
Why a 128k context window matters on a sales call
GPT Realtime 1.5 had a 32,000 token context. GPT Realtime 2 has 128,000.
Four times more room to think.
If your provider was compressing the prospect record before each call, you have been losing context the model could have used.
Concretely, on your Sydney vendor lead campaign: full prospect record, last six emails, last three calls, listing history, neighbouring sales, current motivation tags. All loaded in one context. All available to reason against in real time during a 30-second conversation.
Before, we had to compress. Pick the five most load-bearing fields. Hope the agent did not need the rest. Now we hand the model the whole HubSpot card and let it decide what is relevant.
That is the difference between an SDR who skim-read your brief and one who actually knows your lead.
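To make the compress-versus-load trade-off concrete, here is a minimal Python sketch of greedy context packing. It assumes a rough four-characters-per-token estimate (a common rule of thumb for English text, not OpenAI's tokenizer), and the field names, priorities, sizes and reserved headroom are all hypothetical. It is not Waboom AI's production loader.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    text: str
    priority: int  # lower number = more load-bearing


def estimate_tokens(text: str) -> int:
    # Rough heuristic of ~4 characters per token for English text.
    # A real tokenizer should replace this for production budgets.
    return max(1, len(text) // 4)


def pack_context(fields: list[Field], budget_tokens: int) -> list[str]:
    """Greedily keep the most load-bearing fields that fit in the token budget."""
    kept: list[str] = []
    used = 0
    for field in sorted(fields, key=lambda f: f.priority):
        cost = estimate_tokens(field.text)
        if used + cost <= budget_tokens:
            kept.append(field.name)
            used += cost
    return kept


# Hypothetical prospect record; placeholder text sized to ~59k tokens in total.
record = [
    Field("prospect_profile", "x" * 8_000, 1),     # ~2,000 tokens
    Field("motivation_tags", "x" * 400, 2),        # ~100 tokens
    Field("last_three_calls", "x" * 60_000, 3),    # ~15,000 tokens
    Field("last_six_emails", "x" * 48_000, 4),     # ~12,000 tokens
    Field("listing_history", "x" * 40_000, 5),     # ~10,000 tokens
    Field("neighbouring_sales", "x" * 80_000, 6),  # ~20,000 tokens
]

# Hypothetical headroom kept back for the script and the live transcript.
RESERVED = 8_000

print(pack_context(record, 32_000 - RESERVED))
# ['prospect_profile', 'motivation_tags', 'last_three_calls']
print(pack_context(record, 128_000 - RESERVED))
# all six fields fit
```

Greedy-by-priority is the simplest possible policy. At 32k the emails, listing history and neighbouring sales fall out; at 128k the loop does not have to drop anything.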
Async Function Calling
Async function calling. The line stops freezing.
On 1.5, your agent said "let me check that for you" and the line went quiet. 700ms. 900ms. 1.4 seconds. Conversion killer. Hangup rate inflator.
On 2, your agent narrates while the lookup runs in the background.
"Let me pull that up. Yep, looking at your record now. I see you enquired about the Westmere listing on Tuesday."
Real conversational rhythm. No silence cliff.
This matters most on three jobs we run every day.
A mortgage broker call quoting a live rate from a panel mid-conversation. A real estate call checking a listing CRM while quoting price brackets. A customer support ticket looking up an account while the customer keeps talking.
Same job we have been doing on 1.5. Half the awkwardness on the line.
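The timing idea behind that narration can be sketched with Python's asyncio: start the lookup as a background task, keep talking until it finishes, then speak the result. On GPT Realtime 2 the model does this itself inside the session; `crm_lookup`, `say`, the record table and the filler lines here are hypothetical stand-ins, not OpenAI Realtime API calls.

```python
import asyncio

# Hypothetical CRM data keyed by prospect ID.
RECORDS = {"lead-42": "enquired about the Westmere listing on Tuesday"}


async def crm_lookup(prospect_id: str) -> str:
    # Hypothetical CRM call; the sleep stands in for network latency.
    await asyncio.sleep(0.3)
    return RECORDS.get(prospect_id, "are a new enquiry")


async def say(line: str, transcript: list[str]) -> None:
    # Hypothetical text-to-speech call; here it just records the line.
    transcript.append(line)
    await asyncio.sleep(0.1)


async def lookup_with_narration(prospect_id: str) -> list[str]:
    transcript: list[str] = []
    fillers = ["Let me pull that up.", "Yep, looking at your record now."]
    # Start the lookup in the background instead of awaiting it directly.
    task = asyncio.create_task(crm_lookup(prospect_id))
    for line in fillers:
        if task.done():
            break  # stop filling as soon as the answer is ready
        await say(line, transcript)
    result = await task
    await say(f"I see you {result}.", transcript)
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(lookup_with_narration("lead-42")):
        print(line)
```

The key design choice is `asyncio.create_task` rather than a bare `await`: the lookup and the talking overlap, so the caller never sits in silence waiting on the CRM.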
Reasoning Levels
Five reasoning levels. We tune per call.
GPT Realtime 2 ships with five reasoning levels. Minimal, low, medium, high, xhigh. Low is the default. The benchmarks above were run at high and xhigh, which is the bit most launch coverage glossed over.
For a Waboom AI outbound campaign, the right setting is the lowest one that still holds the script. Higher reasoning costs latency, and against an 800ms silence cliff that cost compounds.
How we map it across the agent fleet today.
After-hours receptionist taking messages and bookings runs at low. Inbound mortgage quotes with live rate lookups run at medium. Vendor objection handling on a cold seller call runs at high. Multi-step service tickets with conditional logic run at xhigh.
You do not pay for reasoning you do not need. The agent runs at the speed of the job.
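A per-intent mapping like the one above can live in a small lookup table. This Python sketch uses hypothetical intent names and falls back to low, the default, for anything unmapped. It deliberately does not show how the level is passed to the Realtime API, since that parameter is not covered in this post.

```python
from enum import Enum


class Reasoning(str, Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    XHIGH = "xhigh"


# Hypothetical intent names; the levels mirror the fleet mapping in the text.
INTENT_REASONING = {
    "after_hours_receptionist": Reasoning.LOW,
    "inbound_mortgage_quote": Reasoning.MEDIUM,
    "vendor_objection_handling": Reasoning.HIGH,
    "multi_step_service_ticket": Reasoning.XHIGH,
}


def reasoning_for(intent: str) -> Reasoning:
    # Unmapped intents get low: the model default and the latency-safe choice.
    return INTENT_REASONING.get(intent, Reasoning.LOW)


print(reasoning_for("vendor_objection_handling").value)  # high
print(reasoning_for("unknown_intent").value)  # low
```

Defaulting to low rather than raising an error keeps a new or mislabelled intent on the fast path instead of silently paying for xhigh.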
The Sydney Playbook on 2
What this does to a real Waboom AI campaign
7.1% of conversations turned into warm transfers on 1.5.
The squeeze in that funnel was always the conversation-to-transfer step. That is where the agent has to hold three or four objections in a row. That is a reasoning job. And reasoning is exactly what 2 just unlocked.
We are not promising you specific post-upgrade conversion numbers in week one. We have been on the new model for a day. But the bottleneck 1.5 hit on long objection chains is the one 2 breaks.
Same logic for our Christchurch developer campaign. 49 viewings booked. $7.12 per booked viewing. 14 days of Meta lead handling.
The squeeze was always at multi-step recovery when prospects got vague about timing. New context window. Sharper reasoning. Expect your funnel to widen.
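The Christchurch figures also let you back out the total cost behind those bookings, assuming $7.12 is the full campaign cost divided by viewings booked. A minimal Python check, using `Decimal` so the money arithmetic stays exact (the currency is not stated above, so none is assumed):

```python
from decimal import Decimal

viewings_booked = 49
cost_per_viewing = Decimal("7.12")
days = 14

# Implied total cost across the 14 days of Meta lead handling.
total_cost = cost_per_viewing * viewings_booked
# Average cost per day, rounded to cents.
daily_cost = (total_cost / days).quantize(Decimal("0.01"))

print(f"${total_cost}")  # $348.88
print(f"${daily_cost}")  # $24.92
```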
The Bottom Line
The bottom line for an operator in May 2026
If your voice agent provider is not on GPT Realtime 2 by end of May, you are calling at a handicap.
The reasoning gap is too big to ignore. At the default low setting, the latency profile is the same. Per-minute economics hold. Async function calling fixes the only place the conversational rhythm broke.
Waboom AI has been the LLM-promiscuous voice agency from day one. Right model for the right job, all the way down. GPT Realtime 2 just became the right model for most outbound sales jobs we run.
For the AUD breakdown of what 10,000 calls a month looks like on this stack, our Australian pricing post has the maths.
Frequently Asked Questions
Is GPT Realtime 2 already running on my Waboom AI voice agent?
Most outbound sales campaigns moved across this week. We migrate campaign by campaign as the script gets re-validated at low reasoning. If you are a current customer and want to confirm where your campaign sits, message your Waboom AI account contact.
Does GPT Realtime 2 cost more than 1.5?
Not for you. Your Waboom AI per-minute rate does not change for the move to 2. We absorb the underlying model shift on our side. You get the smarter agent at the same talk-time price you signed for.
What does this mean for accents on AU and NZ campaigns?
Persona work sits above the model. Australian and Kiwi accents are the default for AU and NZ campaigns through voice ID and persona prompts. The reasoning lift in 2 sharpens script handling, not the accent. The full mechanics are in the localised persona post.
What about the new translation model?
GPT Realtime Translate covers 70 input languages and 13 output languages live. We are testing it for AU campaigns where the lead pool includes Mandarin or Cantonese first-language sellers. Expect a separate post once we have real data on word error rates and pricing for multilingual campaigns.
How fast can a new campaign go live on GPT Realtime 2?
Usually in days. A focused single-campaign rollout on the new model lands inside a week. Multi-path orchestration with concurrent campaigns and complex CRM integration takes two to three weeks.
I run my own voice agent stack. What is the cheapest path to 2?
Swap the model ID from gpt-realtime-1.5 to gpt-realtime-2 in your Realtime API call. Re-tune the reasoning level per intent (start at low).
Audit your function-calling prompts so the agent narrates while async tools run. The rest of your stack carries over. Full background on stack choice in our LLM by job type breakdown.
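For a self-hosted stack, the model swap can be as small as the connection string. This Python sketch assumes a WebSocket connection to the Realtime API, where the model is chosen with the `model` query parameter. The campaign config dict, its `intents` list and its `reasoning` key are hypothetical, so check OpenAI's Realtime documentation for the actual reasoning-level parameter before wiring it in. The sketch only builds the URL; it does not open a connection or handle authentication.

```python
from urllib.parse import urlencode

REALTIME_BASE = "wss://api.openai.com/v1/realtime"


def realtime_url(model: str) -> str:
    # The model is selected per connection via the query string.
    return f"{REALTIME_BASE}?{urlencode({'model': model})}"


def migrate_campaign(config: dict) -> dict:
    """Return a copy of a hypothetical campaign config moved to gpt-realtime-2."""
    migrated = dict(config)
    if migrated.get("model") == "gpt-realtime-1.5":
        migrated["model"] = "gpt-realtime-2"
        # Start every intent at low; re-tune upward only where the script breaks.
        migrated["reasoning"] = {intent: "low" for intent in config.get("intents", [])}
    return migrated


campaign = {"model": "gpt-realtime-1.5", "intents": ["booking", "objection"]}
new_campaign = migrate_campaign(campaign)
print(realtime_url(new_campaign["model"]))
# wss://api.openai.com/v1/realtime?model=gpt-realtime-2
```

Returning a copy rather than mutating the original makes it easy to run old and new configs side by side while a campaign is re-validated.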
Want a Waboom AI voice agent on GPT Realtime 2 by next week?
Send us a list and a campaign objective. We will spec it on the new model and quote per-outcome before you commit. Same stack behind the Sydney 141-listing campaign, now on GPT-5 class reasoning.
Leonardo Garcia-Curtis
Founder & CEO at Waboom AI. Building voice AI agents that convert.


