6 min read · Operator notes from the call floor · Last updated 9 May 2026

OpenAI shipped GPT Realtime 2 yesterday.

It is the first voice model OpenAI have released with GPT-5 class reasoning. Four times the context window of the model we ran on Tuesday. Smarter under interruption. Calls a tool mid-conversation without freezing the line on you.

If your voice agent has ever stalled on a CRM lookup mid-pitch, that bit is over.

We have been running GPT Realtime 1.5 across the Waboom AI call floor for months. As of today, most of your outbound campaigns are running on 2.

Here is what actually changes on your cold call.

In this article

1. What did OpenAI ship on 8 May 2026?
2. How much smarter is 2 than 1.5?
3. Why a 128k context window matters on a sales call
4. Async function calling. The line stops freezing.
5. Five reasoning levels. We tune per call.
6. What this does to a real Waboom AI campaign
7. The bottom line for an operator in May 2026
8. Frequently Asked Questions

The Release

What did OpenAI ship on 8 May 2026?

Three new voice models in one drop.

GPT Realtime 2 is the flagship for voice agents. GPT Realtime Translate handles live translation across 70 input languages and 13 output languages. GPT Realtime Whisper is the live transcription model.

The flagship is the one that matters for your outbound sales. It is not a 1.5 patch. The benchmarks tell you that.

Figure: bar chart comparing GPT Realtime 2 and GPT Realtime 1.5 on Big Bench Audio (96.6% vs 81.4%) and Audio MultiChallenge (48.5% vs 34.7%).

The Numbers

How much smarter is 2 than 1.5?

96.6% on Big Bench Audio. Up from 81.4%.

A 15.2 point jump on the headline audio reasoning test in one release. Your agent now scores 48.5% on Audio MultiChallenge (up from 34.7%) and 66.5% on ComplexFuncBench (up from 49.7%).
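If you want to check the gaps yourself, here is a minimal sketch that recomputes the percentage-point gain on each benchmark from the scores quoted above. The dictionary layout and function name are ours, for illustration only.

```python
# Scores quoted in this post: (GPT Realtime 1.5, GPT Realtime 2), in percent.
scores = {
    "Big Bench Audio": (81.4, 96.6),
    "Audio MultiChallenge": (34.7, 48.5),
    "ComplexFuncBench": (49.7, 66.5),
}


def point_gains(table):
    """Return the percentage-point gain per benchmark, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in table.items()}


gains = point_gains(scores)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

These are point gains, not relative improvements. A jump from 34.7% to 48.5% is 13.8 points, which is a much larger relative lift than the same 13.8 points would be near the top of the scale.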

What that maps to on your call: the agent stays on script when prospects throw curveballs. It answers the question that was actually asked, not the closest one in the prompt. It picks the right tool the first time.

Honest caveat for you. Those benchmarks were run at the top two reasoning settings (high and xhigh). Production calls run at low, the default, for latency reasons.

Past 800ms a caller wonders if the line dropped. We covered that cliff in our LLM by job type breakdown. Even at low, GPT Realtime 2 carries a meaningful lift on 1.5.

This is also why the question is no longer "which model" but "which reasoning level for which intent". Same model, different brains for different jobs. We covered that meta shift in why voice agents get smarter every night.

The Context Window

Why a 128k context window matters on a sales call

GPT Realtime 1.5 had a 32,000 token context. GPT Realtime 2 has 128,000.

Four times the room to think.

If your provider was compressing the prospect record before each call, you have been losing context the model could have used.

Concretely on your Sydney vendor lead campaign: full prospect record, last six emails, last three calls, listing history, neighbouring sales, current motivation tags. All loaded in one context. All available to reason against in real time during a 30 second conversation.

Before, we had to compress. Pick the five most load-bearing fields. Hope the agent did not need the rest. Now we hand the model the whole HubSpot card and let it decide what is relevant.
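A rough way to see the difference is to estimate how many tokens a full prospect record takes and check it against each window. This is a sketch under loud assumptions: the four-characters-per-token rule is only a rule of thumb, the 8,000 token reserve for the live conversation is a number we picked for illustration, and the record below is placeholder data, not a real HubSpot export.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate using ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(record: dict, context_limit: int, reserve: int = 8_000) -> bool:
    """True if the serialised record fits while leaving `reserve` tokens
    free for the conversation itself (reserve size is an assumption)."""
    return estimate_tokens(json.dumps(record)) + reserve <= context_limit


# Placeholder record: six long emails, three call transcripts, a few tags.
record = {
    "emails": ["x" * 20_000] * 6,
    "calls": ["y" * 15_000] * 3,
    "motivation_tags": ["downsizing", "timing-unclear"],
}

print("fits in 32k:", fits(record, 32_000))
print("fits in 128k:", fits(record, 128_000))
```

For real budgeting you would swap the estimate for an actual tokenizer count, since the ratio shifts with language and formatting.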

That is the difference between an SDR who skim-read your brief and one who actually knows your lead.

Async Function Calling

Async function calling. The line stops freezing.

On 1.5, your agent said "let me check that for you" and the line went quiet. 700ms. 900ms. 1.4 seconds. Conversation killer. Hangup rate inflator.

On 2, your agent narrates while the lookup runs in the background.

"Let me pull that up. Yep, looking at your record now. I see you enquired about the Westmere listing on Tuesday."

Real conversational rhythm. No silence cliff.
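The pattern behind that rhythm can be sketched with plain asyncio: start the lookup as a background task, keep speaking while it runs, then deliver the result. This is not the Realtime API itself. The `crm_lookup` and `say` functions are hypothetical stand-ins, and the sleep times only simulate lookup and speaking latency.

```python
import asyncio


async def crm_lookup(prospect_id: str) -> dict:
    """Hypothetical slow CRM call; replace with your own client."""
    await asyncio.sleep(0.3)  # simulated lookup latency
    return {"prospect_id": prospect_id, "last_enquiry": "Westmere listing"}


async def answer_with_narration(prospect_id: str, say) -> dict:
    """Start the lookup, keep talking while it runs, then speak the result."""
    lookup = asyncio.create_task(crm_lookup(prospect_id))
    fillers = ["Let me pull that up.", "Yep, looking at your record now."]
    for line in fillers:
        if lookup.done():
            break  # result already back, no need for more filler
        say(line)
        await asyncio.sleep(0.1)  # simulated speaking time
    result = await lookup
    say(f"I see you enquired about the {result['last_enquiry']}.")
    return result


spoken = []
asyncio.run(answer_with_narration("p-123", spoken.append))
print("\n".join(spoken))
```

The point of the design is that the caller never hears dead air: the filler lines cover the lookup, and the loop stops early if the data arrives first.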

Figure: a real estate agent walking down an Auckland street on a phone call while an AI voice agent dashboard shows a CRM lookup running in parallel, with a speech bubble that reads "let me check that for you".

This matters most on three jobs we run every day.

A mortgage broker call quoting a live rate from a panel mid-conversation. A real estate call checking a listing CRM while quoting price brackets. A customer support ticket looking up an account while the customer keeps talking.

Same job we have been doing on 1.5. Half the awkwardness on the line.

Reasoning Levels

Five reasoning levels. We tune per call.

GPT Realtime 2 ships with five reasoning levels. Minimal, low, medium, high, xhigh. Low is the default. The benchmarks above were run at high and xhigh, which is the bit most launch coverage glossed over.

Figure: timeline diagram of the five GPT Realtime 2 reasoning levels (minimal, low, medium, high, xhigh) mapped to Waboom voice agent jobs, from after-hours receptionist through to multi-step service tickets.

For a Waboom AI outbound campaign, the right setting is the lowest one that still holds the script. Higher reasoning costs latency. On an 800ms cliff that compounds.

How we map it across the agent fleet today.

After-hours receptionist taking messages and bookings runs at low. Inbound mortgage quotes with live rate lookups run at medium. Vendor objection handling on a cold seller call runs at high. Multi-step service tickets with conditional logic run at xhigh.

You do not pay for reasoning you do not need. The agent runs at the speed of the job.
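In config terms, the mapping above is a lookup table with a default. A minimal sketch, assuming job labels of our own invention (the level names come from the release, the dictionary keys do not):

```python
# Reasoning levels named in the release, lowest to highest.
LEVELS = ("minimal", "low", "medium", "high", "xhigh")
DEFAULT_LEVEL = "low"

# Job labels are hypothetical identifiers for the four jobs described above.
REASONING_BY_JOB = {
    "after_hours_receptionist": "low",
    "inbound_mortgage_quote": "medium",
    "vendor_objection_handling": "high",
    "multi_step_service_ticket": "xhigh",
}


def reasoning_level(job: str) -> str:
    """Return the configured level for a job, falling back to the default."""
    level = REASONING_BY_JOB.get(job, DEFAULT_LEVEL)
    if level not in LEVELS:
        raise ValueError(f"unknown reasoning level: {level}")
    return level


print(reasoning_level("vendor_objection_handling"))
print(reasoning_level("some_new_job"))
```

Falling back to low for unmapped jobs keeps latency down until someone has validated that the job needs more.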

The Sydney Playbook on 2

What this does to a real Waboom AI campaign

Our Sydney 90-day vendor lead campaign on the old model: 10,713 dials, 3,609 pickups (33.7%), 1,997 real conversations (18.6%), 141 warm transfers (7.1% of conversations). AU$32.74 per warm-transferred seller.

7.1% of conversations turned into warm transfers on 1.5.
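The funnel percentages fall straight out of the raw counts. A short sketch of the arithmetic, using only the figures quoted above (the end-to-end rate is derived here, not a number we reported separately):

```python
dials, pickups, conversations, transfers = 10_713, 3_609, 1_997, 141


def pct(part: int, whole: int) -> float:
    """Share of `whole` as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


pickup_rate = pct(pickups, dials)               # pickups per dial
conversation_rate = pct(conversations, dials)   # real conversations per dial
transfer_rate = pct(transfers, conversations)   # warm transfers per conversation
end_to_end = pct(transfers, dials)              # derived: warm transfers per dial

print(pickup_rate, conversation_rate, transfer_rate, end_to_end)
```

Note the changing denominator: pickups and conversations are measured against dials, while the transfer rate is measured against conversations.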

The squeeze in that funnel was always the conversation to transfer step. That is where the agent has to hold three or four objections in a row. That is a reasoning job. And reasoning is exactly what 2 just unlocked.

We are not promising you specific post-upgrade conversion numbers in week one. We have been on the new model for a day. But the bottleneck 1.5 hit on long objection chains is the one 2 breaks.

Same logic for our Christchurch developer campaign. 49 viewings booked. $7.12 per booked viewing. 14 days of Meta lead handling.

The squeeze was always at multi-step recovery when prospects got vague about timing. New context window. Sharper reasoning. Expect your funnel to widen.

The Bottom Line

The bottom line for an operator in May 2026

If your voice agent provider is not on GPT Realtime 2 by end of May, you are calling at a handicap.

The reasoning gap is too big to ignore. The latency profile is the same. Per-minute economics hold. Async function calling fixes the only place the conversational rhythm broke.

Waboom AI has been the LLM-promiscuous voice agency from day one. Right model for the right job, all the way down. GPT Realtime 2 just became the right model for most outbound sales jobs we run.

For the AUD breakdown of what 10,000 calls a month looks like on this stack, our Australian pricing post has the maths.

Frequently Asked Questions

Is GPT Realtime 2 already running on my Waboom AI voice agent?

Most outbound sales campaigns moved across this week. We migrate campaign by campaign as the script gets re-validated at low reasoning. If you are a current customer and want to confirm where your campaign sits, message your Waboom AI account contact.

Does GPT Realtime 2 cost more than 1.5?

Not for you. Your Waboom AI per-minute rate does not change for the move to 2. We absorb the underlying model shift on our side. You get the smarter agent at the same talk-time price you signed for.

What does this mean for accents on AU and NZ campaigns?

Persona work sits above the model. Australian and Kiwi accents are the default for AU and NZ campaigns through voice ID and persona prompts. The reasoning lift in 2 sharpens script handling, not the accent. Full mechanic in the localised persona post.

What about the new translation model?

GPT Realtime Translate covers 70 input languages and 13 output languages live. We are testing it for AU campaigns where the lead pool includes sellers whose first language is Mandarin or Cantonese. Expect a separate post once we have real data on word error rates and pricing for multilingual campaigns.

How fast can a new campaign go live on GPT Realtime 2?

Days for a single campaign, a few weeks for a complex build. A focused single-campaign rollout on the new model lands inside a week. Multi-path orchestration with concurrent campaigns and complex CRM integration sits at two to three weeks.

I run my own voice agent stack. What is the cheapest path to 2?

Swap the model ID from gpt-realtime-1.5 to gpt-realtime-2 in your Realtime API call. Re-tune the reasoning level per intent (start at low).

Audit your function-calling prompts so the agent narrates while async tools run. The rest of your stack carries over. Full background on stack choice in our LLM by job type breakdown.
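As a sketch of the swap, assuming your stack keeps its agent settings in a plain dictionary: the two model IDs are the ones named above, but the dictionary shape and the `reasoning_level` key are placeholders for however your own config stores them, not OpenAI parameter names. Check the gpt-realtime-2 documentation for the real field.

```python
from copy import deepcopy

OLD_MODEL = "gpt-realtime-1.5"
NEW_MODEL = "gpt-realtime-2"


def migrate_config(config: dict, reasoning: str = "low") -> dict:
    """Return a copy of `config` pointed at the new model.

    Leaves the original untouched so you can roll back per campaign.
    """
    if config.get("model") != OLD_MODEL:
        raise ValueError(f"expected model {OLD_MODEL!r}, got {config.get('model')!r}")
    migrated = deepcopy(config)
    migrated["model"] = NEW_MODEL
    migrated["reasoning_level"] = reasoning  # hypothetical key name
    return migrated


old = {"model": OLD_MODEL, "voice": "example-voice", "tools": ["crm_lookup"]}
new = migrate_config(old)
print(new)
```

Returning a copy rather than editing in place makes it easy to run the old and new configs side by side while you re-validate the script.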

Want a Waboom AI voice agent on GPT Realtime 2 by next week?

Send us a list and a campaign objective. We will spec it on the new model and quote per-outcome before you commit. Same stack behind the Sydney campaign that produced 141 warm transfers, now on GPT-5 class reasoning.

Waboom AI voice agents · Book a 15-min scoping call

Sources: OpenAI: Advancing voice intelligence with new models in the API (8 May 2026) and OpenAI gpt-realtime-2 model documentation.

GPT Realtime 2 just shipped. Our voice agents are already running it.

Leonardo Garcia-Curtis
