OpenAI Launches GPT-Realtime-2 to Bring GPT-5 Reasoning to Voice Agents

OpenAI

May 7, 2026 · Updated May 16, 2026

OpenAI released GPT-Realtime-2 alongside new streaming translation and transcription models in its Realtime API. This update shifts voice AI from simple conversational loops to reasoning-capable agents that can solve complex problems and handle interruptions in real time.

GPT-Realtime-2 is a new audio-native model for the Realtime API that adds reasoning to voice interactions. It ships alongside GPT-Realtime-Translate, which supports streaming translation from more than 70 input languages into 13 output languages, and GPT-Realtime-Whisper for live transcription. Together, the models support bidirectional, low-latency audio streams for production-ready voice agents.

GPT-Realtime-2 capability: GPT-5-class reasoning
Translation support: 70+ input and 13 output languages
Transcription model: GPT-Realtime-Whisper
API parameter: reasoning.effort
Availability: Realtime API

This release bridges the gap between the GPT-5.5 reasoning models and OpenAI's WebRTC infrastructure updates. With reasoning moved into the audio modality, voice agents can handle natural conversational interruptions and work through multi-step problems as they unfold. The practical change is from reactive chatbots to collaborative agents that can carry out multi-step logic mid-conversation.

You can access these models through the Realtime API to build voice-first applications like live interpreters or technical support agents. Developers can use the reasoning.effort parameter to balance intelligence with latency requirements. While these capabilities are live in the API, OpenAI noted that they are not yet available within the consumer ChatGPT application.
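As a rough illustration of the latency trade-off described above, here is a minimal Python sketch that builds a session configuration message carrying a reasoning effort setting. Only the parameter name reasoning.effort comes from the announcement. The `session.update` event type is borrowed from the existing Realtime API, while the model identifier `gpt-realtime-2`, the nesting of `reasoning` inside `session`, and the allowed values `low`, `medium`, and `high` are assumptions made for the example. The helper name `build_session_update` is hypothetical. Check the official API reference before relying on any of these.

```python
import json

# Assumption: effort levels mirror those used by OpenAI's text reasoning models.
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_session_update(effort: str, model: str = "gpt-realtime-2") -> str:
    """Return a JSON-encoded session.update event with a reasoning effort.

    The payload shape is illustrative only; the article names the
    reasoning.effort parameter but does not document where it is set.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    event = {
        "type": "session.update",
        "session": {
            "model": model,  # hypothetical model identifier
            "reasoning": {"effort": effort},  # assumed nesting
        },
    }
    return json.dumps(event)


# Lower effort for quick back-and-forth, higher effort for multi-step problems.
quick_reply_config = build_session_update("low")
deep_reasoning_config = build_session_update("high")
```

In a real integration, a string like this would be sent over the Realtime API connection once it is open. The sketch stops short of any network call so that it makes no claims about endpoints or authentication.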

View the full update on openai.com

OpenAI

@OpenAI · May 7

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.

Still wondering? A few quick answers below.

What is GPT-Realtime-2?

GPT-Realtime-2 is an audio-native model from OpenAI designed for low-latency voice interactions. It brings GPT-5-class reasoning to voice agents, allowing them to solve complex problems and think through tasks as a conversation happens. Unlike standard voice models, it is built to handle natural interruptions and act as a real-time collaborator during live dialogue.

Is GPT-Realtime-2 available in the ChatGPT app?

No. GPT-Realtime-2 and the associated audio models are currently available only through the OpenAI API for developers. OpenAI has stated that these capabilities are not yet live in the consumer ChatGPT application; for now, the release is focused on production-ready API integrations.

What languages does GPT-Realtime-Translate support?

GPT-Realtime-Translate is a streaming model designed for translation during live conversations. It supports more than 70 input languages and 13 output languages. This allows developers to build voice agents that translate audio in real time, helping people communicate across regions and dialects without significant processing delays.

How do developers control the reasoning performance of GPT-Realtime-2?

Developers can manage the intelligence and speed of GPT-Realtime-2 using the reasoning.effort API parameter. Setting this value lets them choose between lower latency for faster responses and higher reasoning effort for more complex problem-solving, so a voice agent can be tuned to whether a task needs quick feedback or deeper thinking.

What is the purpose of GPT-Realtime-Whisper?

GPT-Realtime-Whisper is a streaming transcription model released as part of OpenAI's new audio capability set. It transcribes audio into text as words are being spoken, which suits live captions and meeting notes. It works alongside the other real-time models to provide a full suite for building voice interfaces.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

OpenAI Realtime API Gets gpt-realtime-1.5 for Stronger Voice Agents

OpenAI released gpt-realtime-1.5 for the Realtime API with stronger instruction following, tool calling, and multilingual transcription. Internal evals show a 5% reasoning lift and 10% better alphanumeric accuracy, directly addressing the reliability gaps that held earlier voice agent deployments back.

Google Launches Gemini 3.1 Flash Live for Natural Real Time Voice Agents

Google DeepMind · Mar 28

Google DeepMind released Gemini 3.1 Flash Live, a low-latency audio model optimized for real-time dialogue and complex task execution. The model improves function calling and tonal recognition, allowing voice agents to handle multi-step workflows and emotional nuances more reliably. This enables more fluid interactions in noisy environments without losing conversational context.

OpenRouter Launches Unified Audio Endpoints to Simplify Multi-Provider Voice Agents

OpenRouter · May 7

OpenRouter introduced dedicated text-to-speech and transcription endpoints that integrate with its existing unified API and billing system. By aggregating audio models from providers like Google and OpenAI, the update allows developers to build voice agents with automatic fallbacks and centralized observability.

Google Gemini 3.1 Flash Live Claims Top Spot for Production Voice Agents

Google AI Studio · Apr 13

Google's Gemini 3.1 Flash Live model reached the #1 position on the Tau Voice Bench leaderboard for real-time voice agents. The update delivers significantly lower latency and higher precision, signaling that multimodal voice AI is now reliable enough for production-grade applications.
