HeadsUpAI

OpenAI Launches GPT-Realtime-2 to Bring GPT-5 Reasoning to Voice Agents


OpenAI released GPT-Realtime-2, a new audio-native model for its Realtime API that adds reasoning to voice interactions. The release also includes GPT-Realtime-Translate, which supports streaming translation from more than 70 input languages into 13 output languages, and GPT-Realtime-Whisper for live transcription. These models enable bidirectional, low-latency audio streams for production-ready voice agents.

GPT-Realtime-2 capability: GPT-5-class reasoning
Translation support: 70+ input and 13 output languages
Transcription model: GPT-Realtime-Whisper
API parameter: reasoning.effort
Availability: Realtime API

This release connects the GPT-5.5 reasoning models with OpenAI's WebRTC infrastructure updates. By moving reasoning into the audio modality, voice agents can now handle natural conversational interruptions and solve multi-step problems as the conversation unfolds. The result is a shift from reactive chatbots toward collaborative agents that can reason in real time.

You can access these models through the Realtime API to build voice-first applications like live interpreters or technical support agents. Developers can use the reasoning.effort parameter to balance intelligence with latency requirements. While these capabilities are live in the API, OpenAI noted that they are not yet available within the consumer ChatGPT application.

OpenAI (@OpenAI) on X:

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.


Still wondering? A few quick answers below.

What is GPT-Realtime-2?
GPT-Realtime-2 is an audio-native model from OpenAI designed for low-latency voice interactions. It brings GPT-5-class reasoning to voice agents, allowing them to solve complex problems and think through tasks as a conversation happens. It is built to handle natural interruptions and act as a real-time collaborator during live dialogue.

Is GPT-Realtime-2 available in ChatGPT?
No. GPT-Realtime-2 and the associated audio models are currently available only through the OpenAI API for developers. OpenAI has stated that these capabilities are not yet live in the consumer ChatGPT application; the release is focused on production-ready API integrations.

What is GPT-Realtime-Translate?
GPT-Realtime-Translate is a streaming model for translating live conversations. It supports more than 70 input languages and 13 output languages, so developers can build voice agents that translate audio in real time without significant processing delays.

How do developers balance reasoning and latency?
Developers can trade off the intelligence and speed of GPT-Realtime-2 using an API parameter called reasoning.effort. A lower setting favors latency and faster responses; a higher setting favors deeper reasoning for more complex problem-solving. This lets developers tune a voice agent to whether a task needs quick feedback or more deliberate thinking.
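
As a rough illustration of how that setting might be sent, the Python sketch below builds a JSON session configuration event of the kind the Realtime API accepts over its connection. Only the parameter name reasoning.effort comes from the announcement. The model string "gpt-realtime-2", the effort values "low", "medium", and "high", and the placement of the field inside the session object are assumptions made for this example, not confirmed API details.

```python
import json

# Assumptions for illustration only: the announcement names the parameter
# `reasoning.effort` but not the model identifier or the accepted values.
ASSUMED_MODEL = "gpt-realtime-2"
ASSUMED_EFFORTS = ("low", "medium", "high")


def build_session_update(effort: str) -> str:
    """Return a JSON `session.update` event carrying a reasoning effort."""
    if effort not in ASSUMED_EFFORTS:
        raise ValueError(f"unsupported effort: {effort!r}")
    event = {
        "type": "session.update",
        "session": {
            "model": ASSUMED_MODEL,
            # Assumed placement of the parameter inside the session object.
            "reasoning": {"effort": effort},
        },
    }
    return json.dumps(event)


# Lower effort for quick feedback, higher effort for harder problems.
quick = build_session_update("low")
deep = build_session_update("high")
print(quick)
```

A support agent might start a call with the lower setting and send a second event with the higher setting once the caller describes a multi-step problem. Whether effort can be changed mid-session is not stated in the announcement.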

What is GPT-Realtime-Whisper?
GPT-Realtime-Whisper is a streaming transcription model released as part of OpenAI's new audio capability set. It transcribes audio into text as words are spoken, which suits live captions and meeting notes. It works alongside the other real-time models as part of a suite for building voice interfaces.
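
To show what consuming a streaming transcript could look like on the client side, here is a minimal Python sketch that folds incremental text events into a caption line. The event shapes are hypothetical: the type names "transcript.delta" and "transcript.completed" and the fields "delta" and "transcript" are placeholders chosen for this example, and the real event names should be taken from the API reference.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Accumulates streamed transcript fragments into displayable captions."""

    finished: list[str] = field(default_factory=list)  # completed utterances
    partial: str = ""  # text of the utterance still being spoken

    def handle(self, event: dict) -> None:
        # Event type names below are hypothetical placeholders.
        kind = event.get("type", "")
        if kind == "transcript.delta":
            self.partial += event.get("delta", "")
        elif kind == "transcript.completed":
            # Prefer the final transcript if present, else keep the partial.
            self.finished.append(event.get("transcript", self.partial))
            self.partial = ""

    def display(self) -> str:
        parts = self.finished + ([self.partial] if self.partial else [])
        return " ".join(parts)


captions = CaptionBuffer()
for ev in [
    {"type": "transcript.delta", "delta": "Hello"},
    {"type": "transcript.delta", "delta": " world"},
    {"type": "transcript.completed", "transcript": "Hello world."},
    {"type": "transcript.delta", "delta": "Next"},
]:
    captions.handle(ev)
print(captions.display())
```

Keeping the in-progress text separate from finished utterances lets a caption view redraw the current line as fragments arrive without rewriting earlier lines.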
