OpenAI Releases Prompting Guide to Control Reasoning Effort in Voice Agents

OpenAI

May 8, 2026

OpenAI published a technical guide for gpt-realtime-2 that introduces granular controls for reasoning effort and spoken preambles. These controls let developers tune the balance between voice latency and complex problem-solving in autonomous audio interactions.

OpenAI released a technical guide for gpt-realtime-2, its reasoning voice model for low-latency speech-to-speech applications. The guide introduces a reasoning.effort parameter, which controls how much internal processing the model does before responding, with levels from minimal to high. This lets developers trade speed for deeper reasoning.
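For illustration, the sketch below builds a session configuration that sets the effort level and rejects anything outside the four documented levels. The payload shape (a session.update event with a nested reasoning object) and the gpt-realtime-2 model string are assumptions inferred from the parameter name, not taken from the guide; check the Realtime API reference for the exact schema before relying on it.

```python
import json

# The four effort levels named in the guide.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")


def build_session_update(effort: str, instructions: str) -> dict:
    """Build a hypothetical session.update event carrying reasoning.effort.

    ASSUMPTION: the event type, the "session" wrapper and the nested
    {"reasoning": {"effort": ...}} shape mirror the parameter name
    reasoning.effort; verify them against the official Realtime API schema.
    """
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-2",  # assumed model identifier
            "instructions": instructions,
            "reasoning": {"effort": effort},
        },
    }


if __name__ == "__main__":
    event = build_session_update("low", "You are a concise support agent.")
    # The serialized JSON is what a client would send over its Realtime connection.
    print(json.dumps(event, indent=2))
```

Validating the level locally means a typo fails fast in your own code rather than surfacing as a server-side error mid-conversation.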

Reasoning effort levels: minimal, low, medium, high
Context window: 128K tokens
Response phases: commentary, final_answer
Preamble length: One to two sentences
Availability: OpenAI API

This update shifts voice AI from simple conversational loops to reasoning-capable agents that can plan multi-step actions. By formalizing preambles (short spoken updates that fill silence during reasoning), OpenAI addresses the perceived latency of voice interactions. The guide builds on the gpt-realtime-2 launch with engineering patterns for reliable, high-precision audio interfaces.

You can now implement entity capture workflows that use digit-by-digit confirmation for high-precision data such as order IDs. The model also supports an expanded 128K-token context window (the amount of information a model can process at once), enough for roughly one to two hours of dense audio conversation. These capabilities are available through the OpenAI API.

View the full update on developers.openai.com

OpenAI Developers

@OpenAIDevs · May 7

Building voice applications with GPT-Realtime-2? Our new prompting guide covers how to tune reasoning effort, use preambles, design tool behavior, handle unclear audio, capture exact entities, and maintain state in longer sessions. https://t.co/9zfdhIX4Vq


Still wondering? A few quick answers below.

What is gpt-realtime-2?

gpt-realtime-2 is a reasoning voice model designed for low-latency speech-to-speech applications. Unlike standard voice models, it can perform internal reasoning before responding, allowing it to follow complex instructions and use tools with higher precision. It is built to handle multi-step tasks and high-precision data capture within a conversational voice interface.

How does reasoning effort work in gpt-realtime-2?

Developers can tune the reasoning effort of gpt-realtime-2 with the reasoning.effort API parameter, which has four levels: minimal, low, medium, and high. This setting balances the model's intelligence against response latency. Minimal effort gives the fastest responses for simple tasks, while high effort enables deeper reasoning for complex troubleshooting or multi-step workflows.
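One way to apply this trade-off is to choose a level based on how demanding the work is, per session or, if the API allows changing it mid-session (an assumption), per response. The sketch below uses a hypothetical Turn record and placeholder thresholds that do not come from OpenAI's guide; it only shows the shape of such a routing rule.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical summary of what the agent must do for one user turn."""
    needs_tool_call: bool
    steps: int  # rough count of dependent actions the request implies


def choose_effort(turn: Turn) -> str:
    """Pick an effort level with a simple, illustrative heuristic.

    The thresholds are placeholders, not values from the guide: cheap turns
    get the fastest setting, multi-step work gets more reasoning.
    """
    if turn.steps <= 1 and not turn.needs_tool_call:
        return "minimal"  # greetings, confirmations, small talk
    if turn.steps <= 1:
        return "low"      # a single lookup through one tool call
    if turn.steps <= 3:
        return "medium"   # short chains of dependent actions
    return "high"         # complex troubleshooting or long workflows


print(choose_effort(Turn(needs_tool_call=False, steps=1)))  # minimal
print(choose_effort(Turn(needs_tool_call=True, steps=5)))   # high
```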

What are preambles in OpenAI voice agents?

Preambles are short spoken updates, such as "I'll check that for you," that a voice agent says before performing a longer reasoning process or tool call. They are designed to keep the conversation feeling responsive and reassure the user that work is happening, preventing awkward silences that might occur while the model is thinking or accessing data.
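The guide lists a commentary phase alongside final_answer and suggests keeping preambles to one or two sentences. A lightweight check like the one below could flag overlong preambles in test transcripts. The item format (dicts with phase and text keys) is a hypothetical stand-in for however your client records model output, and the sentence counter is a naive punctuation-based heuristic that abbreviations such as "e.g." will fool.

```python
from __future__ import annotations

import re

# Phase labels listed in the guide; the item structure below is hypothetical.
COMMENTARY = "commentary"
FINAL_ANSWER = "final_answer"


def count_sentences(text: str) -> int:
    """Count sentences by splitting on terminal punctuation (rough heuristic)."""
    parts = re.split(r"[.!?]+", text)
    return sum(1 for p in parts if p.strip())


def overlong_preambles(items: list[dict]) -> list[str]:
    """Return commentary texts that break the one-to-two-sentence guideline.

    ASSUMPTION: each output item is a dict with "phase" and "text" keys.
    """
    return [
        item["text"]
        for item in items
        if item["phase"] == COMMENTARY
        and not 1 <= count_sentences(item["text"]) <= 2
    ]


transcript = [
    {"phase": COMMENTARY, "text": "I'll check that for you."},
    {"phase": COMMENTARY, "text": "Looking now. Still looking. Almost there."},
    {"phase": FINAL_ANSWER, "text": "Your order shipped Monday. It arrives Friday. Anything else?"},
]
print(overlong_preambles(transcript))  # ['Looking now. Still looking. Almost there.']
```

Final answers are deliberately ignored, since the length guideline applies only to preambles.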

How do you capture exact identifiers in gpt-realtime-2?

To capture high-precision data like order IDs or email addresses, the guide recommends a conservative entity capture workflow: collect one value at a time, normalize the input, and read it back to the user for confirmation. For numeric identifiers, the model is instructed to read values back digit by digit to ensure accuracy before proceeding with tool calls.
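On the application side, the same capture, normalize, read-back, confirm loop can gate tool calls so an unconfirmed value never reaches them. This sketch is illustrative only: EntityCapture and its normalization rule (drop anything that is not a letter or digit, then uppercase) are hypothetical choices rather than part of the guide, and in practice the spoken read-back is produced by the model following prompt instructions.

```python
import re


def normalize_identifier(raw: str) -> str:
    """Drop spaces, hyphens and other separators, then uppercase letters."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


def read_back(value: str) -> str:
    """Spell a value one character at a time for digit-by-digit read-back."""
    return ", ".join(value)


class EntityCapture:
    """Hypothetical helper: hold one value until the user confirms it."""

    def __init__(self) -> None:
        self.pending = None    # normalized value awaiting confirmation
        self.confirmed = None  # value released for tool calls

    def capture(self, raw: str) -> str:
        """Normalize a single value and return the confirmation prompt."""
        self.pending = normalize_identifier(raw)
        return f"I have {read_back(self.pending)}. Is that correct?"

    def confirm(self, user_said_yes: bool):
        """Release the pending value on a yes; return None on a no."""
        if self.pending is None:
            raise RuntimeError("nothing captured yet")
        value = self.pending
        self.pending = None
        if user_said_yes:
            self.confirmed = value
            return value
        return None  # caller should ask the user to repeat the value


capture = EntityCapture()
print(capture.capture("48 19-07"))  # I have 4, 8, 1, 9, 0, 7. Is that correct?
print(capture.confirm(True))        # 481907
```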

What is the context window for gpt-realtime-2?

The gpt-realtime-2 model features an expanded context window of 128,000 tokens, which is a significant increase from the 32,000 tokens available in earlier realtime models. This larger window allows the model to maintain state and memory over long sessions, supporting approximately one to two hours of dense audio conversation without losing track of the dialogue history.
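Working backward, the one-to-two-hour figure implies a rough average token rate: 128,000 tokens over 60 minutes is about 2,133 tokens per minute, and over 120 minutes about 1,067. The sketch below does that arithmetic and, as a hypothetical fallback for sessions that run longer, finds the oldest turn that still fits so earlier turns can be dropped or summarized. Per-turn token counts are assumed to come from your own accounting; nothing here is an OpenAI API.

```python
from __future__ import annotations

CONTEXT_WINDOW = 128_000  # tokens, as stated for gpt-realtime-2


def tokens_per_minute(session_minutes: float, window: int = CONTEXT_WINDOW) -> float:
    """Average token rate a session can sustain before the window fills."""
    return window / session_minutes


def first_turn_to_keep(turn_tokens: list[int], budget: int = CONTEXT_WINDOW) -> int:
    """Walk back from the newest turn and return the index of the oldest turn
    that still fits in the budget; turns before it would be dropped or
    summarized. Returns len(turn_tokens) if even the newest turn is too big."""
    total = 0
    for i in range(len(turn_tokens) - 1, -1, -1):
        total += turn_tokens[i]
        if total > budget:
            return i + 1
    return 0


for minutes in (60, 120):
    print(minutes, round(tokens_per_minute(minutes)))  # 60 2133, then 120 1067

print(first_turn_to_keep([50_000, 40_000, 60_000]))  # 1
```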

Every HeadsUpAI update is written based on its original source and reviewed before it's published.


Keep reading

OpenAI Launches GPT-Realtime-2 to Bring GPT-5 Reasoning to Voice Agents

OpenAI released GPT-Realtime-2 alongside new streaming translation and transcription models in its Realtime API. This update shifts voice AI from simple conversational loops to reasoning-capable agents that can solve complex problems and handle interruptions in real time.

Google · Apr 24

Google Releases Prompting Formula for Granular Control of Gemini 3.1 TTS

Google released a formal prompting framework for Gemini 3.1 TTS that uses inline audio tags to control speech style and pacing. This update provides the specific syntax and constraints needed to direct AI voices like human actors, enabling dynamic and expressive vocal performances.

Google DeepMind · Mar 28

Google Launches Gemini 3.1 Flash Live for Natural Real Time Voice Agents

Google DeepMind released Gemini 3.1 Flash Live, a low-latency audio model optimized for real-time dialogue and complex task execution. The model improves function calling and tonal recognition, allowing voice agents to handle multi-step workflows and emotional nuances more reliably. This enables more fluid interactions in noisy environments without losing conversational context.

ElevenLabs · Mar 26

ElevenLabs Launches Guardrails 2.0 for Production Voice Agents

ElevenLabs released Guardrails 2.0 in ElevenAgents, a three-level safety layer that validates user inputs, blocks agent responses in real time, and enforces custom business policies in natural language. For voice agents, a well-crafted system prompt alone isn't enough in production.
