OpenAI Realtime API Gets gpt-realtime-1.5 for Stronger Voice Agents

OpenAI

Feb 23, 2026 · Updated Apr 28, 2026

OpenAI released gpt-realtime-1.5 for the Realtime API with stronger instruction following, tool calling, and multilingual transcription. Internal evals show a 5% lift in reasoning and a 10% improvement in alphanumeric transcription accuracy, directly addressing the reliability gaps that held back earlier voice agent deployments.

OpenAI released gpt-realtime-1.5, the latest speech-to-speech model in the Realtime API. It accepts text, audio, and image inputs and generates voice responses directly, with no separate speech-to-text or text-to-speech pipeline. The model scores higher on reasoning, transcribes phone numbers and IDs more accurately across languages, and is better at following complex developer instructions without drifting over multi-turn conversations.

Voice agents built on earlier realtime models struggled to call the right tools at the right time and tended to drift from developer prompts during multi-turn conversations. The 1.5 model addresses both: the demo shows it handling a live food-ordering workflow with real-time cart modifications, item swaps, and checkout through natural voice conversation.

Swap the model string to gpt-realtime-1.5 in your Realtime API session configuration. It works across WebRTC, WebSocket, and SIP connections, with function calling built in.
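As a rough illustration of the model swap, the sketch below builds a WebSocket URL and a session.update event that registers one function tool. It only constructs the payloads and opens no connection. The endpoint URL and event shape are assumptions based on the publicly documented Realtime API and should be checked against the current reference; the add_to_cart tool, its parameters, and the helper names are hypothetical, loosely modeled on the food-ordering demo.

```python
import json
from urllib.parse import urlencode

# Model name taken from the article; everything else here is illustrative.
MODEL = "gpt-realtime-1.5"

# Assumed Realtime API WebSocket endpoint; verify against the current docs.
REALTIME_WS_BASE = "wss://api.openai.com/v1/realtime"


def realtime_ws_url(model: str = MODEL) -> str:
    """Return a WebSocket URL that selects the model via a query parameter."""
    return f"{REALTIME_WS_BASE}?{urlencode({'model': model})}"


def session_update_event(instructions: str) -> str:
    """Serialize a session.update event with one hypothetical function tool."""
    event = {
        "type": "session.update",
        "session": {
            "type": "realtime",  # assumed session type field
            "instructions": instructions,
            "tools": [
                {
                    # Hypothetical tool, not part of any OpenAI API.
                    "type": "function",
                    "name": "add_to_cart",
                    "description": "Add a menu item to the caller's cart.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "item": {"type": "string"},
                            "quantity": {"type": "integer", "minimum": 1},
                        },
                        "required": ["item", "quantity"],
                    },
                }
            ],
            "tool_choice": "auto",
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(realtime_ws_url())
    print(session_update_event("Take food orders politely and confirm each item."))
```

An existing integration would likely need only the model value changed; the tool definition is included to show where function calling would be configured in such a session.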

View the full update on developers.openai.com

OpenAI Developers

@OpenAIDevs · Feb 23

Voice workflows just got stronger with gpt-realtime-1.5 in the Realtime API. The model offers more reliable instruction following, tool calling, and multilingual accuracy. Demo with @charlierguo https://t.co/gGV57Wv91V


