Voice workflows just got stronger with gpt-realtime-1.5 in the Realtime API. The model offers more reliable instruction following, tool calling, and multilingual accuracy. Demo with @charlierguo https://t.co/gGV57Wv91V
OpenAI Realtime API Gets gpt-realtime-1.5 for Stronger Voice Agents
OpenAI · Updated
OpenAI released gpt-realtime-1.5 for the Realtime API with stronger instruction following, tool calling, and multilingual transcription. OpenAI's internal evals show a 5% lift in reasoning and 10% better accuracy on alphanumeric input, addressing the reliability gaps that held back earlier voice agent deployments.
gpt-realtime-1.5 is the latest speech-to-speech model in the Realtime API. It accepts text, audio, and image inputs and generates voice responses directly, with no separate speech-to-text or text-to-speech pipeline. The model scores higher on reasoning, on transcription accuracy for phone numbers and IDs across languages, and on following complex developer instructions over multi-turn conversations.
Voice agents built on earlier realtime models struggled to call the right tools at the right time and tended to drift from developer prompts as conversations went on. The 1.5 model addresses both - the demo shows it handling a live food ordering workflow through natural voice conversation, with real-time cart modifications, item swaps, and checkout.
Swap the model string to gpt-realtime-1.5 in your Realtime API session configuration - it works across WebRTC, WebSocket, and SIP connections with function calling built in.
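A minimal sketch of what that swap might look like over WebSocket. The endpoint URL, the `session.update` event layout, and the tool schema follow the commonly documented Realtime API shape but are assumptions here, so check them against the current API reference. The `add_to_cart` tool is hypothetical, chosen only to echo the food ordering demo.

```python
import json
from urllib.parse import urlencode

# Assumption: WebSocket endpoint as commonly documented for the Realtime API.
REALTIME_WS_BASE = "wss://api.openai.com/v1/realtime"
MODEL = "gpt-realtime-1.5"  # the model string named in the article


def realtime_ws_url(model: str = MODEL) -> str:
    """Build the WebSocket URL, selecting the model via a query parameter."""
    return f"{REALTIME_WS_BASE}?{urlencode({'model': model})}"


# Hypothetical tool for a food ordering agent; name and schema are illustrative.
ADD_TO_CART_TOOL = {
    "type": "function",
    "name": "add_to_cart",
    "description": "Add an item to the customer's cart.",
    "parameters": {
        "type": "object",
        "properties": {
            "item": {"type": "string"},
            "quantity": {"type": "integer", "minimum": 1},
        },
        "required": ["item", "quantity"],
    },
}


def session_update_event(instructions: str, tools: list) -> str:
    """Serialize a session.update client event (assumed field layout)."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "type": "realtime",
            "instructions": instructions,
            "tools": tools,
            "tool_choice": "auto",
        },
    })


if __name__ == "__main__":
    print(realtime_ws_url())
    print(session_update_event("You take food orders by voice.", [ADD_TO_CART_TOOL]))
```

The sketch only builds the URL and the event payload. An actual connection would pass the URL to a WebSocket client along with an API key in the Authorization header, and WebRTC or SIP connections set up the session differently.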