HeadsUpAI

OpenAI Realtime API Gets gpt-realtime-1.5 for Stronger Voice Agents


OpenAI has released gpt-realtime-1.5, the latest speech-to-speech model in the Realtime API. It accepts text, audio, and image inputs and generates voice responses directly, with no separate speech-to-text or text-to-speech pipeline. Compared with earlier realtime models, it scores higher on reasoning, on accurately transcribing phone numbers and IDs across languages, and on following complex developer instructions without drifting over multi-turn conversations.

Voice agents built on earlier realtime models struggled to call the right tools at the right time and tended to drift from developer prompts during multi-turn conversations. gpt-realtime-1.5 addresses both problems: the demo shows it handling a live food-ordering workflow, including real-time cart modifications, item swaps, and checkout, entirely through natural voice conversation.
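To make the tool-calling side of that workflow concrete, here is a minimal Python sketch of client-side dispatch. It assumes the model emits a `function_call` output item carrying `name`, `call_id`, and JSON-encoded `arguments`, and that the client answers with a `conversation.item.create` event holding a `function_call_output` item, followed by `response.create`. That is our understanding of the Realtime API event flow, so check it against the current API reference. The tools `add_item`, `swap_item`, and `checkout` and the in-memory cart are hypothetical and are not taken from the demo.

```python
import json

# Hypothetical in-memory cart for an illustrative food-ordering agent.
cart: dict[str, int] = {}


def add_item(item: str, quantity: int = 1) -> dict:
    # Hypothetical tool: add `quantity` of `item` to the cart.
    cart[item] = cart.get(item, 0) + quantity
    return {"cart": dict(cart)}


def swap_item(old_item: str, new_item: str) -> dict:
    # Hypothetical tool: move old_item's quantity onto new_item (no-op if absent).
    quantity = cart.pop(old_item, 0)
    if quantity:
        cart[new_item] = cart.get(new_item, 0) + quantity
    return {"cart": dict(cart)}


def checkout() -> dict:
    # Hypothetical tool: snapshot the order and empty the cart.
    order = dict(cart)
    cart.clear()
    return {"status": "placed", "order": order}


HANDLERS = {"add_item": add_item, "swap_item": swap_item, "checkout": checkout}


def handle_function_call(item: dict) -> list[dict]:
    """Run one function_call item and build the client events to send back."""
    handler = HANDLERS.get(item["name"])
    if handler is None:
        result = {"error": f"unknown tool {item['name']}"}
    else:
        # Arguments arrive as a JSON string; treat an empty string as no arguments.
        args = json.loads(item["arguments"] or "{}")
        result = handler(**args)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],  # ties the result to the model's call
                "output": json.dumps(result),
            },
        },
        # Ask the model to continue the spoken response now that it has the result.
        {"type": "response.create"},
    ]
```

Each returned dict would be serialized with `json.dumps` and sent over the open connection. Keeping the tool logic in plain functions behind a name-to-handler map makes it easy to add or test tools without touching the transport code.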

To upgrade, change the model string to gpt-realtime-1.5 in your Realtime API session configuration. The model works across WebRTC, WebSocket, and SIP connections, and function calling is built in.
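For the WebSocket transport, the model swap might look like the Python sketch below. It only builds the connection URL and a `session.update` event without opening a connection. The endpoint `wss://api.openai.com/v1/realtime?model=...` and the session fields (`type`, `model`, `instructions`, `tools`, `tool_choice`) reflect our reading of OpenAI's Realtime API reference and are assumptions to verify. The `add_item` tool is a hypothetical example. WebRTC and SIP sessions pass their configuration differently and are not shown.

```python
import json
from urllib.parse import urlencode

MODEL = "gpt-realtime-1.5"


def realtime_ws_url(model: str = MODEL) -> str:
    # Assumed: the WebSocket endpoint selects the model via a `model` query parameter.
    return "wss://api.openai.com/v1/realtime?" + urlencode({"model": model})


def session_update(instructions: str, tools: list[dict], model: str = MODEL) -> str:
    """Serialize a session.update client event (field names are assumptions)."""
    event = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "model": model,
            "instructions": instructions,
            "tools": tools,
            "tool_choice": "auto",  # let the model decide when to call tools
        },
    }
    return json.dumps(event)


# Hypothetical function tool for the food-ordering example.
ADD_ITEM_TOOL = {
    "type": "function",
    "name": "add_item",
    "description": "Add a menu item to the customer's cart.",
    "parameters": {
        "type": "object",
        "properties": {
            "item": {"type": "string"},
            "quantity": {"type": "integer", "minimum": 1},
        },
        "required": ["item"],
    },
}
```

Any WebSocket client could connect to `realtime_ws_url()` with an `Authorization: Bearer` header and then send the string returned by `session_update(...)`. Because the model name is a single constant, moving from an earlier realtime model to gpt-realtime-1.5 is a one-line change.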

OpenAI Developers (@OpenAIDevs) on X

Voice workflows just got stronger with gpt-realtime-1.5 in the Realtime API. The model offers more reliable instruction following, tool calling, and multilingual accuracy. Demo with @charlierguo https://t.co/gGV57Wv91V

