🎙️ Voice AI only feels natural when conversation keeps pace with speech. Here’s how we rebuilt our WebRTC stack with a thin relay and stateful transceiver to keep real-time media fast for ChatGPT voice, the Realtime API, and more. https://t.co/JEvs2PmsmC
OpenAI Rebuilds WebRTC Stack to Scale Low-Latency Voice AI
OpenAI rearchitected its media infrastructure with a split relay and transceiver model to support 900 million weekly users on Kubernetes. By routing packets based on protocol metadata rather than dedicated ports, the system maintains sub-second latency for real-time voice interactions at global scale.
Standard WebRTC deployments require thousands of open UDP ports, which creates security risks. OpenAI's "Global Relay" network also shortens the distance between users and servers to minimize jitter. The shift parallels OpenAI's WebSocket-based Responses API and its open-source voice component, two other places where delays break conversational flow.
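To make "routing by protocol metadata rather than dedicated ports" concrete, here is a minimal Python sketch of one common way to do it. The source does not say which metadata OpenAI's relay inspects, so everything here is an assumption: the sketch classifies packets by first byte using the ranges from RFC 7983, reads the ICE username fragment from the STUN USERNAME attribute, and then pins the sender's address to a backend. The `Relay` class and the `transceiver-a` name are hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value in every STUN header
ATTR_USERNAME = 0x0006          # STUN USERNAME attribute type

Addr = Tuple[str, int]


def classify(packet: bytes) -> str:
    """Classify a UDP payload by its first byte (RFC 7983 ranges)."""
    if not packet:
        return "unknown"
    b = packet[0]
    if b <= 3:
        return "stun"
    if 20 <= b <= 63:
        return "dtls"
    if 64 <= b <= 79:
        return "turn-channel"
    if 128 <= b <= 191:
        return "rtp-or-rtcp"
    return "unknown"


def stun_username(packet: bytes) -> Optional[str]:
    """Return the USERNAME attribute of a STUN message, if present."""
    if len(packet) < 20:
        return None
    _msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if cookie != STUN_MAGIC_COOKIE:
        return None
    offset, end = 20, min(len(packet), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        value = packet[offset + 4 : offset + 4 + attr_len]
        if attr_type == ATTR_USERNAME:
            if len(value) != attr_len:
                return None  # truncated attribute
            return value.decode("utf-8", errors="replace")
        # Attribute values are padded to a 4-byte boundary.
        offset += 4 + ((attr_len + 3) & ~3)
    return None


class Relay:
    """Hypothetical single-port relay: one UDP port, many sessions."""

    def __init__(self) -> None:
        self.by_ufrag: Dict[str, str] = {}  # server ufrag -> backend
        self.by_addr: Dict[Addr, str] = {}  # client address -> backend

    def register(self, server_ufrag: str, backend: str) -> None:
        self.by_ufrag[server_ufrag] = backend

    def route(self, src: Addr, packet: bytes) -> Optional[str]:
        """Pick a backend for this packet, or None if the sender is unknown."""
        if classify(packet) == "stun":
            username = stun_username(packet)
            if username:
                # ICE checks carry "<receiver ufrag>:<sender ufrag>".
                # Assumption: no MESSAGE-INTEGRITY check is done here.
                backend = self.by_ufrag.get(username.split(":", 1)[0])
                if backend is not None:
                    self.by_addr[src] = backend
        # DTLS and RTP carry no ufrag, so they follow the learned address.
        return self.by_addr.get(src)
```

In this sketch the relay stays thin: it holds only two lookup tables, while session state such as encryption keys would live in whichever backend the packet is forwarded to. A production relay would also need to authenticate STUN messages and expire stale address mappings, both omitted here.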
The architecture ensures audio arrives as a continuous stream, allowing models to reason while a user is still talking. This design is optimized for 1:1 interactions typical of AI agents. You can access these low-latency capabilities through the Realtime API or the ChatGPT mobile and web applications.
Every HeadsUpAI update is written based on its original source and reviewed before it's published.




