OpenAI Rebuilds WebRTC Stack to Scale Low Latency Voice AI

OpenAI

May 5, 2026

OpenAI rearchitected its media infrastructure with a split relay and transceiver model to support 900 million weekly users on Kubernetes. By routing packets based on protocol metadata rather than dedicated ports, the system maintains sub-second latency for real-time voice interactions at global scale.

OpenAI rearchitected its WebRTC stack to power ChatGPT voice and the Realtime API (WebRTC is an open standard for real-time audio and video). The design splits the workload between a stateless relay and a stateful transceiver. It replaces the one-port-per-session model, which fits poorly in Kubernetes (a system for orchestrating containerized software) environments.

A conventional WebRTC deployment allocates one UDP port per session, so a large service ends up exposing thousands of open UDP ports, which widens the attack surface. OpenAI also runs a "Global Relay" network that shortens the distance between users and servers to minimize jitter. The shift parallels OpenAI's WebSocket-based Responses API and its open-source voice component, other settings where delays break conversational flow.

The architecture ensures audio arrives as a continuous stream, allowing models to reason while a user is still talking. This design is optimized for 1:1 interactions typical of AI agents. You can access these low-latency capabilities through the Realtime API or the ChatGPT mobile and web applications.

View the full update on openai.com

OpenAI Developers

@OpenAIDevs, May 5

🎙️ Voice AI only feels natural when conversation keeps pace with speech. Here’s how we rebuilt our WebRTC stack with a thin relay and stateful transceiver to keep real-time media fast for ChatGPT voice, the Realtime API, and more. https://t.co/JEvs2PmsmC


Still wondering? A few quick answers below.

Why did OpenAI rebuild its WebRTC stack for voice AI?

OpenAI rearchitected its stack to handle massive scale while maintaining low latency for 900 million weekly users. The previous model required one open UDP port per session, which adds up to thousands of open ports and was difficult to secure and scale on Kubernetes. The new design uses a split relay and transceiver model to provide a smaller, fixed network footprint.

How does the OpenAI relay and transceiver architecture work?

The relay acts as a lightweight forwarding layer that routes packets without decrypting them. It uses the ICE username fragment, a protocol-native identifier, to steer traffic to the correct destination. The transceiver remains the stateful endpoint that handles encryption, session lifecycle, and media processing, allowing backend services to scale like standard web applications.
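As a rough illustration of routing on the ICE username fragment, the Python sketch below reads the USERNAME attribute from a STUN binding request and looks up a destination without touching any encrypted payload. The header layout, magic cookie, and attribute type follow the STUN and ICE specifications. The routing table, addresses, ufrag values, and function names are hypothetical, and nothing here is taken from OpenAI's implementation.

```python
import struct
from typing import Dict, Optional, Tuple

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value in every STUN header (RFC 8489)
STUN_ATTR_USERNAME = 0x0006     # USERNAME attribute type (RFC 8489)

Addr = Tuple[str, int]


def extract_receiver_ufrag(packet: bytes) -> Optional[str]:
    """Return the receiver-side ICE username fragment of a STUN packet, or None."""
    if len(packet) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    # The two top bits of a STUN message type are always zero.
    if msg_type & 0xC000 or cookie != STUN_MAGIC_COOKIE:
        return None
    offset, end = 20, min(len(packet), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        value = packet[offset + 4 : offset + 4 + attr_len]
        if attr_type == STUN_ATTR_USERNAME:
            if len(value) != attr_len:
                return None  # truncated attribute
            try:
                # ICE sets USERNAME to "<receiver ufrag>:<sender ufrag>" (RFC 8445).
                return value.decode("utf-8").split(":", 1)[0]
            except UnicodeDecodeError:
                return None
        # Attribute values are padded to a 4-byte boundary.
        offset += 4 + ((attr_len + 3) & ~3)
    return None


def route(packet: bytes, routes: Dict[str, Addr]) -> Optional[Addr]:
    """Look up a transceiver address by ufrag (hypothetical routing table)."""
    ufrag = extract_receiver_ufrag(packet)
    return routes.get(ufrag) if ufrag is not None else None


def build_binding_request(username: str) -> bytes:
    """Build a minimal STUN binding request carrying only USERNAME (for demos)."""
    value = username.encode("utf-8")
    padded = value + b"\x00" * (-len(value) % 4)
    attr = struct.pack("!HH", STUN_ATTR_USERNAME, len(value)) + padded
    header = struct.pack("!HHI", 0x0001, len(attr), STUN_MAGIC_COOKIE) + bytes(12)
    return header + attr
```

For example, with a table such as `{"tx7f3a": ("10.0.4.12", 40000)}`, calling `route(build_binding_request("tx7f3a:clientufrag"), table)` returns the address registered for `tx7f3a`. How a real deployment assigns ufrag values to transceivers is not described in the source, so the table here is only a stand-in.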

What is OpenAI Global Relay?

Global Relay is a geographically distributed network of ingress points designed to shorten the distance between a user and OpenAI's infrastructure. By accepting packets at a relay close to the user, the system reduces network jitter and round-trip time. This ensures that conversational turn-taking in ChatGPT voice and the Realtime API feels natural and responsive.

How does OpenAI manage WebRTC sessions on Kubernetes?

OpenAI avoids the traditional one-port-per-session model, which is brittle for autoscaling and difficult to load balance. Instead, it uses a small number of stable UDP ports and a custom routing layer. This allows pods to be added or rescheduled without breaking active sessions, because the relay can deterministically recover routes using metadata already present in the connection setup.
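One way to picture route recovery on a shared UDP port is the Python sketch below. It classifies each packet by its first byte, using the ranges from RFC 7983, learns a client-to-transceiver route whenever a STUN packet carries a known username fragment, and forwards everything else by source address. The `Relay` class, the injected `ufrag_of` helper, and the backend registry are assumptions made for illustration, not a description of OpenAI's routing layer.

```python
from typing import Callable, Dict, Optional, Tuple

Addr = Tuple[str, int]


def classify(packet: bytes) -> str:
    """Classify a packet on a shared UDP port by its first byte (RFC 7983)."""
    if not packet:
        return "unknown"
    first = packet[0]
    if first <= 3:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        return "rtp"  # RTP or RTCP; SRTP-protected in WebRTC
    return "unknown"


class Relay:
    """Hypothetical relay: one shared port, routes learned from STUN packets."""

    def __init__(
        self,
        ufrag_of: Callable[[bytes], Optional[str]],
        backends: Dict[str, Addr],
    ) -> None:
        self.ufrag_of = ufrag_of           # assumed helper that reads the STUN USERNAME
        self.backends = backends           # ufrag -> transceiver address (assumed registry)
        self.flows: Dict[Addr, Addr] = {}  # client address -> transceiver address

    def forward_target(self, src: Addr, packet: bytes) -> Optional[Addr]:
        """Return where to forward this packet, or None if no route is known."""
        if classify(packet) == "stun":
            ufrag = self.ufrag_of(packet)
            backend = self.backends.get(ufrag) if ufrag else None
            if backend is not None:
                self.flows[src] = backend  # learn or refresh the route
                return backend
        # DTLS and media packets carry no ufrag, so fall back to the flow table.
        return self.flows.get(src)
```

In this sketch, losing the flow table (for instance when a relay instance restarts) is not fatal: media packets from a client are unroutable only until that client's next STUN connectivity check arrives, at which point the route is rebuilt from the ufrag alone.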

Which OpenAI products use this new WebRTC infrastructure?

This rearchitected media stack currently powers real-time voice interactions for ChatGPT and the Realtime API. It is also used for internal research projects and interactive AI agents that require continuous audio streaming. The design allows these systems to process audio, perform transcription, and generate speech while a user is still talking, rather than waiting for a full upload.

Every HeadsUpAI update is written based on its original source and reviewed before it's published.

Keep reading

OpenAI Realtime API Gets gpt-realtime-1.5 for Stronger Voice Agents

OpenAI released gpt-realtime-1.5 for the Realtime API with stronger instruction following, tool calling, and multilingual transcription. Internal evals show a 5% reasoning lift and 10% better alphanumeric accuracy, directly addressing the reliability gaps that held earlier voice agent deployments back.

OpenRouter Launches Unified Audio Endpoints to Simplify Multi-Provider Voice Agents

OpenRouter, May 7

OpenRouter introduced dedicated text-to-speech and transcription endpoints that integrate with its existing unified API and billing system. By aggregating audio models from providers like Google and OpenAI, the update allows developers to build voice agents with automatic fallbacks and centralized observability.

Cloudflare Launches Experimental Voice Pipeline for Real-Time Agent Interactions

Cloudflare, Apr 16

Cloudflare introduced the @cloudflare/voice package, an experimental extension for its Agents SDK that enables bidirectional voice communication over WebSockets. By unifying voice and text state within a single Durable Object, developers can build multimodal agents that maintain context across different interaction channels.

Google Launches Gemini 3.1 Flash Live for Natural Real Time Voice Agents

Google DeepMind, Mar 28

Google DeepMind released Gemini 3.1 Flash Live, a low-latency audio model optimized for real-time dialogue and complex task execution. The model improves function calling and tonal recognition, allowing voice agents to handle multi-step workflows and emotional nuances more reliably. This enables more fluid interactions in noisy environments without losing conversational context.

