We’ve partnered with @AMD, @Broadcom, @Intel, @Microsoft, and @NVIDIA, to release Multipath Reliable Connection (MRC), a new open networking protocol that helps large AI training clusters run faster and more reliably, with less wasted GPU time. https://t.co/AiV952AJXs
OpenAI Releases MRC Protocol to Stop Network Failures From Stalling GPU Clusters
OpenAI released Multipath Reliable Connection (MRC) as an open networking protocol to prevent single link failures from crashing massive AI training jobs. By spraying data across hundreds of paths and using static source routing, the protocol ensures frontier model training remains efficient even as clusters scale past 100,000 GPUs.
- Availability: Open Compute Project
- Supported Hardware: NVIDIA GB200, Broadcom, and others
- Network Speed: 800Gb/s interfaces
- Cluster Scale: 131,000 GPUs with two switch tiers
- Routing Protocol: SRv6 Source Routing
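The cluster-scale figure can be sanity-checked with simple arithmetic. The sketch below assumes a non-blocking two-tier leaf-spine fabric, where each leaf switch sends half its ports down to endpoints and half up to spines. The switch radix of 512 is an assumption chosen for illustration, not a number from the article, and the function name is hypothetical.

```python
def two_tier_endpoints(radix: int) -> int:
    """Maximum endpoints in a non-blocking two-tier leaf-spine fabric.

    Each leaf uses radix/2 ports for endpoints and radix/2 for uplinks.
    Each spine connects once to every leaf, so there are at most
    `radix` leaves in total.
    """
    if radix <= 0 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    max_leaves = radix
    endpoints_per_leaf = radix // 2
    return max_leaves * endpoints_per_leaf


# Assumption: 512 ports per switch (not stated in the article).
ASSUMED_RADIX = 512
print(two_tier_endpoints(ASSUMED_RADIX))  # 131072
```

Under that assumption the result is 131,072 endpoints, which is in line with the roughly 131,000 GPUs quoted above. A lower-radix switch would need a third tier to reach the same scale.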
Traditional networking acts as a "failure amplifier" in synchronous AI training: if one packet is delayed, thousands of GPUs sit idle waiting for it. MRC replaces complex dynamic routing with a deterministic "multi-plane" design that needs fewer switch tiers. With routes fixed in advance, the network can route around a failure in microseconds, so training runs for frontier models like GPT-5.5 keep moving instead of stalling.
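To make the idea of spraying over fixed paths concrete, here is a minimal sketch of a sender that rotates packets across a set of precomputed routes and skips any route marked as failed. It is not the MRC algorithm or its wire format. The class, its methods, and the hop names are all hypothetical, and round-robin selection is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SprayingSender:
    """Toy sender: sprays packets over static routes, avoiding failed ones."""

    # path_id -> fixed hop list the sender would stamp on each packet,
    # standing in for a source-routed segment list (hypothetical layout).
    routes: dict[int, tuple[str, ...]]
    failed: set[int] = field(default_factory=set)
    _counter: int = 0

    def mark_failed(self, path_id: int) -> None:
        """Stop using a path, e.g. after a loss or timeout signal."""
        self.failed.add(path_id)

    def next_route(self) -> tuple[int, tuple[str, ...]]:
        """Pick the next healthy path in round-robin order."""
        healthy = sorted(set(self.routes) - self.failed)
        if not healthy:
            raise RuntimeError("no healthy paths left")
        path_id = healthy[self._counter % len(healthy)]
        self._counter += 1
        return path_id, self.routes[path_id]


# Four hypothetical paths between two leaves, one per spine switch.
routes = {i: ("leaf-a", f"spine-{i}", "leaf-b") for i in range(4)}
sender = SprayingSender(routes)

before = [sender.next_route()[0] for _ in range(4)]
sender.mark_failed(2)  # pretend spine-2 went down
after = [sender.next_route()[0] for _ in range(6)]

print(before)      # [0, 1, 2, 3]
print(2 in after)  # False
```

Because the sender already holds every route, avoiding a failure is a local decision: it simply stops choosing the broken path, with no need to wait for switches to reconverge.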
The MRC 1.0 specification is now available through the Open Compute Project (OCP) for teams optimizing large-scale AI infrastructure. The protocol is aimed at organizations managing massive GPU clusters, but adoption across major hardware vendors suggests future AI-native networking will be more resilient. This release follows other infrastructure optimizations such as OpenAI's WebSocket-based Responses API.