A new networking layer for larger AI clusters
OpenAI has introduced Multipath Reliable Connection, or MRC, a networking protocol designed for large-scale AI training systems where delays between GPUs can slow an entire job. The company said it developed the protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA, then released the specification through the Open Compute Project so other operators can adopt it.
The move targets one of the less visible bottlenecks in frontier-model development. Training runs depend on huge volumes of data moving among accelerators, and a single late transfer can leave expensive hardware waiting idle. OpenAI argues that as clusters grow, congestion, link failures, and routing problems become frequent enough that network design itself becomes a core determinant of training speed and reliability.
What MRC is meant to fix
In its description of the system, OpenAI said the protocol is built around three ideas: multi-plane high-speed networks for redundancy, adaptive packet spraying to reduce core congestion, and static source routing to work around failures. The company framed these choices as a way to reduce complexity while improving resilience.
The basic problem is scale. A modern training step can require millions of data transfers across a supercomputer fabric. If a network path gets congested or a device fails, that disruption can ripple outward and stall synchronized work across many GPUs. OpenAI said MRC is intended to keep those issues from spreading by distributing traffic more evenly and by letting senders bypass failures without relying on fragile dynamic rerouting.
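Why one late transfer matters can be shown with a toy calculation (this is an illustration, not OpenAI code): a synchronized collective such as an all-reduce finishes only when its slowest participant does, so step time is the maximum transfer time, not the average.

```python
# Illustration only: in a synchronized training step, every GPU waits
# for the slowest transfer, so one congested path sets the pace.
def step_time(transfer_times_ms):
    # A collective operation completes only when the last participant
    # finishes, so the step is gated by the maximum, not the mean.
    return max(transfer_times_ms)

normal = [2.0] * 1024              # 1,024 GPUs, all transfers on time
one_slow = [2.0] * 1023 + [50.0]   # a single congested path

print(step_time(normal))    # 2.0 ms
print(step_time(one_slow))  # 50.0 ms: one late transfer idles every GPU
```

At cluster scale, the odds that at least one of millions of transfers hits a congested or failed path approach certainty, which is why the protocol treats congestion and failure handling as first-class concerns.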
Three core design choices
- Multi-plane networking is intended to provide redundancy while using fewer components and less power than some alternatives.
- Adaptive packet spraying spreads traffic across paths to reduce hot spots in the network core.
- Static source routing is used in deployment to steer traffic around failed links, avoiding entire classes of dynamic-routing problems altogether.
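The interplay between the last two ideas can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MRC specification: the plane names, failure set, and least-loaded selection rule are all hypothetical.

```python
# A minimal sketch (not the MRC spec) of packet spraying combined with
# static source routing across multiple network planes.
from collections import Counter

PLANES = ["plane-0", "plane-1", "plane-2"]  # hypothetical multi-plane fabric
failed = {"plane-1"}                        # a plane taken out by a fault

# Static source routing: the sender chooses from a fixed, precomputed set
# of healthy paths, instead of letting in-network routing react on the fly.
usable = [p for p in PLANES if p not in failed]

load = Counter({p: 0 for p in usable})

def send(packet_id):
    # Adaptive spraying: place each packet on the currently least-loaded
    # path, spreading one flow across planes to avoid core hot spots.
    path = min(usable, key=lambda p: load[p])
    load[path] += 1
    return path

for i in range(10):
    send(i)

print(dict(load))  # traffic splits evenly over the surviving planes
```

The sketch shows the division of labor: the multi-plane fabric supplies redundant paths, spraying balances load across them per packet rather than per flow, and the sender's static path list is what gets edited when a plane fails.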