InfiniBand XDR (eXtreme Data Rate)

TL;DR

InfiniBand XDR is the IBTA-defined 800 Gb/s per-port generation, doubling NDR's bandwidth by running 4 lanes of 200 Gb/s PAM4 signalling per port.
Pairs with the NVIDIA Quantum-3 switch and ConnectX-8 host channel adapter, the reference fabric for B200, GB200, and GB300 NVL training clusters.
Retains InfiniBand's credit-based lossless flow control, hardware RDMA, sub-microsecond cut-through switching, and SHARPv4 in-network reduction.
Volume shipments began in 2024-2025; XDR is positioned against 800G Spectrum-X Ethernet for frontier-model training fabrics through 2027.

Overview

InfiniBand XDR is the eighth data-rate generation of the InfiniBand standard, following SDR, DDR, QDR, FDR, EDR, HDR, and NDR. Where NDR uses four lanes of 100 Gb/s PAM4 to reach 400 Gb/s per port, XDR doubles the per-lane signalling rate to 200 Gb/s PAM4, keeps the same four-lane port format, and arrives at 800 Gb/s per port. The aggregated XDR2 link uses eight lanes for 1.6 Tb/s on a single cable.
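The lane arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the per-lane rates are the nominal figures quoted in the text, and the function name `port_rate_gbps` is purely illustrative.

```python
# Nominal per-lane signalling rates in Gb/s, as quoted in the text.
LANE_RATE_GBPS = {"NDR": 100, "XDR": 200}


def port_rate_gbps(generation: str, lanes: int = 4) -> int:
    """Return the nominal line rate of a link: lanes x per-lane rate."""
    return LANE_RATE_GBPS[generation] * lanes


ndr_port = port_rate_gbps("NDR")            # 4 x 100 Gb/s
xdr_port = port_rate_gbps("XDR")            # 4 x 200 Gb/s
xdr_8lane = port_rate_gbps("XDR", lanes=8)  # 8 x 200 Gb/s aggregated link

print(ndr_port, xdr_port, xdr_8lane)  # 400 800 1600
```

These are line rates; usable payload throughput is somewhat lower once FEC and protocol overhead are accounted for.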

XDR was timed to match the Blackwell GPU generation. A single B200 GPU can source data over HBM3e and NVLink 5 far faster than an NDR port can carry it during cross-pod collectives, so the fabric had to scale alongside the accelerators. NVIDIA's Quantum-3 platform (the XDR switch family) and the ConnectX-8 HCA together form the fabric used in the GB200 NVL72 and GB300 NVL72 reference designs.

Specifications

Property	Value
Per-lane signalling	200 Gb/s PAM4
Lanes per standard port	4
Per-port line rate	800 Gb/s (XDR)
Aggregated link (XDR2)	1.6 Tb/s (8 lanes)
Modulation and FEC	PAM4 with RS-FEC
Switch ASIC	NVIDIA Quantum-3
Host adapter	ConnectX-8
Connector	OSFP-XD twin-port
Switch port-to-port latency	Cut-through, sub-microsecond
First volume shipments	2024-2025
In-network reduction	SHARPv4

Why XDR Mattered for Blackwell

The argument for XDR is bandwidth balance. A GB200 NVL72 rack ties 72 Blackwell GPUs into a single NVLink domain with ~130 TB/s of aggregate NVLink bandwidth (72 GPUs at 1.8 TB/s each). When two such racks need to exchange gradients during AllReduce, the inter-rack fabric is the bottleneck, and an NDR port is not enough to keep the collective from stalling training steps at scale.

XDR's 800 Gb/s per port (100 GB/s per direction) narrows the gap between the inter-rack fabric and the rate at which NVLink can move data out of a Blackwell's HBM, although it does not close it. In practice this lets a 4,096-GPU pod hit higher sustained MFU on long-context training than the equivalent NDR fabric, particularly when expert-parallel mixture-of-experts models thrash the all-to-all collective.
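To make the bandwidth-balance argument concrete, the sketch below estimates how long a gradient AllReduce spends on the wire at NDR and XDR port rates. It assumes a ring AllReduce, in which each rank transfers 2(N-1)/N of the payload, and ignores latency, FEC, and protocol overhead; real jobs may use tree or in-network (SHARP) algorithms instead. The payload size and rank count are hypothetical values chosen only for illustration.

```python
def ring_allreduce_seconds(payload_bytes: float, ranks: int, link_gbps: float) -> float:
    """Bandwidth-only lower bound for a ring AllReduce.

    Each rank sends (and receives) 2*(N-1)/N of the payload over its link.
    Latency and protocol overhead are ignored.
    """
    if ranks < 2:
        return 0.0
    link_bytes_per_s = link_gbps * 1e9 / 8          # Gb/s -> bytes/s
    traffic_bytes = 2 * (ranks - 1) / ranks * payload_bytes
    return traffic_bytes / link_bytes_per_s


payload = 10e9   # hypothetical 10 GB gradient buffer
ranks = 64       # hypothetical number of participating ranks

t_ndr = ring_allreduce_seconds(payload, ranks, 400)
t_xdr = ring_allreduce_seconds(payload, ranks, 800)
print(f"NDR: {t_ndr:.3f} s  XDR: {t_xdr:.3f} s  ratio: {t_ndr / t_xdr:.1f}x")
```

Under this model the wire time halves when the port rate doubles; whether that translates into a proportional MFU gain depends on how much of the collective overlaps with compute.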

Operational Notes

Cabling moves from OSFP to OSFP-XD; existing NDR DACs and AOCs do not slot into XDR cages without adapters.
Passive copper reach shrinks at higher signalling rates — most XDR runs beyond ~2 m use linear pluggable optics (LPO) or active optical cables.
Quantum-3 supports SHARPv4 with deeper reduction trees and lower precision (BF16/FP8) — enable explicitly in NCCL for AllReduce-bound jobs.
Subnet Manager scaling: UFM 6.x is recommended for clusters above 1,024 nodes due to faster topology re-convergence.
Mixed NDR/XDR subnets are supported but cap the slower side; segregate pods if budget allows.
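As a hedged illustration of enabling in-network reduction from the NCCL side, the sketch below builds a process environment with `NCCL_COLLNET_ENABLE=1`, the NCCL variable that turns on the CollNet path used by the SHARP plugin, and optionally `NCCL_DEBUG=INFO` so the chosen algorithm appears in the logs. It assumes the SHARP plugin and aggregation manager are already installed and running; any SHARPv4-specific tuning is outside the scope of this example, and the helper name `sharp_env` is hypothetical.

```python
import os


def sharp_env(base=None, debug=False):
    """Return a copy of the environment with NCCL CollNet (SHARP) enabled."""
    env = dict(os.environ if base is None else base)
    env["NCCL_COLLNET_ENABLE"] = "1"   # allow NCCL to use the CollNet plugin
    if debug:
        env["NCCL_DEBUG"] = "INFO"     # log which algorithm NCCL selects
    return env


# Example with an empty base environment, so the output is easy to read.
env = sharp_env(base={}, debug=True)
print(env)
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching a training job; checking the NCCL debug log is the simplest way to confirm that a CollNet algorithm was actually selected.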

XDR optics dissipate noticeably more heat than NDR equivalents. Rack-level airflow and switch fan curves should be re-validated when transitioning a hall from NDR to XDR.

Pitfalls

Driver and firmware drift between ConnectX-7 (NDR) and ConnectX-8 (XDR) is common during phased migrations; standardise DOCA release per cluster.
Some early XDR cables exhibited tighter bend-radius tolerances than NDR — survey rack cabling plans before installation.
Power draw per port roughly doubles versus NDR; PDU budgeting at the top-of-rack switch tier must be revisited.
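The power pitfall above lends itself to a quick budgeting check. The sketch below takes the "roughly doubles" figure from the text as a multiplier and tests the result against a derated PDU capacity. Every number in it (port count, NDR watts per port, PDU capacity, the 80% derating factor) is a hypothetical placeholder; substitute measured or datasheet values before relying on it.

```python
def xdr_port_power_w(ports: int, ndr_watts_per_port: float, multiplier: float = 2.0) -> float:
    """Estimate total port power after migration: ports x NDR watts x multiplier."""
    return ports * ndr_watts_per_port * multiplier


def fits_pdu(required_w: float, pdu_capacity_w: float, derate: float = 0.8) -> bool:
    """True if the load fits within the derated PDU capacity."""
    return required_w <= pdu_capacity_w * derate


ports = 64                 # hypothetical switch port count
ndr_watts_per_port = 15.0  # hypothetical NDR optics power per port
pdu_capacity_w = 5000.0    # hypothetical PDU rating

required = xdr_port_power_w(ports, ndr_watts_per_port)
print(f"Estimated XDR port power: {required:.0f} W, fits: {fits_pdu(required, pdu_capacity_w)}")
```

This covers port optics only; switch ASIC, fan, and host power need to be added for a full rack budget.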

