TL;DR
- A CDU is the interface between the technology cooling system (rack-side liquid loop, often called the TCS) and the facility water system (FWS). It contains a plate heat exchanger, pumps, a reservoir, controls, and instrumentation.
- Two form factors: in-row CDUs (~150-1500 kW per unit, sit alongside racks) and in-rack CDUs (40-80 kW, fit inside a single rack).
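- CDUs absorb the chemistry, pressure, and flow differences between the loops: facility water may be warm and only coarsely treated, while the rack loop has to be treated, pressure-controlled, and held at its supply setpoint (typically 25-32 °C). A plate heat exchanger cannot deliver rack-side supply colder than the facility supply plus the approach temperature.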
- Production AI clusters are sized N+1, because a CDU outage takes whatever it serves offline within minutes.
Overview
Every liquid-cooled data centre — whether direct-to-chip, immersion, or rear-door — has CDUs somewhere. They are the unsung workhorses of the thermal architecture: a brazed-plate heat exchanger to transfer heat between two physically separate water loops, redundant pumps to drive the secondary side, a reservoir to absorb thermal swings, filtration, conductivity monitoring, and a control system to keep the rack-side loop within the cold-plate specification.
CDUs exist because facility water and rack water cannot be the same loop. Facility water is treated for the building system as a whole: it is open to evaporative towers, dosed with corrosion inhibitors selected for steel pipework, and run at whatever pressure and flow regime the central plant chose. Rack water has to be soft, particulate-free, pressure-stable, and within a tight temperature window, because the cold plate has 0.5 mm micro-channels that clog quickly if the chemistry drifts or particulates get through.
In-Row vs In-Rack
| Property | In-row CDU | In-rack CDU |
|---|---|---|
| Capacity per unit | 150-1500 kW | 40-80 kW |
| Footprint | Half-rack or full-rack alongside the row | Bottom of single rack |
| Serves | Up to 8-16 racks | One rack |
| Failure blast radius | Whole row | One rack |
| Capex per kW | Lower | Higher |
| Best for | Production AI clusters | Pilots, small edge sites |
Typical Operating Specifications
| Parameter | Typical range |
|---|---|
| Facility-side supply | 30-45 °C (warm-water W3/W4) |
| Rack-side supply | 17-32 °C |
| Approach temperature | 2-5 °C |
| Pump count | 2 (N+1) minimum, often 3 |
| Pump head | 20-40 m |
| Secondary-loop pressure | 1.5-3.0 bar |
| Filtration | 50-100 µm bag or cartridge |
| Conductivity monitoring | < 50 µS/cm typical for water-glycol |
| Power draw | 1-3 % of cooled load |
Approach temperature, the difference between the rack-side supply leaving the heat exchanger and the facility-side supply entering it, is the cheapest knob to turn. For the same rack-side supply target, a 2 °C approach lets you run facility water 8 °C warmer than a 10 °C approach, which often eliminates a chiller entirely.
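The arithmetic behind that trade-off can be sketched in a few lines. This is a minimal illustration, assuming the rack-side supply target is fixed by the cold-plate specification; the function name `max_facility_supply_c` and the 32 °C target are hypothetical choices for the example, not values from any vendor datasheet.

```python
def max_facility_supply_c(rack_supply_target_c: float, approach_c: float) -> float:
    """Warmest facility supply that still meets the rack-side supply target.

    A plate heat exchanger cannot cool the rack loop below the facility
    supply temperature plus the approach, so the limit is target - approach.
    """
    if approach_c <= 0:
        raise ValueError("approach temperature must be positive")
    return rack_supply_target_c - approach_c


# Assumed rack-side supply target (upper end of the typical range).
rack_target_c = 32.0

for approach_c in (2.0, 5.0, 10.0):
    limit_c = max_facility_supply_c(rack_target_c, approach_c)
    print(f"approach {approach_c:4.1f} °C -> facility supply up to {limit_c:.1f} °C")
```

Comparing the 2 °C and 10 °C cases gives the 8 °C of extra facility-water headroom described above.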
Trade-offs and Design Choices
- Redundancy: N+1 pump configuration is mandatory; N+1 CDU configuration is mandatory for production AI clusters above ~500 kW.
- Cross-tied manifolds: tying CDU outputs into a shared rack manifold preserves cooling during a CDU outage at the cost of more pipework and valve complexity.
- Variable-speed pumps: EC pump drives reduce parasitic power and let the control system match flow to actual rack load.
- Make-up water: CDUs need a clean, deaerated make-up source. Plumbing a polished water supply to the CDU room avoids tank-truck refills.
- Telemetry: modern CDUs expose Modbus or BACnet for flow, pressure, supply/return temperature, leak status, and pump state. Wire this into the DCIM and the alerting platform.
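The redundancy rule above can be checked with simple arithmetic. The sketch below is illustrative only: `cdus_required` and `survives_single_failure` are hypothetical helper names, the 2000 kW load and 600 kW unit capacity are made-up inputs, and it assumes cross-tied manifolds so that the surviving units can actually pick up the load of a failed one.

```python
import math


def cdus_required(it_load_kw: float, cdu_capacity_kw: float, spares: int = 1) -> int:
    """Units needed to carry the load (N) plus the requested spares (N+1 by default)."""
    if it_load_kw <= 0 or cdu_capacity_kw <= 0:
        raise ValueError("load and capacity must be positive")
    n = math.ceil(it_load_kw / cdu_capacity_kw)
    return n + spares


def survives_single_failure(it_load_kw: float, cdu_capacity_kw: float, installed: int) -> bool:
    """True if the remaining units can still carry the full load with one CDU down."""
    return (installed - 1) * cdu_capacity_kw >= it_load_kw


# Hypothetical example: a 2 MW cluster served by 600 kW in-row units.
load_kw, unit_kw = 2000.0, 600.0
count = cdus_required(load_kw, unit_kw)
print(f"install {count} CDUs; survives one failure: "
      f"{survives_single_failure(load_kw, unit_kw, count)}")
```

Without cross-tied manifolds the check has to be run per row or per CDU group instead, since a spare unit cannot help racks it is not plumbed to.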
When to Use What
- In-row, 1+ MW: standard for production AI training clusters built around HGX or NVL72 racks.
- In-rack: pilots, R&D pods, and small edge deployments where the per-rack cost premium is acceptable for blast-radius isolation.
- Central-plant CDUs (multi-MW): hyperscaler builds, where the CDU function lives in a dedicated mechanical room rather than the data hall.
Operational Pitfalls
- Reservoir starvation: under-sized reservoirs cannot absorb sudden load swings (for example, a job completing across hundreds of GPUs at once). Sizing rule of thumb: the reservoir should hold at least one minute of full-flow volume.
- Pump cavitation: an under-pressurised inlet makes the pump cavitate. Verify at commissioning that the available NPSH (net positive suction head) exceeds the pump's required NPSH, rather than discovering the shortfall in production.
- Glycol degradation: propylene glycol-water (PGW) mixtures degrade over years: conductivity creeps up and corrosion inhibitors deplete. Annual fluid sampling is mandatory.
- Heat exchanger fouling: scale on the facility side reduces heat transfer. Clean-in-place (CIP) provisions on the HX simplify maintenance.
- Leak detection placement: under-CDU leak ropes are obvious, but the most common leak is at the manifold connections in the rack. Plan detection coverage end-to-end.
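The reservoir rule of thumb above can be turned into a quick estimate. This is a rough sketch under stated assumptions: the coolant is treated as plain water (specific heat 4.186 kJ/kg·K, density 1 kg/L), the 10 K loop temperature rise and 1000 kW load are example inputs rather than figures from this article, and the function names are hypothetical. Glycol mixtures have lower specific heat, so real flow rates would be somewhat higher.

```python
# Assumed fluid properties for plain water; adjust for a glycol mixture.
WATER_CP_KJ_PER_KG_K = 4.186
WATER_DENSITY_KG_PER_L = 1.0


def full_flow_l_per_min(heat_load_kw: float, delta_t_k: float,
                        cp_kj_per_kg_k: float = WATER_CP_KJ_PER_KG_K,
                        density_kg_per_l: float = WATER_DENSITY_KG_PER_L) -> float:
    """Volumetric flow needed to carry heat_load_kw at a given loop temperature rise."""
    if heat_load_kw <= 0 or delta_t_k <= 0:
        raise ValueError("heat load and temperature rise must be positive")
    mass_flow_kg_per_s = heat_load_kw / (cp_kj_per_kg_k * delta_t_k)  # Q = m * cp * dT
    return mass_flow_kg_per_s / density_kg_per_l * 60.0


def min_reservoir_litres(flow_l_per_min: float, hold_minutes: float = 1.0) -> float:
    """Reservoir volume that holds hold_minutes of full-flow operation."""
    return flow_l_per_min * hold_minutes


# Example inputs (assumptions): 1000 kW of heat, 10 K rise across the racks.
flow = full_flow_l_per_min(heat_load_kw=1000.0, delta_t_k=10.0)
print(f"full flow: {flow:.0f} L/min, minimum reservoir: {min_reservoir_litres(flow):.0f} L")
```

The result is only as good as the rule of thumb it encodes; the CDU vendor's reservoir sizing guidance should take precedence where it exists.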