TL;DR
- Direct-to-chip liquid cooling (DLC), also called direct liquid cooling or cold-plate cooling, bolts a metal cold plate onto each CPU, GPU, and HBM stack and circulates a water-based coolant through micro-channels in the plate.
- NVIDIA offers liquid-cooled configurations of HGX-H100, HGX-H200, and HGX-B200 and ships the rack-scale GB200 NVL72 liquid-cooled only. Air cooling realistically tops out around 30 kW per rack, while DLC handles 80-120+ kW.
- 85-95 % of rack heat is captured by liquid; the residual (DIMMs, NICs, PSUs, spinning storage) is still cooled by chassis air, so a hybrid air-plus-liquid envelope is the norm.
- Capex sits at roughly $1,875-$5,000 per installed kW of DLC versus $1,000-$2,500 per kW for air — but warm-water operation (30-45 °C facility supply) eliminates the chiller in most temperate climates and unlocks waste-heat reuse.
- Standard chassis form factors are preserved, so service ergonomics, rack PDU layout, and structured cabling stay familiar — the new disciplines are leak detection, coolant chemistry, and CDU N+1 sizing.
Overview
Direct-to-chip liquid cooling has moved from a niche HPC technique to the default thermal architecture for AI infrastructure. The driver is brutal arithmetic: an HGX-H100 baseboard dissipates 5.6 kW from its eight 700 W GPUs alone, an HGX-B200 baseboard roughly 8 kW (eight GPUs at up to 1,000 W each), and a GB200 NVL72 rack approaches 120 kW continuous. A 600 mm-wide cabinet cannot pass enough air through itself to move that much heat without supply temperatures dropping into the single digits, which would require mechanical chillers running year-round and would still produce hot-spots inside the chassis.
DLC intercepts the heat at the source. A copper or nickel-plated cold plate sits on each high-power die, with sub-millimetre micro-channels routing coolant — typically a propylene-glycol-and-water (PGW 25/75) blend or a treated facility-water mixture — across the hottest silicon. Heat is carried out of the chassis through blind-mate quick-disconnect couplings, into a rack manifold, on to a coolant distribution unit (CDU), and finally to the facility's chilled-water, dry-cooler, district-heating, or adiabatic-cooling loop.
Because the cold plate can hold a GPU junction below its 90 °C limit with server-inlet coolant at 25-32 °C (and some platforms accept warmer inlet still), the facility loop does not need chilled water: it can run anywhere from 30 °C to 45 °C, provided it stays a few degrees (the CDU heat-exchanger approach) below the server-inlet temperature the platform requires. That single fact, warm-water operation, is what makes DLC economically transformative: in most UK and EU climates a dry cooler can reject heat year-round without a single mechanical chiller, cutting both capex and the PUE penalty of compression cooling. Yobitel NeoCloud rack designs use direct-to-chip liquid cooling as the default for any rack above 30 kW; the GB200 NVL72 deployments in NeoCloud rely on it natively. This entry helps you plan a high-density GPU rack and understand the cooling baseline Yobitel NeoCloud uses for all H100/H200/B200/GB200 deployments.
DLC is sometimes split into 'cold-plate liquid cooling' (single-phase water/glycol) and 'two-phase DLC' (a refrigerant boils inside the plate and condenses elsewhere). This entry focuses on the dominant single-phase variant; two-phase DLC shares the plumbing topology but adds vapour-handling and dielectric-fluid management.
Specifications
The numbers vary by vendor and by the specific cold-plate design, but the envelope below sits in the middle of the published HGX, GB200 NVL72, and OCP Advanced Cooling Solutions reference designs. Verify final flow and pressure with the server OEM datasheet before committing to a CDU spec.
| Parameter | Typical range | Notes |
|---|---|---|
| Coolant supply temperature (server inlet) | 25-32 °C | Set by the CDU to keep GPU junction below 90 °C. |
| Coolant return temperature (server outlet) | 35-60 °C | Delta-T of 8-15 °C across the cold plate is typical. |
| Coolant supply temperature (facility) | 30-45 °C (ASHRAE W3/W4) | Higher inlet enables year-round free cooling in temperate climates. |
| Flow rate per cold plate | 0.5-1.5 L/min | Higher for B200/B300-class TDPs. |
| Flow rate per rack | 1.0-2.0 L/min per kW of cooled load | Drives manifold and CDU sizing. |
| Pressure drop across rack | 0.5-1.5 bar | Sets secondary-loop pump head. |
| Heat capture to liquid | 85-95 % | Residual captured by chassis air or rear-door HX. |
| Rack thermal capacity | 40-130 kW | Depending on CDU/manifold; NVL72 sits at the top of this band. |
| Coolant medium | PGW 25/75 or treated water + biocide | Conductivity, pH, and inhibitor levels are monitored continuously. |
| Quick-disconnect dripless rating | < 0.1 mL per mate/unmate | OCP ACS spec for blind-mate couplings. |
| Cold-plate material | Copper or nickel-plated copper | Aluminium occasionally for cost-sensitive CPU-only plates. |
ASHRAE TC 9.9 liquid-cooling classes W1-W5 describe the facility supply temperature. W3 (up to 32 °C) and W4 (up to 45 °C) are the practical targets for AI data centres because they unlock free cooling — no mechanical chiller — in temperate climates.
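The table's flow-per-kW row and its delta-T row are two views of the same energy balance, ΔT = 60 / (c_p × q), with q in L/min per kW. A minimal cross-check, assuming plain water at about 4.18 kJ per litre per K (a PGW 25/75 blend carries somewhat less heat per litre, so its delta-T runs a little higher). `delta_t_for_flow` and `ashrae_class` are hypothetical helper names; the class limits are the ASHRAE 2011 W1-W4 upper bounds, with anything warmer treated as W5.

```python
# Cross-check: flow per kW of cooled load vs coolant delta-T, plus an ASHRAE class lookup.
# Assumes plain water at ~4.18 kJ per litre per K; glycol blends run slightly hotter.
CP_WATER_KJ_PER_L_K = 4.18

def delta_t_for_flow(lpm_per_kw: float, cp: float = CP_WATER_KJ_PER_L_K) -> float:
    """Coolant temperature rise (K) when q L/min carries 1 kW: dT = 60 / (cp * q)."""
    return 60.0 / (cp * lpm_per_kw)

# ASHRAE TC 9.9 (2011) liquid-cooling classes: upper facility-supply limit in deg C.
ASHRAE_W_CLASSES = [("W1", 17.0), ("W2", 27.0), ("W3", 32.0), ("W4", 45.0)]

def ashrae_class(facility_supply_c: float) -> str:
    """Return the first W class whose upper limit covers the supply temperature."""
    for name, upper_c in ASHRAE_W_CLASSES:
        if facility_supply_c <= upper_c:
            return name
    return "W5"  # above 45 C

for q in (1.0, 1.5, 2.0):
    print(f"{q:.1f} L/min per kW -> delta-T {delta_t_for_flow(q):.1f} K")
for t in (30.0, 35.0, 45.0):
    print(f"facility supply {t:.0f} C -> {ashrae_class(t)}")
```

At 1.0-2.0 L/min per kW, water rises by roughly 7-14 K, which lines up with the 8-15 °C delta-T quoted in the table.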
Architecture and Plumbing
A production DLC deployment has three nested loops, and confusion about which loop carries which heat is the single biggest source of facility-engineering errors. The diagram below maps the heat flow end-to-end.
- Server (in-chassis) loop: the plumbing inside the server. Coolant enters through a dripless blind-mate quick disconnect at the rear of the chassis, passes through a small in-server manifold and every cold plate in parallel or series-parallel, and leaves through a second quick disconnect. It carries the same coolant as the rack loop; the quick disconnects let a server be pulled without draining the rack loop.
- Secondary (rack) loop: the rack manifold collects flow from every server and routes it to the CDU. Together with the server plumbing it forms what ASHRAE calls the technology cooling system (TCS). Operating pressure is 1.5-3.0 bar, total flow scales with the rack's cooled-load rating, and the loop is chemistry-controlled, particulate-filtered, and pressure-regulated.
- Facility (primary) loop: ASHRAE's facility water system (FWS). The CDU isolates the rack loop from the building loop via a brazed-plate heat exchanger. Facility water can be 30-45 °C; it is the loop that ultimately rejects heat at the dry cooler, adiabatic cooler, district-heating intake, or chiller plant.
```text
Heat flow: single rack, single CDU
==================================
Left pipe: supply (flowing up). Right pipe: return (flowing down).

+---------------------------------------+
| Server chassis (in-chassis loop)      |
|                                       |
|   GPU + HBM cold plates -- in series  |
|   CPU cold plate                      |
|   Optional VRM cold plate             |
|                                       |
|       inlet QD        outlet QD       |
+-----------|---------------|-----------+
            ^               |
            |               v
+-----------|---------------|-----------+
| Rack manifold (vertical, in-cabinet)  |
| - per-server supply + return ports    |
| - leak detection rope along base      |
+-----------|---------------|-----------+
            ^               |
            |               v
+-----------|---------------|-----------+
| CDU (in-row or in-rack)               |
| - brazed-plate heat exchanger         |
| - 2x EC pumps (N+1)                   |
| - reservoir + air separator           |
| - filtration 50-100 um                |
+-----------|---------------|-----------+
            ^               |
            |               v
+-----------|---------------|-----------+
| Facility loop (30-45 C supply)        |
|   -> dry cooler (preferred)           |
|   -> adiabatic cooler (hot days)      |
|   -> district heat / waste-heat reuse |
|   -> chiller (only if W3/W4 fails)    |
+---------------------------------------+
```

Run the rack-side loop separately from the building chilled-water loop even when the temperatures could in principle match. The CDU's job is chemistry isolation as much as thermal isolation: facility-water treatment is typically chosen to protect steel pipework, and it can corrode the copper micro-channels of a cold plate within months.
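The loops hand heat down a temperature gradient: every heat exchanger needs a few degrees of approach, so facility supply must sit below the server-inlet supply, and each loop's return sits above its supply by its delta-T. A minimal sketch, assuming a counterflow CDU heat exchanger with equal heat-capacity flow on both sides, a 4 K approach (inside the 3-5 °C band used for sizing), and a 10 K delta-T; `LoopTemps` and `loop_temperatures` are hypothetical names, not from any OEM tool.

```python
# Sketch: temperatures across the secondary (rack) and facility loops of a DLC rack.
# Assumption (illustrative): counterflow CDU heat exchanger with equal heat-capacity
# flow on both sides, so both loops see the same delta-T and the same approach at each end.
from dataclasses import dataclass

@dataclass
class LoopTemps:
    server_inlet_c: float      # secondary supply into the cold plates
    server_outlet_c: float     # secondary return out of the servers
    facility_supply_c: float   # facility water into the CDU heat exchanger
    facility_return_c: float   # facility water back to the dry cooler / heat reuse

def loop_temperatures(server_inlet_c: float, cdu_approach_k: float,
                      delta_t_k: float) -> LoopTemps:
    """Work back from the server-inlet limit to the warmest usable facility supply."""
    facility_supply = server_inlet_c - cdu_approach_k   # cold side must be colder
    return LoopTemps(
        server_inlet_c=server_inlet_c,
        server_outlet_c=server_inlet_c + delta_t_k,
        facility_supply_c=facility_supply,
        facility_return_c=facility_supply + delta_t_k,  # balanced flows
    )

t = loop_temperatures(server_inlet_c=32.0, cdu_approach_k=4.0, delta_t_k=10.0)
print(t)
# LoopTemps(server_inlet_c=32.0, server_outlet_c=42.0, facility_supply_c=28.0, facility_return_c=38.0)
```

The chain runs from the server outward: the warmest facility water the CDU can use is the server-inlet limit minus the approach, so a platform rated for warmer inlet coolant is what pushes facility supply toward the top of the W4 band.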
Form Factor, Power and Thermal Envelope
DLC preserves the standard 19-inch EIA-310 server form factor — the same chassis depth, the same rail kit, the same hot-swap drive bays — and adds two blind-mate fluid couplings at the rear of the chassis. That is a critical commercial choice: a customer can adopt DLC without redesigning racks, PDUs, structured cabling, or service procedures. The Open Compute Project's Advanced Cooling Solutions (ACS) workstream standardises the coupling geometry, the manifold-to-rack interface, and the rack-level CDU footprint, so multi-vendor deployments are now practical.
Power density and thermal load are correlated but not identical. A rack pulling 100 kW from the busbar typically rejects 95-100 kW as heat; DLC captures 85-95 % of that into the liquid loop, leaving 5-15 kW as residual chassis-air heat that the room or a rear-door HX still has to handle. Sizing the room's air-handling system for the residual — not the full rack load — is one of the largest capex savings DLC unlocks.
| Rack class | Cooling approach | Air HVAC sized for | Notes |
|---|---|---|---|
| 15-30 kW (legacy) | CRAH + hot-aisle containment | Full rack load | Pre-AI baseline. |
| 30-50 kW | Rear-door HX or DLC retrofit | Full or residual | Transitional density. |
| 50-80 kW (HGX-H100/H200) | DLC + RDHx or DLC + air for residual | 5-15 % of rack load | Standard AI training deployment. |
| 80-130 kW (GB200 NVL72, dense B200) | DLC mandatory | 5-10 % of rack load | Air alone is not feasible. |
| 130-250 kW | Immersion or two-phase DLC | Near zero | Beyond standard DLC envelope. |
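Sizing the room for the residual follows from the air-side energy balance, V = P / (ρ × c_p × ΔT). A rough sketch, assuming sea-level air (ρ ≈ 1.2 kg/m³, c_p ≈ 1.005 kJ/kg·K) and an illustrative 12 K rise across the chassis; `airflow_m3_per_s` is a hypothetical helper, and real server airflow curves will differ.

```python
# Rough air-side sizing: volumetric airflow needed to carry a heat load.
# Assumptions (illustrative): sea-level air and a 12 K rise across the chassis.
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_KJ_KG_K = 1.005

def airflow_m3_per_s(heat_kw: float, delta_t_k: float = 12.0) -> float:
    """V = P / (rho * cp * dT); kW / (kg/m3 * kJ/kg/K * K) gives m3/s."""
    return heat_kw / (AIR_DENSITY_KG_M3 * AIR_CP_KJ_KG_K * delta_t_k)

rack_heat_kw = 100.0
for capture in (0.0, 0.85, 0.95):          # 0.0 = all-air cooling
    residual_kw = rack_heat_kw * (1.0 - capture)
    print(f"capture {capture:.0%}: residual {residual_kw:.0f} kW "
          f"-> {airflow_m3_per_s(residual_kw):.2f} m3/s of air")
```

Moving from all-air to 85-95 % liquid capture cuts the airflow the room must deliver by roughly an order of magnitude, which is where the savings in the "Air HVAC sized for" column come from.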
Vendor Ecosystem
The DLC supply chain is now mature enough that most components are commodity in the engineering sense — multiple vendors, interoperable form factors, and competitive pricing. The map below names the active players in 2026.
| Layer | Active vendors | Notes |
|---|---|---|
| Cold plates | CoolIT Systems, Asetek, Motivair, JetCool, Boyd, AVC | JetCool's micro-jet impingement is a notable point design for B200-class TDPs. |
| Quick disconnects | Stäubli, CPC (Colder Products), Parker, Eaton | Dripless blind-mate couplings; OCP ACS spec compliant. |
| Rack manifolds | nVent, Vertiv, Schneider, CoolIT, Motivair | Vertical manifolds with per-server supply/return ports. |
| CDUs (in-rack) | CoolIT CHx40/CHx80, Vertiv XDU450, Motivair CDU, nVent CDU | 40-80 kW units; bottom-of-rack. |
| CDUs (in-row) | Vertiv XDU, nVent CDU1350, Schneider EcoStruxure, Stulz CyberCool | 150 kW-1.5 MW per unit. |
| Server OEMs (DLC-ready) | Supermicro, Dell PowerEdge XE, HPE Cray XD, Lenovo ThinkSystem SR685a V3, Gigabyte, ASUS, Quanta, Wiwynn | GB200 NVL72 ships with integrated DLC from NVIDIA partners. |
| Coolant chemistry | Dynalene, Chemtreat, Nalco, ClearWater | PGW formulations, biocide and inhibitor packages. |
| Leak detection | RLE Technologies, TraceTek, Dorlen, RDM | Liquid-sensing ropes and point sensors. |
| Heat rejection | Munters, Vertiv Liebert, Stulz, BAC, Evapco | Dry coolers, adiabatic-assist, and waste-heat integration. |
NVIDIA's GB200 NVL72 ships as a rack-scale system with DLC integrated by NVIDIA-qualified partners (Supermicro, Dell, HPE, Lenovo, Wiwynn). The cold-plate selection, manifold, and in-rack CDU are part of the reference design; the customer specifies the facility-water interface (W3/W4) and the heat-rejection system.
Sizing Guide
Sizing DLC is a top-down exercise — start with the rack thermal load, work back to the manifold, then the CDU, then the facility loop. The shortcuts below cover the common cases.
- Per-rack cooled load: take the rack's nameplate power (e.g. 100 kW for an HGX-B200 rack), multiply by 0.90 to get the liquid-cooled fraction. That is the figure all subsequent calculations work from.
- Secondary-loop flow: at a delta-T of 10 °C, each litre of water carries ~41.8 kJ (4.18 kJ per litre per K × 10 K), so 1 L/min removes ~0.70 kW. For a 90 kW liquid load that is ~129 L/min; round up to 1.5 L/min per kW (135 L/min) for safety margin and design transients.
- CDU capacity: never run a CDU above 80 % nameplate during sustained operation. A 100 kW liquid load wants a 150 kW CDU; in N+1, that means two 150 kW CDUs feeding the row.
- Manifold diameter: at 1.5 m/s velocity (a reasonable upper bound for noise and erosion), a 40 mm ID manifold passes ~113 L/min. At the 1.5 L/min per kW design rate that serves about 75 kW of liquid load; an 80-110 kW rack fits only if the loop runs nearer 1.0 L/min per kW (a wider delta-T). Step up to 50 mm (~177 L/min) for NVL72-class.
- Facility-loop sizing: for warm-water W3/W4 operation, design for an approach temperature of 3-5 °C at the CDU heat exchanger. The facility supply must sit that far below the secondary (server-inlet) supply, so a 30-37 °C facility supply assumes servers that accept roughly 33-42 °C inlet coolant; confirm the limit on the OEM datasheet. Size the dry cooler against the local 99 %-design dry-bulb temperature; wet bulb only matters once adiabatic assist is in play.
- Free-cooling threshold: in the UK and most of northern Europe, a design facility supply of 35 °C is achievable with dry coolers in 99 % of hours. London's 99 %-design dry bulb is ~28 °C (wet bulb ~20 °C), so a dry cooler with a 7 °C approach delivers 35 °C supply at the design condition; the hottest remaining hours lean on adiabatic assist.
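The manifold and free-cooling bullets reduce to two short formulas: volumetric capacity Q = v × π d² / 4, and dry-cooler supply = ambient dry bulb + approach. A sketch using the bullets' own inputs (1.5 m/s velocity cap, 7 K approach, ~28 °C London design dry bulb); the function names are hypothetical.

```python
# Sketch: manifold capacity under a velocity cap, and a dry-cooler supply check.
# Assumed inputs (from the bullets above): 1.5 m/s velocity cap, 7 K dry-cooler approach.
import math

def manifold_capacity_lpm(inner_diameter_mm: float, velocity_m_s: float = 1.5) -> float:
    """Q = v * pi * d^2 / 4, converted from m3/s to L/min."""
    d_m = inner_diameter_mm / 1000.0
    return velocity_m_s * math.pi * d_m ** 2 / 4.0 * 60_000.0

def supported_load_kw(capacity_lpm: float, lpm_per_kw: float) -> float:
    """Liquid load (kW) a given flow can serve at a design flow per kW."""
    return capacity_lpm / lpm_per_kw

def dry_cooler_supply_c(ambient_dry_bulb_c: float, approach_k: float = 7.0) -> float:
    """Coldest facility supply a dry cooler can deliver: ambient dry bulb + approach."""
    return ambient_dry_bulb_c + approach_k

for d_mm in (40, 50):
    q = manifold_capacity_lpm(d_mm)
    print(f"{d_mm} mm ID: {q:.0f} L/min -> "
          f"{supported_load_kw(q, 1.5):.0f} kW at 1.5 L/min per kW, "
          f"{supported_load_kw(q, 1.0):.0f} kW at 1.0 L/min per kW")

# London 99 %-design dry bulb ~28 C (figure from the text)
print(f"Facility supply at design condition: {dry_cooler_supply_c(28.0):.0f} C")  # 35 C
```

The 40 mm result (about 113 L/min, or roughly 75 kW at 1.5 L/min per kW) shows why the 50 mm step-up is needed once the liquid load approaches NVL72 levels.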
```python
# Quick sanity check: DLC sizing for a 100 kW HGX-B200 rack
import math

rack_power_kw = 100                                 # nameplate
liquid_fraction = 0.90                              # DLC capture rate
liquid_load_kw = rack_power_kw * liquid_fraction    # 90 kW

# Secondary-loop flow at delta-T = 10 C
cp_water_kj_per_lk = 4.18   # specific heat of water, kJ per litre per K (~1 kg/L)
delta_t_c = 10
flow_lpm = (liquid_load_kw * 60) / (cp_water_kj_per_lk * delta_t_c)
print(f"Secondary flow: {flow_lpm:.1f} L/min")      # ~129 L/min

# Design margin: round up to 1.5 L/min per kW of load
flow_design_lpm = 1.5 * liquid_load_kw
print(f"Design flow: {flow_design_lpm:.1f} L/min")  # 135 L/min

# CDU sizing: 80 % maximum sustained load, rounded up to a whole kW
cdu_capacity_kw = math.ceil(liquid_load_kw / 0.80)
print(f"CDU capacity: {cdu_capacity_kw} kW")        # 113 kW -> spec 150 kW unit

# N+1: two 150 kW CDUs in the row
print("N+1 plant: 2 x 150 kW CDU (one held in standby)")
```

Cost and TCO
Capital cost ranges below are USD per installed kW of DLC cooling capacity, sourced from public Uptime Institute and 451 Research benchmarks plus published OCP reference designs. They cover the cold plates, manifolds, CDUs, in-rack plumbing, and the secondary-loop piping inside the data hall; they exclude the facility-loop heat-rejection plant (dry coolers, towers) and any building works.
- TCO break-even versus air: typically 18-36 months at 100 kW rack density and UK/EU energy prices ($0.18-0.32 per kWh commercial), driven primarily by chiller elimination and fan-power reduction (50-70 %).
- Waste-heat reuse: facility return water at 45-55 °C feeds district heating, greenhouse heating, or pre-heated process water. Recovered heat is typically credited at $0.04-0.08 per kWh recovered — a meaningful TCO offset for sites with a nearby heat off-taker.
- Embodied cost: cold-plate copper is the largest BOM item; pricing tracks LME copper. A B200-class plate is 1.5-2.0 kg of copper per GPU position.
- Software-license neutral: DLC has no per-port or per-socket licensing component. All capex/opex is hardware and engineering.
- First-deployment premium: a customer's first DLC build typically runs 20-30 % over benchmark due to learning-curve costs in commissioning, training, and operational runbook development.
| Approach | Capex per cooled kW (USD) | Annual opex per kW (USD) | PUE achievable |
|---|---|---|---|
| Air cooling (CRAH + containment) | $1,000-$2,500 | $140-$220 | 1.30-1.50 |
| Rear-door heat exchanger | $1,500-$3,000 | $110-$180 | 1.20-1.30 |
| Direct-to-chip (single-phase) | $1,875-$5,000 | $70-$130 | 1.10-1.20 |
| Single-phase immersion | $3,500-$7,500 | $60-$110 | 1.03-1.10 |
| Two-phase immersion | $5,000-$12,000 | $60-$100 | 1.02-1.05 |
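To turn the per-kW ranges into rack-level figures, the sketch below multiplies them out for the 90 kW liquid load used in the Sizing Guide. It assumes installed DLC capacity equals that load (sizing CDUs with headroom, as recommended above, raises the capex), that the rack runs at full load all year, and a hypothetical 50 % off-take fraction, since heat is only credited when a neighbour can use it. The function names are illustrative.

```python
# Back-of-envelope rack-level figures from the per-kW ranges above.
# Assumptions (illustrative): 90 kW of liquid-cooled load running all year, and a
# HYPOTHETICAL off-take fraction for how much recovered heat a neighbour actually uses.
HOURS_PER_YEAR = 8760

def dlc_capex_usd(cooled_kw: float, usd_per_kw: float,
                  first_build_premium: float = 0.0) -> float:
    """DLC capex, optionally uplifted for a first deployment (e.g. 0.20-0.30)."""
    return cooled_kw * usd_per_kw * (1.0 + first_build_premium)

def waste_heat_credit_usd(recovered_kw: float, usd_per_kwh: float,
                          offtake_fraction: float) -> float:
    """Annual credit for the share of recovered heat the off-taker accepts."""
    return recovered_kw * HOURS_PER_YEAR * offtake_fraction * usd_per_kwh

cooled_kw = 90.0
print(f"DLC capex: ${dlc_capex_usd(cooled_kw, 1875):,.0f} to "
      f"${dlc_capex_usd(cooled_kw, 5000):,.0f}")
print(f"First build (+20-30 %): ${dlc_capex_usd(cooled_kw, 1875, 0.20):,.0f} to "
      f"${dlc_capex_usd(cooled_kw, 5000, 0.30):,.0f}")
for price in (0.04, 0.08):
    credit = waste_heat_credit_usd(cooled_kw, price, offtake_fraction=0.5)
    print(f"Waste-heat credit at ${price:.2f}/kWh, 50 % off-take: ${credit:,.0f} per year")
```

Halving the off-take fraction halves the credit, so the off-taker's demand profile matters as much as the price per kWh.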
Migration and Alternatives
DLC is rarely a greenfield-only decision. Most operators arrive at it from either an air-cooled brownfield or an immersion experiment, and the migration path matters as much as the steady-state design.
- From air to DLC (brownfield retrofit): the hard part is plumbing, namely getting chilled-water or warm-water supply and return into the data hall. Liquid-to-air CDUs (in-rack or sidecar units with no facility-water connection, which reject heat into the room air through a radiator or to a small dry cooler outside the hall) let a single rack go DLC without building works, at the cost of higher per-rack capex and, in the room-air case, of handing the full rack heat back to the room's air handling.
- From air to DLC (new build): standard practice for any rack above 30 kW. Plan facility-water risers and slab penetrations from day one; commission the secondary loop before any IT lands.
- From immersion to DLC: rare but happens when operators want to standardise around blind-mate serviceability or shed PFAS regulatory exposure. The migration is a full rack swap — there is no half-state.
- From DLC to immersion: the upgrade path when rack density goes above 130 kW. Operators that make the move usually keep DLC for the GPU/HBM heat and add immersion only for the densest chassis variants; that hybrid is engineering-heavy and uncommon.
- Hybrid DLC + RDHx: the dominant new-build pattern. DLC captures 85-95 % of the heat, a rear-door HX on the same facility loop captures the chassis-air residual, and the data hall sees near-neutral exhaust.
- Doing nothing (staying on air): viable only below 30 kW per rack. The H100 generation made 30 kW the line in the sand, and B200- and GB200-class racks sit well above it.
Pitfalls and Operational Notes
Most DLC failures are not silicon failures; they are loop failures. The discipline below is what separates a stable cluster from one that takes a downtime hit every quarter. First-time-installer projects have a high commissioning-failure rate — Uptime Institute publishes annual figures showing 30-40 % of first DLC builds experience a notable thermal or fluid incident in their first 12 months.
- Coolant chemistry drift: pH (target 8.0-9.5), conductivity (< 50 µS/cm for water-glycol), biocide concentration, and corrosion-inhibitor levels must be measured at least monthly and topped up quarterly. Out-of-spec coolant corrodes cold plates and seeds biofilm that clogs micro-channels.
- Quick-disconnect leaks: blind-mate couplings are the most likely failure point during service. Drip trays, leak-detection ropes along the base of every rack and under the raised floor, and a documented response runbook (isolate, drain, swap, re-pressurise) are mandatory.
- Air ingress: any air in the secondary loop reduces heat transfer at the cold plate by 30-50 % locally and creates the conditions for cavitation in the pumps. Automatic air separators on the CDU and disciplined fill procedures (slow fill, vent at high points) prevent the failure mode.
- CDU as single point of failure: a CDU outage takes the whole row offline within 60-120 seconds. N+1 CDU sizing and cross-tied manifolds (both CDUs feed both halves of the row through valved tie-ins) are non-negotiable for production AI clusters above 500 kW.
- Mixed-temperature racks on a shared loop: running 700 W GPUs and lower-density storage racks on the same secondary loop forces a temperature compromise that wastes free-cooling headroom. Segregate by class of workload.
- Service-time hygiene: when a server is pulled, residual coolant on the disconnect must not drip onto the busbar below. Most leaks at this stage are human, not mechanical — train, runbook, and audit.
- Floor loading: a rack with a 110 kW DLC envelope and an in-rack CDU can weigh 1,400-1,800 kg loaded. Verify slab loading at design time; raised-floor builds may need additional stringers.
- Commissioning rigour: pressure-test the entire secondary loop at 1.5× working pressure for at least 24 hours before any IT lands. Pull a coolant sample, run a chemistry panel, and confirm filtration before flow.
- Documentation handover: the DLC installer often leaves with the only complete P&ID, valve list, and isolation procedure. Insist on a complete operational handover pack and re-walk it with the in-house ops team before first IT load.
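Two items on the list above lend themselves to automated pass/fail checks: the chemistry panel and the commissioning pressure test. A minimal sketch, assuming a 3.0 bar working pressure (the top of the secondary-loop range), so the test pressure is 4.5 bar. The class and function names are hypothetical, and `max_drop_bar` is an assumed placeholder rather than a published acceptance figure.

```python
# Sketch of two checks from the list above. The pH and conductivity limits are the
# targets quoted in the text; max_drop_bar is a HYPOTHETICAL placeholder, so take the
# real pass criterion from the installer's or OEM's pressure-test specification.
from dataclasses import dataclass

@dataclass
class CoolantSample:
    ph: float
    conductivity_us_cm: float

def chemistry_faults(sample: CoolantSample) -> list[str]:
    """Return a list of faults; an empty list means the sample is in spec."""
    faults = []
    if not 8.0 <= sample.ph <= 9.5:
        faults.append(f"pH {sample.ph} outside 8.0-9.5")
    if sample.conductivity_us_cm >= 50.0:
        faults.append(f"conductivity {sample.conductivity_us_cm} uS/cm not below 50")
    return faults

def pressure_test_passes(working_bar: float, readings_bar: list[float],
                         hold_hours: float, max_drop_bar: float = 0.05) -> bool:
    """Held at >= 1.5x working pressure for >= 24 h, with limited pressure decay."""
    if not readings_bar:
        return False
    test_bar = 1.5 * working_bar
    return (hold_hours >= 24.0
            and readings_bar[0] >= test_bar
            and readings_bar[0] - min(readings_bar) <= max_drop_bar)

print(chemistry_faults(CoolantSample(ph=8.6, conductivity_us_cm=32.0)))  # []
print(pressure_test_passes(3.0, [4.52, 4.51, 4.50], hold_hours=24.0))   # True
```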
The single most common first-year incident is a leak at a manifold drain valve left in the wrong position after commissioning. The valve labelling and a positive-confirmation valve-line-up procedure before pressurisation prevent virtually every instance.
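The key property of a positive-confirmation line-up is that an unrecorded valve blocks pressurisation just as firmly as a valve in the wrong position. A minimal sketch; the valve tags are hypothetical stand-ins for the real P&ID valve list, and `lineup_discrepancies` is an illustrative helper, not part of any commissioning tool.

```python
# Sketch of a positive-confirmation valve line-up gate before pressurisation.
# Valve tags and positions are HYPOTHETICAL; the real list comes from the P&ID.
EXPECTED_LINEUP = {
    "MAN-01-DRAIN": "closed",
    "MAN-01-VENT": "closed",
    "CDU-A-SUPPLY-ISO": "open",
    "CDU-A-RETURN-ISO": "open",
}

def lineup_discrepancies(expected: dict[str, str], confirmed: dict[str, str]) -> list[str]:
    """Every valve must be positively confirmed; a missing entry counts as a failure."""
    problems = []
    for tag, position in expected.items():
        seen = confirmed.get(tag)
        if seen is None:
            problems.append(f"{tag}: not confirmed")
        elif seen != position:
            problems.append(f"{tag}: found {seen}, expected {position}")
    return problems

walkdown = {"MAN-01-DRAIN": "open", "MAN-01-VENT": "closed", "CDU-A-SUPPLY-ISO": "open"}
issues = lineup_discrepancies(EXPECTED_LINEUP, walkdown)
print("OK to pressurise" if not issues else "HOLD: " + "; ".join(issues))
```

Treating a missing confirmation as a failure is what catches the drain valve nobody walked to.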
Where DLC Sits in the Yobitel Stack
Yobitel's NeoCloud reference design uses warm-water DLC (ASHRAE W3) as the default for every H100, H200, B200, and GB200 NVL72 rack we deploy, with rear-door heat exchangers handling the residual chassis-air heat on the same secondary loop. UK-sovereign deployments target sub-1.20 annualised PUE without mechanical chillers, supported by dry-cooler heat rejection and (where the site allows) waste-heat reuse to district heating.
For customers landing GB200 NVL72 racks into existing colocation suites, our managed-ops team treats DLC commissioning as a distinct project phase with its own acceptance criteria — pressure test, chemistry sample, leak-detection coverage map, and a 30-day shakedown — before any production workload is scheduled onto the cluster.
References
DLC is governed by a small set of authoritative references. ASHRAE TC 9.9 defines the thermal envelope; the Open Compute Project's Advanced Cooling Solutions (ACS) workstream publishes blind-mate connector and manifold standards; NVIDIA's HGX and GB200 thermal guides specify the cold-plate geometry and flow requirements per platform. A serious deployment cross-checks all three before locking specifications.
- ASHRAE TC 9.9 — Thermal Guidelines for Liquid-Cooled Data Processing Environments · ASHRAE
- Open Compute Project — Advanced Cooling Solutions · OCP
- NVIDIA HGX H100 and GB200 NVL72 Thermal and Mechanical Guides · NVIDIA
- Uptime Institute — Liquid Cooling Guidance for Data Centers · Uptime Institute
- CoolIT Systems — Rack DLC Reference Architecture · CoolIT Systems