TL;DR
- Open industry standard for cache-coherent host-to-device interconnect, built on the PCIe physical layer.
- Three protocols: CXL.io (PCIe semantics), CXL.cache (device-side caching), CXL.mem (memory expansion).
- Used for memory expansion modules, memory pooling between hosts, and coherent accelerator attachment.
- CXL 3.0 (2022) adds fabric topologies and peer-to-peer access on top of the switching introduced in CXL 2.0; CXL 3.2 (2024) refines it. Adoption is meaningful but slower than the early hype suggested.
Overview
CXL (Compute Express Link) is an open industry standard for cache-coherent host-to-device communication. It runs on the PCIe physical layer (Gen5 and Gen6) but adds protocol semantics for caching, memory access and accelerator attachment that PCIe alone cannot express.
The pitch is straightforward: as memory capacity and bandwidth needs outpace what a host's local DDR channels can supply, CXL provides a way to attach additional memory, either as individual DRAM expansion modules or as shared memory pools, that the CPU can access with cache-coherent semantics and reasonable, though higher than local DDR, latency.
Three Protocols
CXL.io provides PCIe-equivalent semantics for device discovery, configuration and DMA. Every CXL device must implement at least CXL.io.
CXL.cache lets a device cache host memory. This matters for accelerators that want to participate in CPU cache coherence — for example, certain DPUs and SmartNICs.
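CXL.mem lets a host map a device's memory into its own physical address space and access it with ordinary load/store instructions, much as it would local DRAM. This is the basis of memory expansion modules (CXL Type 3 devices) and memory pooling.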
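The combination of protocols a device implements determines its device type. The sketch below maps protocol sets to the Type 1/2/3 names used by the CXL specification; only Type 3 is named in the text above, so the Type 1 and Type 2 entries are drawn from the specification's convention, and the `classify` helper and its string labels are hypothetical names chosen for illustration.

```python
# Protocol sets for the three standard CXL device types.
# Every type includes CXL.io, which is mandatory.
DEVICE_TYPES = {
    frozenset({"io", "cache"}): "Type 1",         # caching device without host-visible memory
    frozenset({"io", "cache", "mem"}): "Type 2",  # accelerator with its own host-visible memory
    frozenset({"io", "mem"}): "Type 3",           # memory expander
}


def classify(protocols):
    """Return the CXL device type for an iterable of protocol names."""
    protos = frozenset(protocols)
    if "io" not in protos:
        raise ValueError("every CXL device must implement CXL.io")
    # Anything else (for example CXL.io alone) is not one of the three types.
    return DEVICE_TYPES.get(protos, "unclassified")


if __name__ == "__main__":
    print(classify(["io", "mem"]))
```

A memory expansion module would be described as `classify(["io", "mem"])`, while a SmartNIC that only caches host memory would be `classify(["io", "cache"])`.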
Specifications
| Version | PCIe base | Year | Highlights |
|---|---|---|---|
| CXL 1.0/1.1 | PCIe Gen5 | 2019 | Initial standard |
| CXL 2.0 | PCIe Gen5 | 2020 | Switching, persistent memory |
| CXL 3.0 | PCIe Gen6 | 2022 | Fabric topology, peer-to-peer |
| CXL 3.2 | PCIe Gen6 | 2024 | Refinements, security |
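Because CXL inherits the PCIe physical layer, the PCIe generation in the table sets an upper bound on link bandwidth. The following back-of-the-envelope sketch computes the raw one-direction rate from the per-lane signalling rates of PCIe Gen5 (32 GT/s) and Gen6 (64 GT/s). It deliberately ignores encoding, FLIT and protocol overhead, so achievable CXL.mem bandwidth will be lower; the function name and the x16 default are illustrative assumptions.

```python
# Per-lane signalling rates in GT/s for the PCIe generations in the table.
PCIE_GT_PER_S = {"Gen5": 32, "Gen6": 64}


def raw_link_gb_per_s(gen, lanes=16):
    """Raw one-direction bandwidth in GB/s, before any encoding or protocol overhead."""
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return PCIE_GT_PER_S[gen] * lanes / 8


if __name__ == "__main__":
    for gen in PCIE_GT_PER_S:
        print(gen, raw_link_gb_per_s(gen), "GB/s per direction (x16, raw)")
```

Treat these figures as ceilings for rough capacity planning, not as expected throughput.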
What CXL Is Used For
- Memory expansion: CXL Type 3 modules add DRAM beyond what DDR channels can host.
- Memory pooling: shared memory pools across hosts in a rack.
- Coherent accelerator attachment: DPUs, FPGAs and certain accelerators that want to share cache lines with the host.
- Tiered memory: hot data in local DDR, warm data in CXL-attached memory, cold data on NVMe storage.
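The tiered-memory idea in the last bullet can be sketched as a simple placement rule driven by access frequency. This is a toy policy, not how any particular operating system or runtime decides placement: the thresholds, the `assign_tier` name and the tier labels are all hypothetical.

```python
def assign_tier(accesses_per_min, hot_min=100.0, warm_min=1.0):
    """Pick a tier for a page from its access rate (thresholds are made-up examples)."""
    if accesses_per_min >= hot_min:
        return "local DDR"      # hot: lowest latency
    if accesses_per_min >= warm_min:
        return "CXL memory"     # warm: more capacity, higher latency
    return "NVMe"               # cold: block storage


if __name__ == "__main__":
    for rate in (500.0, 12.0, 0.1):
        print(rate, "->", assign_tier(rate))
```

Real tiering systems also have to account for migration cost and capacity limits per tier, which this sketch leaves out.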
Pitfalls and Adoption Notes
- Latency: CXL-attached memory has higher latency than local DDR. Workloads whose hot paths make random accesses to CXL memory can see noticeable degradation.
- Adoption pace: CXL has been slower to deploy than early hype suggested, primarily because most AI accelerators stayed on HBM rather than embracing CXL pooling.
- Software: tiered-memory awareness in operating systems and runtimes is still maturing through 2025-2026.
- AI-specific use cases for CXL remain narrower than general server use cases; HBM and NVLink occupy most of the memory-bandwidth conversations.
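To see why the latency point matters, a rough model helps: average access latency is a weighted mix of the two tiers, so the penalty grows with the share of accesses that land in CXL memory. The sketch below uses placeholder latencies (100 ns for local DDR, 250 ns for CXL) that are assumptions for illustration, not measurements; substitute figures from your own platform.

```python
def average_latency_ns(cxl_fraction, ddr_ns=100.0, cxl_ns=250.0):
    """Weighted average memory latency; the default latencies are illustrative placeholders."""
    if not 0.0 <= cxl_fraction <= 1.0:
        raise ValueError("cxl_fraction must be between 0 and 1")
    return (1.0 - cxl_fraction) * ddr_ns + cxl_fraction * cxl_ns


if __name__ == "__main__":
    for share in (0.0, 0.2, 0.5):
        print(f"{share:.0%} of accesses to CXL:", average_latency_ns(share), "ns")
```

The model ignores caching, prefetching and bandwidth contention, so it is only useful for a first-order estimate of how much hot data can safely live in the CXL tier.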
Software and System Notes
Linux kernel support for CXL memory devices is mature. CXL-attached memory typically appears as a CPU-less NUMA node, so NUMA-aware allocators and schedulers can place data on it with existing policies. AI-specific frameworks generally do not target CXL memory explicitly; HBM and NVLink-attached memory remain the dominant tiers for accelerator workloads.
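On Linux, one way to spot memory-only NUMA nodes is to look for node directories under `/sys/devices/system/node` whose `cpulist` file is empty. The sketch below does that; the function name is hypothetical, and a CPU-less node is only a candidate, since other memory types can also appear as nodes without CPUs.

```python
import re
from pathlib import Path


def cpuless_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted IDs of NUMA nodes whose cpulist is empty."""
    nodes = []
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", node_dir.name)
        cpulist = node_dir / "cpulist"
        # An empty cpulist means no CPUs are attached to this node.
        if match and cpulist.is_file() and cpulist.read_text().strip() == "":
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__":
    print("CPU-less NUMA nodes:", cpuless_numa_nodes())
```

Once a node ID is known, standard NUMA tooling such as `numactl --membind` can be used to steer an allocation onto it for testing.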
References
- CXL Consortium Specifications · CXL Consortium