CXL (Compute Express Link)

TL;DR

Open industry standard for cache-coherent host-to-device interconnect riding on the PCIe physical layer.
Three protocols: CXL.io (PCIe semantics), CXL.cache (device-side caching), CXL.mem (memory expansion).
Used for memory expansion modules, memory pooling between hosts, and coherent accelerator attachment.
CXL 3.0 (2022) adds fabric topology and peer-to-peer access on top of the switching introduced in CXL 2.0; CXL 3.2 (2024) refines it. Adoption is meaningful but slower than the early hype suggested.

Overview#

CXL — Compute Express Link — is an open industry standard for cache-coherent host-to-device communication. It rides on the PCIe physical layer (Gen5 and Gen6) but adds protocol semantics for caching, memory access and accelerator attachment that PCIe alone cannot express.

The pitch is straightforward: as memory-bandwidth and capacity needs outpace what can be packaged on a host's local DDR channels, CXL provides a way to attach additional memory, either as DRAM expansion modules or as shared memory pools, that the CPU can access with reasonable latency and cache-coherent semantics.

Three Protocols#

CXL.io provides PCIe-equivalent semantics for device discovery, configuration and DMA. Every CXL device must implement at least CXL.io.

CXL.cache lets a device cache host memory. This matters for accelerators that want to participate in CPU cache coherence — for example, certain DPUs and SmartNICs.

CXL.mem lets a host access a device's memory with ordinary loads and stores, as if it were local memory. This is the basis of memory expansion modules (CXL Type 3 devices) and memory pooling.
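
The way the three protocols combine can be sketched in a few lines. The Type 3 pairing (CXL.io plus CXL.mem) follows from the text above; the Type 1 and Type 2 names are taken from the CXL specification's device taxonomy rather than from this entry, and the function name `classify_device` is purely illustrative.

```python
# Protocol combinations per device type, following the CXL specification's
# device taxonomy. Only Type 3 is named in this entry; Type 1 and Type 2
# are included for completeness.
DEVICE_TYPES = {
    frozenset({"CXL.io", "CXL.cache"}): "Type 1",             # caching device, no host-visible memory
    frozenset({"CXL.io", "CXL.cache", "CXL.mem"}): "Type 2",  # accelerator with its own memory
    frozenset({"CXL.io", "CXL.mem"}): "Type 3",               # memory expander
}


def classify_device(protocols):
    """Return the device type for a set of protocol names (hypothetical helper)."""
    protocols = frozenset(protocols)
    if "CXL.io" not in protocols:
        raise ValueError("every CXL device must implement CXL.io")
    try:
        return DEVICE_TYPES[protocols]
    except KeyError:
        raise ValueError(f"no CXL device type uses exactly {sorted(protocols)}") from None


if __name__ == "__main__":
    print(classify_device({"CXL.io", "CXL.mem"}))
```

A device advertising only CXL.io is rejected here because it behaves like a plain PCIe device and does not fall into any of the three types.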

Specifications#

Version	PCIe base	Year	Highlights
CXL 1.0/1.1	PCIe Gen5	2019	Initial standard
CXL 2.0	PCIe Gen5	2020	Switching, persistent memory
CXL 3.0	PCIe Gen6	2022	Fabric topology, peer-to-peer
CXL 3.2	PCIe Gen6	2024	Refinements, security
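
To put the PCIe base column in perspective, the raw link rate can be worked out from the per-lane signalling rate. The sketch below assumes the standard PCIe figures of 32 GT/s per lane for Gen5 and 64 GT/s for Gen6, which are not stated in this entry, and it ignores encoding, FLIT and protocol overhead, so real throughput is lower. The dictionary and function names are illustrative.

```python
# Per-lane signalling rate in GT/s (assumed from the PCIe specifications).
PCIE_GT_PER_S = {"Gen5": 32, "Gen6": 64}

# PCIe base per CXL version, copied from the table above.
CXL_PCIE_BASE = {
    "1.0/1.1": "Gen5",
    "2.0": "Gen5",
    "3.0": "Gen6",
    "3.2": "Gen6",
}


def raw_link_gbytes_per_s(cxl_version, lanes=16):
    """Raw one-direction link rate in GB/s, before any encoding or protocol overhead."""
    gt_per_s = PCIE_GT_PER_S[CXL_PCIE_BASE[cxl_version]]
    return gt_per_s * lanes / 8  # one transfer carries one bit per lane


if __name__ == "__main__":
    for version in CXL_PCIE_BASE:
        print(f"CXL {version}: {raw_link_gbytes_per_s(version):.0f} GB/s raw per direction (x16)")
```

The doubling between Gen5 and Gen6 is the main bandwidth difference between the CXL 2.0 and CXL 3.x rows.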

What CXL is Used For#

Memory expansion: CXL Type 3 modules add DRAM beyond what DDR channels can host.
Memory pooling: shared memory pools across hosts in a rack.
Coherent accelerator attachment: DPUs, FPGAs and certain accelerators that want to share cache lines with the host.
Tiered memory: hot data in local DDR, warm data in CXL-attached memory, cold data in NVMe.
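
The tiered-memory idea can be illustrated with a toy placement policy. This is a minimal sketch, not how any particular operating system decides: the access-rate thresholds and the names `choose_tier` and `place_pages` are assumptions made for the example.

```python
def choose_tier(accesses_per_sec, hot_threshold=1000.0, warm_threshold=10.0):
    """Pick a tier from an access rate. Thresholds are illustrative, not measured."""
    if accesses_per_sec >= hot_threshold:
        return "local DDR"
    if accesses_per_sec >= warm_threshold:
        return "CXL-attached memory"
    return "NVMe"


def place_pages(access_rates):
    """Map each page identifier to a tier, given its observed access rate."""
    return {page: choose_tier(rate) for page, rate in access_rates.items()}


if __name__ == "__main__":
    # Hypothetical pages with made-up access rates.
    rates = {"index": 50_000.0, "recent_rows": 120.0, "archive": 0.5}
    for page, tier in place_pages(rates).items():
        print(f"{page}: {tier}")
```

Real tiering systems also weigh migration cost and capacity limits per tier; the sketch only shows the hot, warm and cold split described above.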

Pitfalls and Adoption Notes#

Latency: CXL-attached memory has higher latency than local DDR. Workloads whose hot paths make random accesses to CXL-attached memory can see noticeable degradation.
Adoption pace: CXL deployment has been slower than early hype suggested, primarily because most AI accelerators stayed on HBM rather than embracing CXL pooling.
Software: tiered-memory awareness in operating systems and runtimes is still maturing through 2025-2026.
AI-specific use cases for CXL remain narrower than general server use cases; HBM and NVLink occupy most of the memory-bandwidth conversations.
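
The latency pitfall can be made concrete with a simple weighted-average model. The latency values below are placeholders chosen for illustration, not measurements of any device, and the function name is hypothetical; the point is only how the average grows with the share of accesses served from CXL-attached memory.

```python
def average_access_latency_ns(cxl_fraction, ddr_latency_ns, cxl_latency_ns):
    """Weighted average latency when a fraction of accesses hit CXL-attached memory."""
    if not 0.0 <= cxl_fraction <= 1.0:
        raise ValueError("cxl_fraction must be between 0 and 1")
    return (1.0 - cxl_fraction) * ddr_latency_ns + cxl_fraction * cxl_latency_ns


if __name__ == "__main__":
    # Placeholder latencies (assumptions for the example, not measured values).
    ddr_ns, cxl_ns = 100.0, 250.0
    for fraction in (0.0, 0.2, 0.5, 1.0):
        avg = average_access_latency_ns(fraction, ddr_ns, cxl_ns)
        print(f"{fraction:.0%} of accesses on CXL: {avg:.0f} ns average")
```

The model ignores caching, prefetching and bandwidth contention, so it is a rough guide at best, but it shows why keeping hot data in local DDR matters.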

Software and System Notes#

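Linux kernel support for discovering and managing CXL devices is mature, even though tiering policy on top of it is still evolving. CXL memory is typically exposed as a NUMA node without CPUs, so NUMA-aware schedulers and allocators can place data on it with existing policies. AI-specific frameworks generally do not target CXL memory explicitly; HBM and NVLink-attached memory remain the dominant tiers for accelerator workloads.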
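
As a rough illustration of the NUMA view, the sketch below lists NUMA nodes that have memory but no CPUs by reading the Linux sysfs `cpulist` files under `/sys/devices/system/node`. A CPU-less node is only a hint: other memory types can appear the same way, so this does not prove a node is CXL-backed. The function name and the `sysfs_root` parameter are illustrative.

```python
from pathlib import Path


def cpuless_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the IDs of NUMA nodes whose cpulist is empty (candidate CXL memory nodes)."""
    root = Path(sysfs_root)
    nodes = []
    for node_dir in root.glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        # An empty cpulist means the node contributes memory but no CPUs.
        if cpulist.is_file() and cpulist.read_text().strip() == "":
            nodes.append(int(node_dir.name[len("node"):]))
    return sorted(nodes)


if __name__ == "__main__":
    found = cpuless_numa_nodes()
    if found:
        print("CPU-less NUMA nodes:", found)
    else:
        print("No CPU-less NUMA nodes found (or not running on Linux).")
```

On a system where such a node exists, standard NUMA tools and memory policies can then be pointed at its node ID.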

References

CXL Consortium Specifications · CXL Consortium
