TL;DR
- Near-full AD102 die in a data centre form factor: 48 GB GDDR6, third-generation RT Cores, 300 W dual-slot PCIe card.
- Pitched at graphics, VDI, Omniverse and rendering — AI is a secondary use case where L40S is preferred.
- Fourth-generation Tensor Cores support FP8 inference, but with lower throughput than L40S.
- Effectively the data centre RTX 6000 Ada equivalent; usually overshadowed by L40S in AI conversations.
Overview
L40 is built on the AD102 Ada Lovelace die, the same silicon used in the GeForce RTX 4090 and the workstation RTX 6000 Ada, with 142 of the die's 144 SMs enabled (18,176 CUDA cores). It comes in a 300 W passive dual-slot PCIe form factor with 48 GB of ECC GDDR6, designed primarily for enterprise visualisation, VDI and Omniverse workloads.
On paper L40 is similar to L40S but with lower clocks, a lower power limit and a different emphasis (graphics and ray tracing first, with less raw tensor throughput). In practice the two are easy to confuse; L40S is the AI-tuned variant and the one most teams should pick for inference workloads.
Specifications
| Metric | L40 |
|---|---|
| Architecture | Ada Lovelace (AD102) |
| FP32 | 90.5 TFLOPS |
| TF32 (Tensor, sparse) | 181 TFLOPS |
| BF16 / FP16 (Tensor, sparse) | 362 TFLOPS |
| FP8 (Tensor, sparse) | 724 TFLOPS |
| INT8 (Tensor, sparse) | 724 TOPS |
| RT Cores | Third-generation |
| Memory | 48 GB GDDR6 ECC |
| Memory bandwidth | 864 GB/s |
| TDP | 300 W |
| Form factor | PCIe Gen4 x16, dual-slot |
| NVLink | Not supported |
L40 and L40S share silicon but differ on clocks, power and tensor throughput. If the workload is AI inference, L40S is almost always the right pick.
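The tensor rows in the table are quoted with sparsity, so it can help to derive the dense figures before comparing against benchmarks that do not use sparse kernels. The sketch below assumes the usual 2:4 structured-sparsity convention, where the sparse peak is twice the dense peak. It also checks the memory bandwidth row against a 384-bit bus at 18 Gbit/s per pin. Both of those memory parameters are assumptions not stated in this article, and the function names are illustrative.

```python
# Sparse tensor peaks copied from the table above (TFLOPS, or TOPS for INT8).
SPARSE_PEAK = {
    "TF32": 181.0,
    "BF16/FP16": 362.0,
    "FP8": 724.0,
    "INT8": 724.0,
}


def dense_peak(sparse_value: float) -> float:
    """Assumption: 2:4 structured sparsity doubles the quoted peak, so halve it."""
    return sparse_value / 2.0


def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth = bus width (bits) * per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8.0


# Dense equivalents of the table rows, e.g. FP8 -> 362.0.
DENSE_PEAK = {name: dense_peak(value) for name, value in SPARSE_PEAK.items()}

# Assumed memory configuration (not from this article): 384-bit GDDR6 at 18 Gbit/s.
L40_BANDWIDTH = memory_bandwidth_gb_s(bus_width_bits=384, data_rate_gbps=18.0)
```

Under these assumptions the dense FP8 peak comes out at 362 TFLOPS and the bandwidth at 864 GB/s, which matches the table row.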
When to Pick L40
- Omniverse and digital-twin workloads where RT Core throughput dominates.
- Enterprise VDI where NVIDIA Virtual GPU software is licensed.
- Mixed graphics + AI workloads where the visualisation side dominates.
- Render farms where 48 GB GDDR6 ECC suits scene footprints.
- Pick L40S for AI-first workloads.
- Pick RTX 6000 Ada for workstation deployment scenarios.
Pitfalls
- Easy to confuse with L40S — verify the exact SKU when reading benchmarks.
- PCIe Gen4 only; in a Gen5 host the link negotiates down to Gen4 speeds, so the card uses roughly half of the slot's available bandwidth.
- No NVLink; multi-card configurations are PCIe-bound.
- GDDR6 bandwidth (864 GB/s) is solid but well below that of HBM-based cards.
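The bandwidth pitfalls above can be put into rough numbers. The following is a back-of-envelope sketch, not a benchmark: it assumes the standard per-lane rates of 16 GT/s for Gen4 and 32 GT/s for Gen5 with 128b/130b encoding, ignores protocol overhead, and treats single-stream LLM decode as purely memory-bound. The 16 GB weight footprint is a hypothetical example, and all function names are illustrative.

```python
# Per-lane signalling rates in GT/s; both generations use 128b/130b encoding.
PCIE_GT_PER_S = {4: 16.0, 5: 32.0}


def negotiated_gen(card_gen: int, host_gen: int) -> int:
    """A PCIe link trains to the highest generation both ends support."""
    return min(card_gen, host_gen)


def pcie_gb_s(gen: int, lanes: int = 16) -> float:
    """Approximate per-direction bandwidth in GB/s, before protocol overhead."""
    return PCIE_GT_PER_S[gen] * lanes * (128 / 130) / 8


def decode_ceiling_tokens_per_s(mem_bw_gb_s: float, weights_gb: float) -> float:
    """Upper bound on single-stream decode if every token rereads all weights."""
    return mem_bw_gb_s / weights_gb


link_gen = negotiated_gen(card_gen=4, host_gen=5)  # L40 in a Gen5 slot runs at Gen4
link_bw = pcie_gb_s(link_gen)                      # about 31.5 GB/s per direction
slot_bw = pcie_gb_s(5)                             # about 63 GB/s per direction

# Hypothetical model with 16 GB of weights on the 864 GB/s from the table.
ceiling = decode_ceiling_tokens_per_s(mem_bw_gb_s=864.0, weights_gb=16.0)
```

The decode ceiling ignores KV-cache traffic and batching, so real throughput will differ; the point is only that the result scales linearly with memory bandwidth, which is where HBM cards pull ahead.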
Software Notes
First-class CUDA, OptiX, NVIDIA Virtual GPU and Omniverse Enterprise support. AI inference paths (vLLM, TensorRT-LLM) treat L40 as a standard Ada card with lower throughput than L40S.
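Because L40 and L40S are easy to mix up, a small check of the installed SKU can be worth running before trusting a benchmark or sizing an inference deployment. The sketch below queries `nvidia-smi --query-gpu=name --format=csv,noheader` and classifies each reported name. It assumes the driver reports names such as "NVIDIA L40" and "NVIDIA L40S"; the helper names `classify_sku` and `installed_gpu_names` are hypothetical.

```python
import subprocess


def classify_sku(name: str) -> str:
    """Classify a GPU product name as 'L40S', 'L40' or 'other'.

    Matching is done on whole whitespace-separated tokens so that
    'L40S' (or 'L4') is never mistaken for 'L40'.
    """
    tokens = name.upper().split()
    if "L40S" in tokens:
        return "L40S"
    if "L40" in tokens:
        return "L40"
    return "other"


def installed_gpu_names() -> list[str]:
    """Return one product name per GPU, as reported by nvidia-smi.

    Raises if nvidia-smi is missing or exits with an error.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a host with the NVIDIA driver installed, `[classify_sku(n) for n in installed_gpu_names()]` would give one label per card; the classifier itself can be exercised without a GPU.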
References
- NVIDIA L40 Datasheet · NVIDIA