NVIDIA L40 GPU

TL;DR

Near-full AD102 die (142 of 144 SMs enabled) in a data centre form factor: 48 GB GDDR6, third-generation RT Cores, 300 W dual-slot PCIe card.
Pitched at graphics, VDI, Omniverse and rendering; AI is a secondary use case, for which L40S is preferred.
Fourth-generation Tensor Cores support FP8 inference, but at lower throughput than L40S.
Effectively the data centre equivalent of the RTX 6000 Ada; usually overshadowed by L40S in AI conversations.

Overview

L40 is built on the AD102 Ada Lovelace die, the same silicon used in the GeForce RTX 4090 and the workstation RTX 6000 Ada, with 142 of the die's 144 SMs enabled. It comes in a 300 W passive dual-slot PCIe form factor with 48 GB of ECC GDDR6, designed primarily for enterprise visualisation, VDI and Omniverse workloads.

On paper L40 is similar to L40S, but it runs at lower clocks and a lower power limit, and its emphasis is on graphics and ray tracing rather than raw tensor throughput. In practice the two are easy to confuse; L40S is the AI-tuned variant and the one most teams should pick for inference workloads.

Specifications

Metric	L40
Architecture	Ada Lovelace (AD102)
FP32	90.5 TFLOPS
TF32 (Tensor, sparse)	181 TFLOPS
BF16 / FP16 (Tensor, sparse)	362 TFLOPS
FP8 (Tensor, sparse)	724 TFLOPS
INT8 (Tensor, sparse)	724 TOPS
RT Cores	Third-generation
Memory	48 GB GDDR6 ECC
Memory bandwidth	864 GB/s
TDP	300 W
Form factor	PCIe Gen4 x16, dual-slot
NVLink	Not supported

L40 and L40S share silicon but differ on clocks, power and tensor throughput. If the workload is AI inference, L40S is almost always the right pick.
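
The Tensor rows in the table are quoted with structured sparsity. As a rough sketch, assuming the usual 2:1 ratio between NVIDIA's sparse and dense Tensor figures, the dense numbers and a simple roofline ridge point (the FLOPs per byte of memory traffic a kernel needs before compute, not bandwidth, becomes the limit) can be derived from the table values. The function names here are illustrative, not part of any NVIDIA tool.

```python
# Sparse Tensor figures copied from the table above (TFLOPS, or TOPS for INT8).
SPARSE_TFLOPS = {"TF32": 181.0, "BF16/FP16": 362.0, "FP8": 724.0, "INT8": 724.0}
MEM_BW_GBPS = 864.0  # memory bandwidth from the table, in GB/s


def dense_tflops(sparse: float) -> float:
    """Dense throughput, assuming sparse figures are 2x the dense ones."""
    return sparse / 2.0


def ridge_point(tflops: float, bandwidth_gbps: float) -> float:
    """Operations per byte of memory traffic at which compute becomes the limit."""
    return (tflops * 1e12) / (bandwidth_gbps * 1e9)


if __name__ == "__main__":
    for precision, sparse in SPARSE_TFLOPS.items():
        dense = dense_tflops(sparse)
        ridge = ridge_point(dense, MEM_BW_GBPS)
        print(f"{precision}: {dense:.1f} dense, ridge at {ridge:.0f} ops/byte")
```

Kernels with lower arithmetic intensity than the ridge point, such as small-batch LLM decode, are likely to be limited by the 864 GB/s memory bandwidth rather than by Tensor throughput.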

When to Pick L40

Omniverse and digital-twin workloads where RT Core throughput dominates.
Enterprise VDI where NVIDIA Virtual GPU (vGPU) software is licensed.
Mixed graphics + AI workloads where the visualisation side dominates.
Render farms where 48 GB GDDR6 ECC suits scene footprints.
Pick L40S for AI-first workloads.
Pick RTX 6000 Ada for workstation deployment scenarios.

Pitfalls

Easy to confuse with L40S; verify the exact SKU when reading benchmarks.
PCIe Gen4 only; in a Gen5 host the link trains at Gen4 speed, so the card uses at most half of the slot's Gen5 bandwidth.
No NVLink; multi-card configurations are PCIe-bound.
At 864 GB/s, GDDR6 bandwidth is solid but well below that of HBM cards.
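
To put the PCIe points in numbers, here is a back-of-the-envelope sketch of theoretical link bandwidth. It uses the per-lane signalling rates and 128b/130b encoding from the PCIe specifications and ignores protocol overhead, so real transfers will be somewhat slower. The helper names are illustrative.

```python
# Per-lane raw signalling rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by Gen3 through Gen5
L40_MEM_BW_GBPS = 864.0  # on-card memory bandwidth from the spec table


def pcie_gbps(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8


def negotiated_gen(card_gen: int, host_gen: int) -> int:
    """A link trains at the highest generation both ends support."""
    return min(card_gen, host_gen)


if __name__ == "__main__":
    gen = negotiated_gen(card_gen=4, host_gen=5)
    link = pcie_gbps(gen)
    print(f"Link trains at Gen{gen}: about {link:.1f} GB/s per direction")
    print(f"Gen5 x16 slot could offer about {pcie_gbps(5):.1f} GB/s")
    print(f"On-card memory is about {L40_MEM_BW_GBPS / link:.0f}x faster than the link")
```

The gap between on-card memory and the link is why multi-card setups without NVLink tend to favour workloads that keep traffic between cards small.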

Software Notes

First-class CUDA, OptiX, NVIDIA Virtual GPU and Omniverse Enterprise support. AI inference paths (vLLM, TensorRT-LLM) treat L40 as a standard Ada card with lower throughput than L40S.
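
Because L40 and L40S are easy to mix up, it can help to check which one a host actually has before comparing against published numbers. The sketch below shells out to `nvidia-smi` with its `--query-gpu` CSV output. It assumes the driver reports names such as "NVIDIA L40" and "NVIDIA L40S"; the `classify` and `parse_row` helpers are hypothetical and only match on the model token.

```python
import subprocess

# nvidia-smi query for model name, total memory (MiB) and power limit (W).
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,power.limit",
    "--format=csv,noheader,nounits",
]


def classify(name: str) -> str:
    """Map a reported GPU name to 'L40', 'L40S' or 'other' by exact model token."""
    tokens = name.upper().split()
    if "L40S" in tokens:
        return "L40S"
    if "L40" in tokens:
        return "L40"
    return "other"


def parse_row(row: str) -> dict:
    """Split one CSV row; values stay as strings because fields may be '[N/A]'."""
    name, memory_mib, power_limit_w = [field.strip() for field in row.split(",")]
    return {
        "name": name,
        "sku": classify(name),
        "memory_mib": memory_mib,
        "power_limit_w": power_limit_w,
    }


def list_gpus() -> list:
    """Run nvidia-smi and return one parsed dict per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_row(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```

Matching on the whole token rather than a substring avoids treating "L40S" as "L40". The power limit is a useful cross-check, since the L40 is a 300 W card.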

References

NVIDIA L40 Datasheet · NVIDIA
