NVIDIA A40 GPU

TL;DR

Workstation-class Ampere GA102 card adapted for the data centre — 48 GB GDDR6, passive cooling, dual-slot.
Pitched primarily at virtual workstations, rendering and visualisation, with secondary inference uses.
300 W TDP, NVLink bridge support (112.5 GB/s bidirectional between two cards), no MIG.
Largely superseded by L40 and L40S for new builds; remains common in VDI and rendering fleets.

Overview

A40 is the data centre cousin of the RTX A6000 — same GA102 silicon, same 48 GB GDDR6, but with passive cooling and a 300 W envelope sized for racked servers. It was the workhorse of NVIDIA's enterprise visualisation lineup through 2022-2023 and is widely deployed for VDI, rendering and CAD workloads.

For AI workloads A40 is a respectable inference card with significantly more memory than A10/A30, but its GDDR6 bandwidth limits its appeal for modern LLMs. By 2026 the L40/L40S are the preferred upgrade path.

Specifications

Metric	A40
Architecture	Ampere (GA102)
FP32	37.4 TFLOPS
TF32 (Tensor, sparse)	150 TFLOPS
BF16 / FP16 (Tensor, sparse)	299 TFLOPS
INT8 (Tensor, sparse)	598 TOPS
RT Cores	Second-generation
Memory	48 GB GDDR6 ECC
Memory bandwidth	696 GB/s
TDP	300 W
Form factor	PCIe Gen4 x16, dual-slot
NVLink	112.5 GB/s bidirectional (bridge, two cards)
Display outputs	3x DisplayPort 1.4 (disabled by default; can be enabled with management tools)

When to Pick A40

Virtual workstation hosts where NVIDIA Virtual GPU software is licensed and RT Cores matter.
Render farms (V-Ray, Arnold, OctaneRender) where 48 GB GDDR6 fits scenes that overflow 24 GB cards.
Inference for 13B-class models, whose FP16 weights (about 26 GB) fit comfortably in 48 GB and where GDDR6 bandwidth is acceptable.
Existing VDI deployments scaling out incrementally.
Pick L40 / L40S for new builds; Ada-generation Tensor Cores deliver more inference throughput per watt.
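
The 13B sizing point above can be checked with back-of-envelope arithmetic. The sketch below is illustrative only: it counts weights alone, assumes 2 bytes per parameter for FP16 and 1 byte for INT8, and ignores KV cache, activations and runtime overhead. The helper names are hypothetical, not part of any library.

```python
# Rough weight-memory estimate; ignores KV cache, activations and runtime overhead.
A40_MEMORY_GB = 48  # from the specifications table

# Bytes per parameter for two common weight formats (illustrative, not exhaustive).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def headroom_gb(params_billion: float, dtype: str,
                memory_gb: float = A40_MEMORY_GB) -> float:
    """Memory left for KV cache and activations once the weights are loaded."""
    return memory_gb - weight_gb(params_billion, dtype)


for dtype in ("fp16", "int8"):
    print(f"13B {dtype}: weights ~{weight_gb(13, dtype):.0f} GB, "
          f"headroom ~{headroom_gb(13, dtype):.0f} GB")
```

Under these assumptions a 13B model leaves roughly 22 GB (FP16) or 35 GB (INT8) for everything else, which is the headroom a 24 GB card cannot offer at FP16.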

Pitfalls

GDDR6 bandwidth (696 GB/s) is well below that of HBM cards; single-stream LLM decoding is typically memory-bandwidth-bound, so this figure caps tokens per second.
No MIG support — multi-tenant isolation relies on vGPU licensing.
No FP8 support; FP8-quantised production paths skip A40, although INT8 Tensor Core paths remain available.
300 W passive cards require server-class airflow; workstation chassis are not appropriate.
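
To put the bandwidth pitfall in numbers, a common rule of thumb is that batch-1 decoding reads every weight once per generated token, so memory bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below applies that simplification; it is an upper bound under stated assumptions, not a benchmark, and the function name is hypothetical.

```python
A40_BANDWIDTH_GBPS = 696  # GB/s, from the specifications table


def decode_tokens_per_s_ceiling(weight_gb: float,
                                bandwidth_gbps: float = A40_BANDWIDTH_GBPS) -> float:
    """Upper bound on batch-1 decode rate if each token reads all weights once."""
    return bandwidth_gbps / weight_gb


# 13B parameters at 2 bytes (FP16) and 1 byte (INT8) per parameter.
for label, gb in (("fp16", 26.0), ("int8", 13.0)):
    print(f"13B {label}: at most ~{decode_tokens_per_s_ceiling(gb):.0f} tokens/s")
```

For a 13B model this gives a ceiling of roughly 27 tokens/s at FP16 and 54 tokens/s at INT8; real throughput will be lower, and batching changes the picture because weights are then shared across requests.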

Software Notes

First-class CUDA, OptiX, NVIDIA Virtual GPU and Omniverse Enterprise support. Standard ML stacks see A40 as a compute capability 8.6 Ampere device, the same GA102 class as A10, with twice the memory.
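
One way to confirm what the software stack sees on a host is to query `nvidia-smi`. The sketch below is a minimal example, assuming a driver recent enough to expose the `compute_cap` query field; the helper names are hypothetical and the sample line is illustrative rather than captured output.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["name", "memory.total", "compute_cap"]


def parse_gpu_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(QUERY_FIELDS, row)) for row in reader if row]


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi on this host; requires an installed NVIDIA driver."""
    result = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={','.join(QUERY_FIELDS)}",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_rows(result.stdout)


# Illustrative line in the expected shape (not captured from a real host).
SAMPLE = "NVIDIA A40, 46068 MiB, 8.6\n"
for gpu in parse_gpu_rows(SAMPLE):
    print(gpu["name"], gpu["memory.total"], "compute capability", gpu["compute_cap"])
```

On a GPU host, calling `query_gpus()` instead of parsing the sample returns the same structure for every installed card.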

References

NVIDIA A40 Datasheet · NVIDIA
