TL;DR
- Workstation-class Ampere GA102 card adapted for the data centre — 48 GB GDDR6, passive cooling, dual-slot.
- Pitched primarily at virtual workstations, rendering and visualisation, with secondary inference uses.
- 300 W TDP, NVLink-bridge support (112 GB/s), no MIG.
- Largely superseded by L40 and L40S for new builds; remains common in VDI and rendering fleets.
Overview
A40 is the data centre cousin of the RTX A6000 — same GA102 silicon, same 48 GB GDDR6, but with passive cooling and a 300 W envelope sized for racked servers. It was the workhorse of NVIDIA's enterprise visualisation lineup through 2022-2023 and is widely deployed for VDI, rendering and CAD workloads.
For AI workloads, A40 is a respectable inference card with twice the memory of A10 or A30 (48 GB versus 24 GB), but its 696 GB/s GDDR6 bandwidth limits its appeal for modern LLMs, whose token generation is largely memory-bandwidth bound. By 2026 the L40/L40S are the preferred upgrade path.
Specifications
| Metric | A40 |
|---|---|
| Architecture | Ampere (GA102) |
| FP32 | 37.4 TFLOPS |
| TF32 (Tensor, sparse) | 150 TFLOPS |
| BF16 / FP16 (Tensor, sparse) | 299 TFLOPS |
| INT8 (Tensor, sparse) | 598 TOPS |
| RT Cores | Second-generation |
| Memory | 48 GB GDDR6 ECC |
| Memory bandwidth | 696 GB/s |
| TDP | 300 W |
| Form factor | PCIe Gen4 x16, dual-slot |
| NVLink | 112.5 GB/s bidirectional (bridge) |
| Display outputs | 3x DisplayPort 1.4 (disabled by default) |
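
The Tensor Core rows above are sparse figures. As a rough sketch, assuming the usual Ampere relationship in which 2:4 structured sparsity doubles peak Tensor Core throughput, the dense peaks can be estimated by halving them. The names `SPARSE_PEAK` and `dense_peak` are illustrative, and the results are derived estimates rather than measured numbers.

```python
# Sparse Tensor Core peaks copied from the table above (TFLOPS, or TOPS for INT8).
SPARSE_PEAK = {"TF32": 150, "BF16/FP16": 299, "INT8": 598}


def dense_peak(sparse_value: float) -> float:
    """Estimate the dense peak, assuming sparsity doubles Tensor Core throughput."""
    return sparse_value / 2


# Dense estimates for workloads that do not use 2:4 structured sparsity.
DENSE_PEAK = {name: dense_peak(value) for name, value in SPARSE_PEAK.items()}

for name, value in DENSE_PEAK.items():
    print(f"{name}: about {value:.1f} dense")
```

Most models are not pruned to a 2:4 pattern, so the dense estimates are usually the more relevant ceiling when sizing A40 for inference.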
When to Pick A40
- Virtual workstation hosts where NVIDIA Virtual GPU software is licensed and RT Cores matter.
- Render farms (V-Ray, Arnold, OctaneRender) where 48 GB GDDR6 fits scenes that overflow 24 GB cards.
- Inference for 13B-class models, which fit in 48 GB at FP16 with room left for KV cache, where GDDR6 bandwidth is acceptable.
- Existing VDI deployments scaling out incrementally.
- For new builds, prefer L40 / L40S instead; Ada-generation Tensor Cores deliver more inference throughput per watt.
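
The 13B-class sizing point can be checked with back-of-envelope arithmetic. The sketch below is a minimal estimate, not a capacity planner: it assumes FP16 weights and KV cache, plain multi-head attention (no grouped-query attention), a hypothetical 40-layer model with hidden size 5120, and an arbitrary 4 GB reserve for activations and runtime overhead. All function names and defaults are illustrative.

```python
GB = 1e9  # decimal gigabytes, matching datasheet-style figures


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory taken by the model weights alone."""
    return n_params * bytes_per_param


def kv_bytes_per_token(n_layers: int, hidden_size: int, bytes_per_value: float) -> float:
    """One key and one value vector per layer, each hidden_size wide (assumes no GQA)."""
    return 2 * n_layers * hidden_size * bytes_per_value


def max_cached_tokens(capacity_gb: float, n_params: float, n_layers: int,
                      hidden_size: int, bytes_per_param: float = 2.0,
                      bytes_per_value: float = 2.0, reserve_gb: float = 4.0) -> int:
    """Tokens of KV cache that fit after weights and an assumed fixed reserve."""
    free = capacity_gb * GB - weight_bytes(n_params, bytes_per_param) - reserve_gb * GB
    per_token = kv_bytes_per_token(n_layers, hidden_size, bytes_per_value)
    return max(0, int(free // per_token))


# Hypothetical 13B model (40 layers, hidden size 5120) on a 48 GB card.
print(max_cached_tokens(48, 13e9, 40, 5120))  # 21972 under these assumptions
# The same model on a 24 GB card leaves no room at FP16.
print(max_cached_tokens(24, 13e9, 40, 5120))  # 0
```

The token budget is shared across all concurrent sequences, so long contexts or large batches reduce the headroom quickly.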
Pitfalls
- GDDR6 bandwidth (696 GB/s) is well under half that of HBM2e cards such as A100 (roughly 1.5 to 2.0 TB/s). LLM token generation is largely memory-bandwidth bound, so this caps per-stream decode speed.
- No MIG support — multi-tenant isolation relies on vGPU licensing.
- No FP8 Tensor Core support (FP8 arrived with Ada and Hopper), so FP8-quantised production paths skip A40; INT8 and INT4 remain available.
- 300 W passive cards require server-class airflow; workstation chassis are not appropriate.
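
To put the bandwidth pitfall in numbers, a common rule of thumb is that single-stream decode must read every weight once per generated token, so tokens per second cannot exceed memory bandwidth divided by weight size. The sketch below applies that rule; it ignores KV cache reads, kernel overheads and batching, so it is an optimistic ceiling rather than a prediction. The function name is illustrative.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, n_params_billion: float,
                                bytes_per_param: float = 2.0) -> float:
    """Upper bound on batch-1 decode speed, assuming each token reads all weights once."""
    weight_gb = n_params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return bandwidth_gb_s / weight_gb


# A40 (696 GB/s) with a 13B model at FP16 and at INT8.
print(f"FP16: {decode_ceiling_tokens_per_s(696, 13):.1f} tokens/s ceiling")
print(f"INT8: {decode_ceiling_tokens_per_s(696, 13, 1.0):.1f} tokens/s ceiling")
```

Halving the bytes per parameter doubles the ceiling, which is why weight quantisation matters more on GDDR6 cards than raw Tensor Core throughput does.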
Software Notes
First-class CUDA, OptiX, NVIDIA Virtual GPU and Omniverse Enterprise support. Standard ML stacks see A40 as a compute capability 8.6 Ampere part, the same class as A10 and the GA102 consumer cards, but with more memory.
References
- NVIDIA A40 Datasheet · NVIDIA