NVIDIA T4 Tensor Core GPU

TL;DR

Single-slot, 70 W Turing card launched in September 2018; one of the most widely deployed AI accelerators of the 2018-2022 era.
16 GB GDDR6 at 320 GB/s with Turing (second-generation) Tensor Cores supporting FP16, INT8 and INT4.
Standard across AWS g4dn, GCP nvidia-tesla-t4 and Azure NCasT4_v3; the canonical cloud inference card.
Superseded by L4 in new deployments; still supported by current CUDA releases.

Overview

T4 is the GPU that made GPU inference cheap at cloud scale. Launched in September 2018 on the Turing architecture, it packed Turing (second-generation) Tensor Cores into a 70 W single-slot card priced low enough that hyperscalers could offer GPU inference instances at meaningful volume. AWS g4dn, GCP T4 instances, and many on-prem appliances all standardised on T4.

By 2026 T4 has largely been succeeded by L4 (Ada Lovelace) in new deployments. It remains widely available second-hand, is supported by current CUDA releases, and is still useful for lightweight inference, video transcoding and computer-vision (CV) workloads.

Specifications

Metric	T4
Architecture	Turing (TU104)
Process	TSMC 12 nm FFN
FP32	8.1 TFLOPS
FP16 (Tensor)	65 TFLOPS
INT8 (Tensor)	130 TOPS
INT4 (Tensor)	260 TOPS
Memory	16 GB GDDR6
Memory bandwidth	320 GB/s
TDP	70 W
Form factor	PCIe Gen3 x16, single-slot low-profile
NVENC / NVDEC	1 / 1
NVLink	Not supported
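The peak figures in the table can be turned into two quick sanity checks: the arithmetic intensity (operations per byte of DRAM traffic) a kernel needs before it stops being memory-bound, and a rough ceiling on batch-1 LLM decode speed, which reads every weight once per generated token. This is a back-of-envelope sketch that uses only the table's peak numbers; the 7B-parameter model is a hypothetical example, and real throughput will be lower because KV-cache reads, kernel overheads and achievable (rather than peak) bandwidth are ignored.

```python
# Peak figures taken from the T4 specification table (tensor-core peaks).
T4_PEAK_FP16_FLOPS = 65e12   # 65 TFLOPS FP16 (Tensor)
T4_PEAK_INT8_OPS = 130e12    # 130 TOPS INT8 (Tensor)
T4_MEM_BW_BYTES = 320e9      # 320 GB/s GDDR6


def ridge_point(peak_ops_per_s: float, bw_bytes_per_s: float) -> float:
    """Operations per byte of DRAM traffic at which a kernel stops being
    memory-bound and becomes compute-bound (the roofline 'ridge point')."""
    return peak_ops_per_s / bw_bytes_per_s


def decode_tokens_per_s_ceiling(n_params: float, bytes_per_param: float,
                                bw_bytes_per_s: float = T4_MEM_BW_BYTES) -> float:
    """Optimistic upper bound on batch-1 LLM decode speed: every generated
    token streams all weights from memory once. KV-cache traffic, kernel
    overheads and the gap between peak and achievable bandwidth are ignored."""
    return bw_bytes_per_s / (n_params * bytes_per_param)


if __name__ == "__main__":
    fp16_ridge = ridge_point(T4_PEAK_FP16_FLOPS, T4_MEM_BW_BYTES)
    int8_ridge = ridge_point(T4_PEAK_INT8_OPS, T4_MEM_BW_BYTES)
    print(f"FP16 ridge point: ~{fp16_ridge:.0f} FLOP/byte")
    print(f"INT8 ridge point: ~{int8_ridge:.0f} op/byte")
    # Hypothetical 7B-parameter model stored at FP16 (2 B) or 4-bit (0.5 B) per weight.
    for label, bpp in (("FP16", 2.0), ("4-bit", 0.5)):
        ceiling = decode_tokens_per_s_ceiling(7e9, bpp)
        print(f"7B model, {label} weights: <= {ceiling:.0f} tokens/s")
```

Batch-1 decode performs roughly 2 FLOPs per weight, about 1 FLOP per byte at FP16, far below the ~200 FLOP/byte ridge point. That is why LLM serving on T4 is largely bandwidth-bound, and why shrinking the weights raises the ceiling more than the tensor-core peak does.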

When T4 Still Makes Sense

Existing deployments where TCO is well-amortised and workloads have not outgrown 16 GB.
Small-model inference (BERT-base, CNNs, traditional CV) where Turing throughput is adequate.
Video transcoding at modest density (single NVENC/NVDEC).
Educational and prototyping use cases on cheap second-hand cards.
Pick L4 for new builds — same form factor, much higher throughput per watt, FP8 support.
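Whether a workload has "outgrown 16 GB" is mostly weight arithmetic: parameter count times bytes per parameter, plus headroom for activations, KV cache, the CUDA context and allocator fragmentation. The estimator below is a crude sketch: the 25% headroom default is an assumption rather than a measured figure, the 4-bit size ignores quantisation scales, and the parameter counts are the commonly cited sizes for BERT-base (~110M) and ResNet-50 (~25.6M) alongside two hypothetical LLMs.

```python
GIB = 1024 ** 3
# Nominal capacity; the driver reports somewhat less as usable.
T4_NOMINAL_MEM_BYTES = 16 * GIB

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_bytes(n_params: float, dtype: str) -> float:
    """Bytes needed for the weights alone."""
    return n_params * BYTES_PER_PARAM[dtype]


def fits_on_t4(n_params: float, dtype: str, headroom: float = 0.25,
               capacity_bytes: float = T4_NOMINAL_MEM_BYTES) -> bool:
    """Crude fit check: weights plus an assumed fractional headroom standing in
    for activations, KV cache, CUDA context and allocator fragmentation."""
    return weight_bytes(n_params, dtype) * (1.0 + headroom) <= capacity_bytes


if __name__ == "__main__":
    models = {
        "BERT-base (~110M)": 110e6,
        "ResNet-50 (~25.6M)": 25.6e6,
        "hypothetical 7B LLM": 7e9,
        "hypothetical 13B LLM": 13e9,
    }
    for name, n in models.items():
        for dtype in ("fp16", "int4"):
            gib = weight_bytes(n, dtype) / GIB
            print(f"{name:22s} {dtype}: {gib:6.2f} GiB of weights, fits: {fits_on_t4(n, dtype)}")
```

Under these assumptions a 7B model at FP16 no longer fits once headroom is included, while the same model at 4-bit leaves most of the card free, and BERT-base or ResNet-50 use well under 1 GiB.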

Pitfalls

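Turing Tensor Cores lack BF16 (added with Ampere); mixed-precision training needs FP16 with loss scaling.
No FP8 (added with Ada Lovelace and Hopper), so FP8-based LLM quantisation paths skip T4.
PCIe Gen3 x16 (roughly 16 GB/s per direction) limits host-to-device bandwidth in modern Gen4/Gen5 servers.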
Driver lifecycle: T4 remains supported but newer CUDA features increasingly skip Turing.
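Because T4 has no BF16, FP16 training on it relies on loss scaling: the loss is multiplied by a large factor before the backward pass so that small gradients do not underflow in FP16, and the factor is cut whenever a gradient overflows. The framework-free sketch below illustrates the dynamic variant of that scheme. In practice a framework's built-in scaler (for example PyTorch's GradScaler) handles this; here FP16 rounding is emulated on the CPU with Python's `struct` "e" (half-precision) format rather than run on a GPU, and the class name is hypothetical.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


class DynamicLossScaler:
    """Minimal dynamic loss scaler. Defaults mirror PyTorch GradScaler's:
    start at 2**16, halve on overflow, double after 2000 clean steps."""

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def unscale_or_skip(self, scaled_grads):
        """Given gradients of (loss * scale) computed in FP16, return them
        divided by the scale, or None if any overflowed (skip this step)."""
        if not all(math.isfinite(g) for g in scaled_grads):
            self.scale *= self.backoff_factor  # overflow: shrink scale, skip step
            self._clean_steps = 0
            return None
        grads = [g / self.scale for g in scaled_grads]  # back to true magnitude
        self._clean_steps += 1
        if self._clean_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor  # long overflow-free run: try a larger scale
        return grads


if __name__ == "__main__":
    tiny_grad = 1e-8  # below FP16's smallest subnormal (~6e-8)
    print("unscaled in FP16:", to_fp16(tiny_grad))  # rounds to zero
    scaler = DynamicLossScaler()
    # Simulate a backward pass on the scaled loss: the gradient scales with it.
    print("recovered:", scaler.unscale_or_skip([to_fp16(tiny_grad * scaler.scale)]))
    # A gradient of 1.0 times 65536 exceeds FP16's max (65504) and overflows.
    print("overflow step:", scaler.unscale_or_skip([to_fp16(1.0 * scaler.scale)]),
          "new scale:", scaler.scale)
```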

Software Notes

T4 is supported in current CUDA releases (through CUDA 13 at time of writing) and runs most major inference runtimes and servers, including Triton, TensorRT, ONNX Runtime and OpenVINO. vLLM supports T4 with quantised weights but warns about reduced throughput. Most current FP8 / FP4 paths skip T4 entirely.
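Software usually decides whether an FP8 or FP4 path is available by checking the GPU's compute capability at load or dispatch time. The sketch below shows that pattern; the table and function names are hypothetical rather than any particular library's API, and the thresholds are the architectures that introduced tensor-core support for each format. T4 reports compute capability 7.5, so it falls back to FP16.

```python
from typing import Dict, Sequence, Tuple

# Compute capability at which each format gained tensor-core support.
# T4 (Turing) reports compute capability 7.5.
MIN_COMPUTE_CAPABILITY: Dict[str, Tuple[int, int]] = {
    "fp16": (7, 0),   # Volta
    "bf16": (8, 0),   # Ampere
    "fp8": (8, 9),    # Ada Lovelace (8.9) and Hopper (9.0)
    "fp4": (10, 0),   # Blackwell
}


def pick_compute_dtype(cc: Tuple[int, int],
                       preferred: Sequence[str] = ("fp8", "bf16", "fp16")) -> str:
    """Return the first preferred format the GPU supports natively,
    falling back to fp32 if none qualify. Tuples compare as (major, minor)."""
    for fmt in preferred:
        if cc >= MIN_COMPUTE_CAPABILITY[fmt]:
            return fmt
    return "fp32"


if __name__ == "__main__":
    for name, cc in (("T4", (7, 5)), ("A100", (8, 0)), ("L4", (8, 9)), ("H100", (9, 0))):
        print(f"{name} (sm_{cc[0]}{cc[1]}): {pick_compute_dtype(cc)}")
```

On a live system with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in.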

References

NVIDIA T4 Datasheet · NVIDIA
Turing Architecture Whitepaper · NVIDIA
