AWS Trainium (Trn1)

TL;DR

AWS's first-generation training accelerator launched October 2022; powers EC2 trn1 instances.
Each Trainium chip provides ~190 TFLOPS BF16 with 32 GB HBM per chip; trn1.32xlarge has 16 chips at 512 GB total.
NeuronCore architecture is custom; programmed through the AWS Neuron SDK and XLA.
Largely superseded by Trainium 2 in new deployments; remains available for cost-sensitive training.

Overview

Trainium is AWS's first in-house training accelerator, launched October 2022 in EC2 trn1 instances. The pitch was straightforward: training-optimised silicon at a meaningful discount to A100/H100 capacity, with the AWS Neuron SDK providing a managed software path.

Through 2023-2024 Trainium found adoption among AWS customers training mid-sized models (Hugging Face, AI21, and various others), but its ecosystem reach lagged behind that of GPUs. Trainium 2 supersedes it for new buys.

Specifications

Metric	Trainium (per chip)
BF16 / FP16	~190 TFLOPS
FP32	~47 TFLOPS
Memory	32 GB HBM
Memory bandwidth	820 GB/s
NeuronCores per chip	2 (NeuronCore v2)
Inter-chip link	NeuronLink v2
trn1.32xlarge	16 chips, 512 GB total
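
The instance-level figures follow from the per-chip rows by simple multiplication. The sketch below reproduces that arithmetic and also derives a roofline-style ridge point (peak BF16 FLOPs per byte of HBM traffic). The function names are illustrative, the FP32 value is the rounded ~47 TFLOPS from the table, and all results are peak numbers rather than measured throughput.

```python
# Per-chip figures copied from the specifications table (peak values).
BF16_TFLOPS_PER_CHIP = 190.0
FP32_TFLOPS_PER_CHIP = 47.0
HBM_GB_PER_CHIP = 32
HBM_BANDWIDTH_GB_PER_S = 820.0
NEURONCORES_PER_CHIP = 2
CHIPS_PER_TRN1_32XLARGE = 16


def instance_totals(chips: int = CHIPS_PER_TRN1_32XLARGE) -> dict:
    """Scale the per-chip figures linearly to a whole instance."""
    return {
        "hbm_gb": chips * HBM_GB_PER_CHIP,
        "bf16_tflops": chips * BF16_TFLOPS_PER_CHIP,
        "fp32_tflops": chips * FP32_TFLOPS_PER_CHIP,
        "neuroncores": chips * NEURONCORES_PER_CHIP,
    }


def ridge_point_flops_per_byte() -> float:
    """FLOPs a kernel must perform per HBM byte to be compute-bound at BF16 peak."""
    return (BF16_TFLOPS_PER_CHIP * 1e12) / (HBM_BANDWIDTH_GB_PER_S * 1e9)
```

Calling `instance_totals()` gives 512 GB of HBM and 32 NeuronCores for a trn1.32xlarge, matching the table. The ridge point comes out at roughly 232 FLOPs per byte, which suggests that kernels with lower arithmetic intensity are likely to be limited by memory bandwidth rather than compute.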

Architecture and Neuron SDK

Each Trainium chip contains two NeuronCores — custom systolic-array compute engines paired with on-chip SRAM and HBM controllers. Programming targets the Neuron SDK, which lowers PyTorch (via PyTorch/XLA) or TensorFlow graphs onto NeuronCore-native kernels.

The NeuronLink fabric provides chip-to-chip connectivity within a single instance. Multi-instance training relies on AWS's Elastic Fabric Adapter (EFA) for inter-node communication.
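
To make the two communication tiers concrete, the following sketch maps a global worker rank to an instance, chip, and NeuronCore, then reports which link a pair of ranks would cross. It assumes one worker per NeuronCore and a contiguous rank layout on trn1.32xlarge instances. Both are illustrative assumptions, not a description of how the Neuron runtime actually assigns ranks, and `Placement`, `place`, and `link_between` are hypothetical names.

```python
from dataclasses import dataclass

CHIPS_PER_INSTANCE = 16    # trn1.32xlarge
NEURONCORES_PER_CHIP = 2   # NeuronCore v2 count per Trainium chip
CORES_PER_INSTANCE = CHIPS_PER_INSTANCE * NEURONCORES_PER_CHIP


@dataclass(frozen=True)
class Placement:
    instance: int
    chip: int
    core: int


def place(rank: int) -> Placement:
    """Map a global rank to (instance, chip, core), assuming contiguous layout."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    instance, local = divmod(rank, CORES_PER_INSTANCE)
    chip, core = divmod(local, NEURONCORES_PER_CHIP)
    return Placement(instance, chip, core)


def link_between(rank_a: int, rank_b: int) -> str:
    """Name the interconnect tier that traffic between two ranks would cross."""
    a, b = place(rank_a), place(rank_b)
    if a.instance != b.instance:
        return "EFA"          # different instances: inter-node network
    if a.chip != b.chip:
        return "NeuronLink"   # same instance, different chips
    return "on-chip"          # both NeuronCores sit on the same chip
```

Under these assumptions, ranks 0 and 2 communicate over NeuronLink, while ranks 31 and 32 sit on different instances and go over EFA. This is one reason parallelism layouts usually try to keep the most communication-heavy groups within a single instance.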

When to Pick Trainium

Existing AWS workloads where the Neuron SDK has already been integrated.
Cost-sensitive training of 7B-class models where capacity pricing matters.
Pre-existing trn1 reservations.
Pick Trainium 2 instead for new buys: it offers significantly better throughput and a stronger software story.
Pick H100 / H200 if CUDA ecosystem reach is required.
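
As a rough check on the 7B-class sizing mentioned in the list above, the sketch below estimates the memory needed for model and optimizer state. It assumes mixed-precision Adam at 16 bytes per parameter (BF16 weights and gradients, plus FP32 master weights and two FP32 Adam moments). That recipe is a common assumption, not something the Neuron SDK mandates, and the estimate ignores activations, buffers, and fragmentation.

```python
import math

HBM_GB_PER_CHIP = 32           # from the specifications table
CHIPS_PER_TRN1_32XLARGE = 16

# Assumed bytes per parameter for mixed-precision Adam training:
# BF16 weights + BF16 gradients + FP32 master weights + two FP32 Adam moments.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def training_state_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Model plus optimizer state in decimal GB, excluding activations."""
    return params * bytes_per_param / 1e9


def min_chips_for_state(params: float) -> int:
    """Lower bound on chips needed just to hold the sharded training state."""
    return math.ceil(training_state_gb(params) / HBM_GB_PER_CHIP)


def fits_on_one_instance(params: float) -> bool:
    """True if the training state alone fits in one trn1.32xlarge's HBM."""
    total_hbm_gb = HBM_GB_PER_CHIP * CHIPS_PER_TRN1_32XLARGE
    return training_state_gb(params) <= total_hbm_gb
```

For 7e9 parameters this gives about 112 GB of state. That exceeds a single chip's 32 GB, so the state has to be sharded across at least four chips, but it fits comfortably within the 512 GB of one trn1.32xlarge before activations are counted.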

Pitfalls

Software ecosystem is narrow — Neuron-specific operator support varies model by model.
PyTorch/XLA on Neuron has rough edges versus PyTorch+CUDA.
Long-context attention and quantisation paths trail GPU implementations.
AWS-exclusive: no portability beyond EC2.

Software Notes

AWS Neuron SDK, PyTorch/XLA on Neuron, and TensorFlow Neuron are the supported paths. Hugging Face Optimum-Neuron provides recipes for common transformer training scenarios.

References

AWS Trainium Product Page · AWS
AWS Neuron SDK Documentation · AWS
