TL;DR
- AWS's first-generation training accelerator launched October 2022; powers EC2 trn1 instances.
- Each Trainium chip provides ~190 TFLOPS BF16 and 32 GB of HBM; a trn1.32xlarge has 16 chips and 512 GB of HBM in total.
- NeuronCore architecture is custom; programmed through the AWS Neuron SDK and XLA.
- Largely superseded by Trainium 2 in new deployments; remains available for cost-sensitive training.
Overview
Trainium is AWS's first in-house training accelerator, launched October 2022 in EC2 trn1 instances. The pitch was straightforward: training-optimised silicon at a meaningful discount to A100/H100 capacity, with the AWS Neuron SDK providing a managed software path.
Through 2023-2024, Trainium found adoption among AWS customers training mid-sized models, including Hugging Face, AI21, and others, but its ecosystem reach lagged that of GPUs. Trainium 2 supersedes it for new purchases.
Specifications
| Metric | Trainium (per chip) |
|---|---|
| BF16 / FP16 | ~190 TFLOPS |
| FP32 | ~47 TFLOPS |
| Memory | 32 GB HBM |
| Memory bandwidth | 820 GB/s |
| NeuronCores per chip | 2 (NeuronCore v2) |
| Inter-chip link | NeuronLink v2 |
| trn1.32xlarge (whole instance) | 16 chips, 512 GB HBM total |
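
The instance-level figures follow directly from the per-chip numbers in the table. The sketch below reproduces that arithmetic and also derives a rough roofline ridge point (peak BF16 FLOP/s divided by HBM bandwidth). The `ChipSpec` class and helper names are illustrative only, and the peak values are the approximate ones quoted above, so the results are estimates rather than measured throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Approximate per-chip peak figures, taken from the table above."""
    bf16_tflops: float
    hbm_gb: int
    hbm_bandwidth_gb_s: float
    neuron_cores: int


TRAINIUM = ChipSpec(bf16_tflops=190.0, hbm_gb=32,
                    hbm_bandwidth_gb_s=820.0, neuron_cores=2)
CHIPS_PER_TRN1_32XLARGE = 16


def instance_totals(chip: ChipSpec, n_chips: int) -> dict:
    """Scale per-chip peaks linearly to a whole instance (an upper bound)."""
    return {
        "bf16_tflops": chip.bf16_tflops * n_chips,
        "hbm_gb": chip.hbm_gb * n_chips,
        "neuron_cores": chip.neuron_cores * n_chips,
    }


def ridge_point_flop_per_byte(chip: ChipSpec) -> float:
    """Arithmetic intensity above which a kernel is compute-bound at peak."""
    return (chip.bf16_tflops * 1e12) / (chip.hbm_bandwidth_gb_s * 1e9)


totals = instance_totals(TRAINIUM, CHIPS_PER_TRN1_32XLARGE)
print(totals)
print(round(ridge_point_flop_per_byte(TRAINIUM)))
```

Under these assumptions a kernel needs on the order of a couple of hundred BF16 FLOPs per byte read from HBM before peak compute, rather than memory bandwidth, becomes the limit.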
Architecture and Neuron SDK
Each Trainium chip contains two NeuronCores — custom systolic-array compute engines paired with on-chip SRAM and HBM controllers. Programming targets the Neuron SDK, which lowers PyTorch (via PyTorch/XLA) or TensorFlow graphs onto NeuronCore-native kernels.
The NeuronLink fabric provides chip-to-chip connectivity within a single instance. Multi-instance training relies on AWS's Elastic Fabric Adapter (EFA) for inter-node communication.
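
To make the two communication tiers concrete, the sketch below maps a global worker rank to an (instance, chip, core) position and reports which link a pair of ranks would cross. It assumes one worker per NeuronCore and a contiguous instance, then chip, then core rank layout; that layout and the `place` and `link_between` helpers are hypothetical simplifications for illustration, not the Neuron runtime's actual placement logic.

```python
from typing import NamedTuple

CHIPS_PER_INSTANCE = 16   # trn1.32xlarge
CORES_PER_CHIP = 2        # NeuronCore v2
CORES_PER_INSTANCE = CHIPS_PER_INSTANCE * CORES_PER_CHIP


class Placement(NamedTuple):
    instance: int
    chip: int
    core: int


def place(rank: int) -> Placement:
    """Map a global rank to its position, assuming a contiguous layout."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    instance, local = divmod(rank, CORES_PER_INSTANCE)
    chip, core = divmod(local, CORES_PER_CHIP)
    return Placement(instance, chip, core)


def link_between(rank_a: int, rank_b: int) -> str:
    """Name the coarsest interconnect tier two ranks must cross."""
    a, b = place(rank_a), place(rank_b)
    if a.instance != b.instance:
        return "EFA"          # inter-node traffic
    if a.chip != b.chip:
        return "NeuronLink"   # chip-to-chip within one instance
    return "on-chip"          # same chip (also covers rank_a == rank_b)


print(place(33))
print(link_between(0, 2), link_between(31, 32))
```

A job spanning two trn1.32xlarge instances would therefore have 64 workers under this layout, with only cross-instance pairs touching EFA.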
When to Pick Trainium
- Existing AWS workloads where the Neuron SDK has already been integrated.
- Cost-sensitive training of 7B-class models where capacity pricing matters.
- Pre-existing trn1 reservations.
- Pick Trainium 2 instead for new purchases: it offers significantly better throughput and a better software story.
- Pick H100 / H200 instead if CUDA ecosystem reach is required.
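
As a rough sizing aid for the 7B-class case mentioned in the list above, the sketch below estimates the memory needed for model and optimizer state alone. It assumes mixed-precision training with Adam at 16 bytes per parameter (BF16 weights and gradients plus FP32 master weights and two FP32 Adam moments). That breakdown is a common rule of thumb and an assumption here, not something specified by AWS; activations, buffers, and fragmentation are excluded, so real requirements are higher.

```python
import math

# Assumed bytes per parameter for mixed-precision Adam (rule of thumb).
BYTES_PER_PARAM = {
    "bf16_weights": 2,
    "bf16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}

HBM_PER_CHIP_GB = 32        # per Trainium chip
HBM_PER_INSTANCE_GB = 512   # trn1.32xlarge


def training_state_gb(n_params: float) -> float:
    """Weights + gradients + optimizer state in GB; activations excluded."""
    return n_params * sum(BYTES_PER_PARAM.values()) / 1e9


def min_chips_for_state(n_params: float,
                        hbm_per_chip_gb: int = HBM_PER_CHIP_GB) -> int:
    """Lower bound on chips if state is sharded evenly with no overhead."""
    return math.ceil(training_state_gb(n_params) / hbm_per_chip_gb)


state_gb = training_state_gb(7e9)
print(state_gb, min_chips_for_state(7e9), state_gb < HBM_PER_INSTANCE_GB)
```

Under these assumptions the training state for a 7B-parameter model does not fit in a single chip's HBM but fits comfortably within one trn1.32xlarge once sharded, which is consistent with positioning the instance for models of this size.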
Pitfalls
- Software ecosystem is narrow — Neuron-specific operator support varies model by model.
- PyTorch/XLA on Neuron has rough edges versus PyTorch+CUDA.
- Long-context attention and quantisation paths trail GPU implementations.
- AWS-exclusive: no portability beyond EC2.
Software Notes
AWS Neuron SDK, PyTorch/XLA on Neuron, and TensorFlow Neuron are the supported paths. Hugging Face Optimum-Neuron provides recipes for common transformer training scenarios.