Intel Gaudi 3 Accelerator

TL;DR

Launched 2024; aimed at H100 on training and inference economics with a 5 nm process and refreshed MME design.
128 GB HBM2e at 3.7 TB/s; 1.8 PFLOPS BF16, 1.8 PFLOPS FP8 (matrix engine throughput).
24 integrated 200 GbE RoCE ports, doubling the per-port speed of Gaudi 2's 24× 100 GbE design.
Production deployments at IBM Cloud and on-prem partners; Intel's roadmap for a Gaudi successor is unclear post-Falcon Shores consolidation.

Overview

Gaudi 3 is Intel's most recent Habana-lineage accelerator, launched in 2024 with the headline claim of H100-class performance at a meaningful price discount. The architectural pattern carries over from Gaudi 2 (TPCs plus MMEs, the SynapseAI compiler, integrated RoCE networking), but the silicon moves to TSMC 5 nm and memory grows to 128 GB HBM2e.

The competitive position is reasonable: published benchmarks show Gaudi 3 trading wins with H100 on transformer training and Llama-class inference. The software gap remains the main barrier to adoption; teams not already invested in SynapseAI face a non-trivial onboarding cost.

Specifications

Metric	Gaudi 3
Architecture	Habana custom (refreshed TPC + MME)
Process	TSMC 5 nm
BF16 (MME)	1,835 TFLOPS
FP8 (MME)	1,835 TFLOPS
Memory	128 GB HBM2e
Memory bandwidth	3.7 TB/s
TDP	900 W
Integrated networking	24× 200 GbE RoCE
Form factor	OAM 2.0
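
The peak-compute and bandwidth rows above imply a roofline "ridge point": the arithmetic intensity below which a kernel is limited by HBM rather than by the MMEs. The sketch below only does that arithmetic from the table's peak figures; the function names are illustrative, and real kernels will land below these ceilings.

```python
# Peak figures copied from the specifications table (vendor peak numbers).
PEAK_TFLOPS = 1835.0   # BF16 / FP8 throughput on the MMEs, in TFLOPS
MEM_BW_TBPS = 3.7      # HBM2e bandwidth, in TB/s


def ridge_point(peak_tflops: float = PEAK_TFLOPS, bw_tbps: float = MEM_BW_TBPS) -> float:
    """FLOPs per byte at which a kernel moves from memory-bound to compute-bound."""
    return peak_tflops / bw_tbps


def attainable_tflops(intensity: float,
                      peak_tflops: float = PEAK_TFLOPS,
                      bw_tbps: float = MEM_BW_TBPS) -> float:
    """Roofline ceiling (TFLOPS) for a kernel with the given FLOP/byte intensity."""
    # intensity [FLOP/byte] * bandwidth [TB/s] gives TFLOP/s.
    return min(peak_tflops, intensity * bw_tbps)


print(f"ridge point: {ridge_point():.0f} FLOP/byte")
print(f"ceiling at 2 FLOP/byte: {attainable_tflops(2.0):.1f} TFLOPS")
```

A ridge point near 500 FLOP/byte means low-intensity work such as small-batch decode sits far to the memory-bound side of the roofline.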

Architecture and Networking

Intel cites roughly 4x the BF16 and 2x the FP8 matrix throughput of Gaudi 2, alongside a refreshed TPC ISA. The split between TPCs (general compute) and MMEs (dense matmul) remains the central programming abstraction. SynapseAI handles the scheduling and graph lowering.

The integrated 200 GbE RoCE story is the operational highlight. Each card's 24 ports total 4.8 Tb/s of raw Ethernet bandwidth (24 × 200 Gb/s) with no separate NICs; in an 8-card server those ports are divided between card-to-card links inside the chassis and scale-out links to the fabric. For sovereign or budget-sensitive builds where InfiniBand procurement is awkward, this remains genuinely useful.
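
The port arithmetic behind the networking figures can be checked in a few lines. This is a sketch from the port count and speed in the specifications only; it reports raw totals and assumes nothing about how a given baseboard divides ports between intra-node and scale-out links.

```python
PORTS_PER_CARD = 24      # from the specifications table
PORT_GBPS = 200          # 200 GbE per port
CARDS_PER_SERVER = 8     # the 8-card server configuration discussed above


def card_bandwidth_tbps(ports: int = PORTS_PER_CARD, gbps: int = PORT_GBPS) -> float:
    """Raw Ethernet bandwidth of one card in Tb/s (one direction)."""
    return ports * gbps / 1000


def server_raw_bandwidth_tbps(cards: int = CARDS_PER_SERVER) -> float:
    """Sum of all card ports in a server, in Tb/s, before any intra-node/scale-out split."""
    return cards * card_bandwidth_tbps()


print(f"per card:   {card_bandwidth_tbps():.1f} Tb/s")
print(f"per server: {server_raw_bandwidth_tbps():.1f} Tb/s raw")
```

The per-server number is an upper bound on port capacity, not on usable scale-out bandwidth, since some ports carry traffic between cards in the same chassis.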

When to Pick Gaudi 3

Cost-sensitive training of 7B-70B models where SynapseAI tooling is acceptable.
Clusters where integrated Ethernet fabric simplifies the scale-out story.
Sovereign and supply-diversified deployments seeking a credible non-NVIDIA / non-AMD path.
Pick H100 / H200 if CUDA ecosystem reach is required.
Pick MI300X / MI325X for larger HBM pools per device.
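
Whether HBM capacity is the deciding factor can be estimated with a quick weight-footprint check. The sketch below counts model weights only; the 0.9 usable fraction is an assumed headroom figure rather than a vendor number, and training needs considerably more memory for gradients, optimizer state and activations.

```python
import math

HBM_GB = 128  # per-card capacity from the specifications


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in GB: billions of parameters times bytes per parameter."""
    return params_billion * bytes_per_param


def min_cards_for_weights(params_billion: float,
                          bytes_per_param: float,
                          hbm_gb: float = HBM_GB,
                          usable_fraction: float = 0.9) -> int:
    """Smallest card count whose usable HBM holds the weights.

    usable_fraction is an assumption that reserves headroom for KV cache
    and runtime buffers; tune it for the actual workload.
    """
    usable = hbm_gb * usable_fraction
    return math.ceil(weights_gb(params_billion, bytes_per_param) / usable)


for name, params, width in [("7B BF16", 7, 2), ("70B FP8", 70, 1), ("70B BF16", 70, 2)]:
    print(f"{name}: {weights_gb(params, width):.0f} GB, "
          f"{min_cards_for_weights(params, width)} card(s)")
```

Under these assumptions a 70B model fits on one card at FP8 but needs two at BF16, which is the kind of case where a larger per-device HBM pool changes the deployment shape.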

Pitfalls

Software ecosystem is narrower; many post-2024 LLM optimisations land on CUDA first.
Roadmap uncertainty — Intel's Falcon Shores plans were repeatedly revised through 2024-2025.
Compiler-first workflow can produce surprising performance cliffs.
HBM2e (not HBM3 or HBM3e) caps bandwidth at 3.7 TB/s: slightly above H100 SXM (3.35 TB/s) but well below H200 (4.8 TB/s), which limits decode-bound inference throughput relative to H200.
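
The memory-bandwidth pitfall can be made concrete with a rough decode bound: at batch size 1, each generated token has to stream the full set of weights from HBM, so tokens per second cannot exceed bandwidth divided by weight bytes. The sketch below is a simplification that ignores KV-cache traffic and kernel efficiency; the H100 SXM and H200 bandwidth figures come from NVIDIA's public specifications, not from this entry.

```python
# Peak HBM bandwidth in TB/s. The Gaudi 3 figure is from the specifications;
# the NVIDIA figures are public datasheet values added for comparison.
BANDWIDTH_TBPS = {"Gaudi 3": 3.7, "H100 SXM": 3.35, "H200": 4.8}


def decode_tokens_per_s_upper_bound(weights_gb: float, bw_tbps: float) -> float:
    """Upper bound on single-sequence decode rate when weight streaming dominates."""
    return bw_tbps * 1000 / weights_gb  # (GB/s) / (GB per token)


WEIGHTS_GB = 70.0  # assumed example: a 70B-parameter model stored at 1 byte per parameter

for device, bw in BANDWIDTH_TBPS.items():
    bound = decode_tokens_per_s_upper_bound(WEIGHTS_GB, bw)
    print(f"{device}: at most {bound:.1f} tokens/s per sequence")
```

The ordering follows bandwidth directly, so the gap to H200 is the one that shows up in decode-heavy serving.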

Software Notes

SynapseAI 1.x and Habana PyTorch remain the production paths. Optimum-Habana provides ready-made recipes for Llama, Mistral, Mixtral and other common models. vLLM has a maintained Habana backend; TensorRT-LLM is NVIDIA-only, and SGLang is developed primarily for NVIDIA and AMD GPUs.
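
For code that must run both on Gaudi hosts and on ordinary development machines, a common pattern is to probe for the Habana PyTorch bridge before choosing a device. The sketch below uses only the standard library; `pick_device` is a hypothetical helper name, and it assumes the bridge is installed as the `habana_frameworks` package and exposes the `hpu` device type.

```python
import importlib.util


def pick_device() -> str:
    """Return "hpu" if both PyTorch and the Habana bridge are installed, else "cpu"."""
    has_torch = importlib.util.find_spec("torch") is not None
    # habana_frameworks is assumed to be the top-level package of the Gaudi PyTorch bridge.
    has_bridge = importlib.util.find_spec("habana_frameworks") is not None
    return "hpu" if has_torch and has_bridge else "cpu"


DEVICE = pick_device()
print(f"selected device: {DEVICE}")
```

On a Gaudi host, a script would typically import `habana_frameworks.torch.core` before moving tensors or modules to `torch.device("hpu")`; the exact setup depends on the SynapseAI release, so check Intel's documentation for the installed version.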

References

Intel Gaudi 3 Product Brief · Intel
Gaudi 3 Whitepaper · Intel
