NVIDIA NeMo Framework

TL;DR

Open-source framework for LLM, speech, and multimodal model training, fine-tuning, and alignment, built on Megatron-LM + Apex + PyTorch Lightning + Hydra.
Provides end-to-end recipes — data prep, pretrain, SFT, LoRA, DPO, RLHF, evaluation — rather than primitives alone.
Used as NVIDIA's reference training stack for partner-led foundation-model projects and for the Nemotron family of NVIDIA-trained open models.

Overview

NeMo Framework wraps Megatron-LM's parallelism and kernel primitives in a structured project layout, a configuration system (Hydra), and a recipe catalogue. Where Megatron-LM says 'here are the building blocks', NeMo says 'here is a working pretraining run for a Llama-style 70B model on 256 H100s; modify the config to taste'.
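To make the 256-GPU figure concrete, the sketch below works through how a GPU count is commonly split across tensor, pipeline, and data parallelism in Megatron-style training, and what that implies for gradient accumulation. The tensor-parallel size (8), pipeline-parallel size (4), and batch sizes are illustrative assumptions, not values taken from any NeMo recipe.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree implied by a GPU count and a TP/PP split."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel


def grad_accumulation_steps(global_batch: int, micro_batch: int, data_parallel: int) -> int:
    """Return how many micro-batches each replica accumulates per optimiser step."""
    samples_per_pass = micro_batch * data_parallel
    if global_batch % samples_per_pass != 0:
        raise ValueError(
            f"global_batch={global_batch} is not divisible by "
            f"micro_batch*data_parallel={samples_per_pass}"
        )
    return global_batch // samples_per_pass


if __name__ == "__main__":
    # Assumed split for illustration: 256 GPUs, TP=8, PP=4.
    dp = data_parallel_size(world_size=256, tensor_parallel=8, pipeline_parallel=4)
    # Assumed batch settings for illustration.
    accum = grad_accumulation_steps(global_batch=1024, micro_batch=1, data_parallel=dp)
    print(f"data-parallel replicas: {dp}, accumulation steps: {accum}")
```

With these assumed numbers, each model replica spans 32 GPUs, leaving 8 data-parallel replicas that each accumulate 128 micro-batches per optimiser step.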

It also extends Megatron in directions Megatron itself does not cover: alignment (SFT, DPO, RLHF), parameter-efficient fine-tuning (LoRA, P-Tuning), evaluation harnesses, multimodal architectures (NeMo Multimodal), and speech (NeMo Speech, the codebase behind NVIDIA's Riva ASR/TTS models).

What NeMo Provides Beyond Megatron

Hydra-based configs with composable sub-configs — swap parallelism, optimiser, or dataset by changing one yaml stanza.
Production data pipeline (NeMo Curator) — deduplication, quality filtering, language identification at trillion-token scale.
Alignment recipes — SFT, DPO, RLHF (with NeMo Aligner), reward-model training.
PEFT — LoRA, IA3, P-Tuning v2, prompt tuning, all swappable via config.
Multimodal — vision-language (CLIP, NeVA), text-to-image (Stable Diffusion derivatives).
Speech — Conformer, FastConformer ASR; FastPitch, RAD-TTS, P-Flow TTS.
Evaluation harness covering lm-evaluation-harness, MT-Bench, and benchmark suites.
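The idea of composable sub-configs from the first item in the list above can be imitated in plain Python. This is a minimal sketch of the concept only: it does not use Hydra or OmegaConf, and the config keys, optimiser names, and values are hypothetical stand-ins for separate YAML files.

```python
import copy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict in which `update` overrides `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config, standing in for a top-level YAML file.
BASE = {
    "trainer": {"devices": 8, "num_nodes": 32},
    "model": {"hidden_size": 8192},
}

# Hypothetical optimiser group, standing in for one YAML file per option.
OPTIMISERS = {
    "adamw": {"model": {"optim": {"name": "adamw", "lr": 3e-4}}},
    "sgd": {"model": {"optim": {"name": "sgd", "lr": 1e-2}}},
}


def compose(optimiser: str) -> dict:
    """Build a full config by layering the chosen optimiser stanza onto the base."""
    return deep_merge(BASE, OPTIMISERS[optimiser])


if __name__ == "__main__":
    print(compose("adamw")["model"])
    print(compose("sgd")["model"])
```

Swapping the optimiser changes only the `optim` stanza; the trainer and model settings from the base config carry through untouched, which is the property the config system is meant to provide.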

Mechanism

Under the hood, NeMo combines Megatron Core for parallelism, Apex for fused optimisers, Transformer Engine for FP8, and PyTorch Lightning for the training-loop scaffolding. Because the trainer is Lightning-based, logging, checkpointing, EMA, gradient clipping, and similar plumbing come for free, and you can integrate with any Lightning-compatible logger (Weights & Biases, MLflow, TensorBoard).

Recipes ship as Python entry-points: `python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py --config-path=conf --config-name=megatron_llama3_70b` runs a tested config, with overrides accepted on the command line.
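When a launch is scripted rather than typed by hand, the command above can be assembled programmatically. The helper below is a hypothetical sketch: it builds the argument list and appends Hydra-style `key=value` overrides. The override keys `trainer.num_nodes` and `trainer.devices` are assumed to exist in the chosen config; check the recipe's YAML before relying on them.

```python
import shlex

# Script path as given in the text above.
SCRIPT = "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py"


def build_command(config_name: str, overrides: dict) -> list:
    """Return the argv list for a recipe run with Hydra-style key=value overrides."""
    cmd = ["python", SCRIPT, "--config-path=conf", f"--config-name={config_name}"]
    # Each override becomes one `dotted.key=value` argument after the config flags.
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


if __name__ == "__main__":
    # Assumed override keys and values, for illustration only.
    cmd = build_command(
        "megatron_llama3_70b",
        {"trainer.num_nodes": 32, "trainer.devices": 8},
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than one string avoids shell-quoting surprises when an override value contains spaces or brackets.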

When to Use

Use NeMo when you want Megatron-LM-class parallelism and FP8 performance without writing Megatron infrastructure code yourself. The sweet spot is teams running 32 to 1,024 H100s to train or fine-tune Llama, Mistral, Mixtral, or Nemotron-architecture models and needing alignment beyond pretraining. For research on novel architectures, the configuration overhead can be heavier than with torchtune or a hand-rolled Megatron config.

NeMo containers ship as part of the NVIDIA NGC catalogue and the NeMo Microservices product. For air-gapped or sovereign deployments, the same containers run unchanged on any CUDA-compatible cluster — Yobitel's GPU Cloud included.

Pitfalls

Hydra config debugging has a learning curve — the composition order matters, and override syntax is unforgiving.
NeMo checkpoints are Megatron-format; conversion to HuggingFace for serving is a documented but separate step.
Lightning's autocast and NeMo's mixed-precision configs can collide — follow the NeMo recipe rather than mixing patterns.
Container sizes are large (10+ GB) due to the breadth of bundled libraries.
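The first pitfall, that ordering matters and override syntax is strict, can be illustrated with a toy override parser. This is a plain-Python imitation under simplifying assumptions, not Hydra's actual grammar: it accepts only dotted `key=value` strings, stores every value as a string, and applies overrides in order so that the last one wins. The key `model.precision` is used purely as an example.

```python
def apply_overrides(config: dict, overrides: list) -> dict:
    """Apply dotted key=value overrides to `config` in order; later entries win."""
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep or not key:
            # A missing "=" or an empty key is rejected rather than guessed at.
            raise ValueError(f"malformed override {item!r}: expected key=value")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            # Create intermediate dicts as needed while walking the dotted path.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


if __name__ == "__main__":
    first = apply_overrides({}, ["model.precision=16", "model.precision=bf16"])
    second = apply_overrides({}, ["model.precision=bf16", "model.precision=16"])
    # Same overrides, different order, different final value.
    print(first["model"]["precision"], second["model"]["precision"])
```

The same two overrides yield different configs depending on their order, which is the kind of behaviour worth checking by printing the resolved config before launching a long run.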

Software

github.com/NVIDIA/NeMo — main repository, Apache 2.0 licensed.
NeMo Aligner — alignment-specific extensions.
NeMo Curator — data-curation toolkit.
NGC NeMo containers — pre-built, optimised for DGX and HGX systems.
NeMo Microservices — NVIDIA-managed deployment surface for enterprise.

References

NeMo Framework documentation · NVIDIA
NeMo on GitHub · GitHub (NVIDIA)
Nemotron technical report · arXiv (NVIDIA, 2024)
