TL;DR
- Open-source framework for LLM, speech, and multimodal model training, fine-tuning, and alignment, built on Megatron-LM + Apex + PyTorch Lightning + Hydra.
- Provides end-to-end recipes (data prep, pretraining, SFT, LoRA, DPO, RLHF, evaluation) rather than primitives alone.
- Used as NVIDIA's reference training stack for partner-led foundation-model projects and for the Nemotron family of NVIDIA-trained open models.
Overview
NeMo Framework wraps Megatron-LM's parallelism and kernel primitives in a structured project layout, a configuration system (Hydra), and a recipe catalogue. Where Megatron-LM says 'here are the building blocks', NeMo says 'here is a working pretraining run for a Llama-style 70B model on 256 H100s; modify the config to taste'.
It also extends Megatron in directions Megatron itself does not cover: alignment (SFT, DPO, RLHF), parameter-efficient fine-tuning (LoRA, P-Tuning), evaluation harnesses, multimodal architectures (NeMo Multimodal), and speech (NeMo Speech, the codebase behind NVIDIA's Riva ASR/TTS models).
What NeMo Provides Beyond Megatron
- Hydra-based configs with composable sub-configs — swap parallelism, optimiser, or dataset by changing one yaml stanza.
- Production data pipeline (NeMo Curator) — deduplication, quality filtering, language identification at trillion-token scale.
- Alignment recipes — SFT, DPO, RLHF (with NeMo Aligner), reward-model training.
- PEFT — LoRA, IA3, P-Tuning v2, prompt tuning, all swappable via config.
- Multimodal — vision-language (CLIP, NeVA), text-to-image (Stable Diffusion derivatives).
- Speech — Conformer, FastConformer ASR; FastPitch, RAD-TTS, P-Flow TTS.
- Evaluation harness with integrations for lm-evaluation-harness, MT-Bench, and other benchmark suites.
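The config composition described in the first bullet can be illustrated without Hydra itself. The following is a minimal plain-Python sketch of the idea, not Hydra's or NeMo's actual implementation: a selection of sub-configs is merged into one tree, and a dotted override replaces a single value. All group names, option names, keys, and values (`parallelism`, `optim`, `tp4_pp2`, and so on) are hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical sub-config groups, standing in for yaml files in a config directory.
CONFIG_GROUPS = {
    "parallelism": {
        "tp4_pp2": {"tensor_parallel": 4, "pipeline_parallel": 2},
        "tp8_pp4": {"tensor_parallel": 8, "pipeline_parallel": 4},
    },
    "optim": {
        "adam": {"name": "adam", "lr": 3e-4},
        "sgd": {"name": "sgd", "lr": 1e-2},
    },
}


def compose(choices):
    """Build one config tree from a {group: option} selection."""
    cfg = {}
    for group, option in choices.items():
        cfg[group] = copy.deepcopy(CONFIG_GROUPS[group][option])
    return cfg


def apply_override(cfg, dotted_key, value):
    """Set one nested value addressed as 'a.b'; reject keys that do not exist."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for part in parents:
        node = node[part]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(f"unknown config key: {dotted_key}")
    node[leaf] = value
    return cfg


base = compose({"parallelism": "tp4_pp2", "optim": "adam"})
# Swapping one selection replaces the whole parallelism stanza.
swapped = compose({"parallelism": "tp8_pp4", "optim": "adam"})
# A dotted override changes one value and leaves the rest untouched.
tuned = apply_override(copy.deepcopy(base), "optim.lr", 1e-4)
```

Rejecting unknown keys in `apply_override` is a deliberate choice in this sketch: a mistyped key fails loudly instead of silently adding a value that nothing reads.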
Mechanism
Under the hood, NeMo combines four components: Megatron Core for parallelism, Apex for fused optimisers, Transformer Engine for FP8, and PyTorch Lightning for the training-loop scaffolding. Because the trainer is Lightning-based, logging, checkpointing, EMA, gradient clipping, and similar plumbing come built in, and you can integrate with any Lightning-compatible logger (Weights & Biases, MLflow, TensorBoard).
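To make the parallelism layer concrete, here is a small sketch of the arithmetic Megatron-style training uses to split a cluster: the world size is divided among tensor, pipeline, and data parallelism, and the global batch is reached through gradient accumulation. The function name and the example numbers (tensor parallel 8, pipeline parallel 4, micro-batch 1, global batch 512) are illustrative assumptions rather than values taken from a NeMo recipe, and context and expert parallelism are ignored for simplicity.

```python
def plan_parallelism(world_size, tensor_parallel, pipeline_parallel,
                     micro_batch_size, global_batch_size):
    """Derive the data-parallel size and gradient-accumulation steps."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by TP * PP")
    # GPUs left over after model parallelism form the data-parallel replicas.
    data_parallel = world_size // model_parallel

    # Samples processed across all replicas in one micro-step.
    samples_per_micro_step = micro_batch_size * data_parallel
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError("global batch must be divisible by micro batch * DP")
    accumulation_steps = global_batch_size // samples_per_micro_step

    return {
        "data_parallel": data_parallel,
        "accumulation_steps": accumulation_steps,
    }


plan = plan_parallelism(
    world_size=256,        # e.g. 32 nodes of 8 GPUs
    tensor_parallel=8,
    pipeline_parallel=4,
    micro_batch_size=1,
    global_batch_size=512,
)
print(plan)  # {'data_parallel': 8, 'accumulation_steps': 64}
```

With these assumed numbers, 256 GPUs split into 8 data-parallel replicas of 32 GPUs each, and every optimiser step accumulates 64 micro-batches per replica.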
Recipes ship as Python entry-points: `python /opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py --config-path=conf --config-name=megatron_llama3_70b` runs a tested config, with overrides accepted on the command line.
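Command-line overrides follow Hydra's `key=value` syntax, appended after the script arguments. The sketch below assembles such a command as an argument list, reusing the script path and config name from the sentence above. The helper `build_command` is hypothetical, and the override keys shown (`trainer.num_nodes`, `trainer.devices`, `model.tensor_model_parallel_size`) are assumptions about what the chosen config exposes; check them against the yaml file before use.

```python
import shlex

SCRIPT = "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py"


def build_command(config_name, overrides, config_path="conf"):
    """Return the argv list for a recipe run with Hydra-style key=value overrides."""
    args = [
        "python",
        SCRIPT,
        f"--config-path={config_path}",
        f"--config-name={config_name}",
    ]
    # Each override becomes one 'dotted.key=value' argument.
    for key, value in overrides.items():
        args.append(f"{key}={value}")
    return args


cmd = build_command(
    "megatron_llama3_70b",
    {
        # Assumed keys: verify against the config before running.
        "trainer.num_nodes": 32,
        "trainer.devices": 8,
        "model.tensor_model_parallel_size": 8,
    },
)
# Print a shell-safe version for inspection; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting mistakes when an override value contains spaces or brackets.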
When to Use
Use NeMo when you want Megatron-LM-class parallelism and FP8 performance without writing Megatron infrastructure code yourself. The sweet spot is teams running 32 to 1,024 H100s to train or fine-tune Llama, Mistral, Mixtral, or Nemotron-architecture models, and who need alignment beyond pretraining. For research on novel architectures, the configuration overhead can be heavier than with torchtune or a hand-rolled Megatron config.
NeMo containers ship as part of the NVIDIA NGC catalogue and the NeMo Microservices product. For air-gapped or sovereign deployments, the same containers run unchanged on any CUDA-compatible cluster — Yobitel's GPU Cloud included.
Pitfalls
- Hydra config debugging has a learning curve — the composition order matters, and override syntax is unforgiving.
- NeMo checkpoints are Megatron-format; conversion to HuggingFace for serving is a documented but separate step.
- Lightning's autocast and NeMo's mixed-precision configs can collide — follow the NeMo recipe rather than mixing patterns.
- Container sizes are large (10+ GB) due to the breadth of bundled libraries.
Software
- github.com/NVIDIA/NeMo — main repository, Apache 2.0 licensed.
- NeMo Aligner — alignment-specific extensions.
- NeMo Curator — data-curation toolkit.
- NGC NeMo containers — pre-built, optimised for DGX and HGX systems.
- NeMo Microservices — NVIDIA-managed deployment surface for enterprise.