TL;DR
- PEFT (huggingface/peft) is the de facto reference implementation of most major parameter-efficient fine-tuning techniques and integrates tightly with Transformers, Accelerate, and TRL.
- It supports LoRA, DoRA, AdaLoRA, IA³, Llama-Adapter, prompt tuning, prefix tuning, P-tuning, and several other methods under a unified `PeftModel` wrapper; QLoRA is LoRA applied on top of a 4-bit quantised base model.
- It is Apache 2.0 licensed and supplies the underlying primitives used by Axolotl, Unsloth, LLaMA-Factory, and TRL, and reportedly by many internal stacks at hyperscalers.
- It has first-class support for adapter merging, multi-adapter inference, adapter sharing through the Hugging Face Hub, and quantised base models via bitsandbytes.
What PEFT Provides#
PEFT is a thin, focused library that wraps a Hugging Face Transformers model with the plumbing required to fine-tune a small subset of its parameters. The user supplies a base model and a `PeftConfig` describing which technique to apply and which modules to target, and PEFT returns a `PeftModel` in which the base weights are frozen and only the adapter parameters are trainable. Saving that model with `save_pretrained` writes only the adapter weights, not the base.
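To make "a small subset" concrete, the sketch below counts the trainable parameters of a rank-16 LoRA adapter applied to every attention and MLP projection of an 8B Llama-style model. The layer shapes are assumptions taken from the published Llama 3.1 8B configuration rather than read from a checkpoint, and `lora_params` is a hypothetical helper, not a PEFT function.

```python
# Back-of-envelope LoRA parameter count. Shapes are assumptions based on
# the published Llama 3.1 8B configuration, not read from a checkpoint.
HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP width
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32
RANK = 16

# (in_features, out_features) of each projection the adapter wraps.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank); both are trainable."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
# prints: 1,310,720 per layer, 41,943,040 in total
```

Against roughly 8 billion frozen base weights, that is about 0.5% of the model.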
The library handles parameter freezing, gradient routing, adapter loading and saving, merge-and-unload to fold adapters into the base, multi-adapter inference, and compatibility with the broader Hugging Face training stack — Transformers Trainer, Accelerate for distributed training, and TRL for RLHF/DPO post-training.
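The idea behind merge-and-unload can be illustrated with the standard LoRA formulation, where the merged weight is `W + (alpha / r) * B @ A`. The following is a toy pure-Python sketch with made-up 2x2 values, not PEFT's implementation, which also has to handle quantised layers and other adapter variants.

```python
# Toy merge: fold a LoRA update into a dense weight matrix.
# Matrices are lists of rows; all values are made up for illustration.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[x * s for x in row] for row in a]


r, alpha = 1, 2
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, shape (out=2, in=2)
A = [[1.0, 2.0]]               # LoRA down-projection, shape (r, in)
B = [[3.0], [4.0]]             # LoRA up-projection, shape (out, r)
x = [[5.0], [6.0]]             # one input column vector

scaling = alpha / r

# Adapter path: base output plus the scaled low-rank update.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged path: a single dense matrix with the update folded in.
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

print(W_merged)               # prints: [[7.0, 12.0], [8.0, 17.0]]
print(y_adapter == y_merged)  # prints: True
```

Because the merged matrix has the same shape as the base weight, a merged model carries no extra inference cost from the adapter.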
Supported Techniques#
| Technique | Type | Headline use case |
|---|---|---|
| LoRA | Reparameterisation | Default PEFT recipe |
| QLoRA | Reparameterisation + 4-bit base | Single-GPU large-model fine-tunes |
| DoRA | Weight decomposition + LoRA | Closing LoRA quality gap |
| AdaLoRA | Adaptive-rank LoRA | Rank budget allocation |
| IA³ | Activation rescaling | Tiny adapters (<0.1% params) |
| Llama-Adapter | Prompt insertion | Multimodal Llama adaptation |
| Prompt tuning | Soft prompts | Small task heads |
| Prefix tuning | Soft prompts in KV cache | Generation tasks |
| P-tuning v2 | Deep soft prompts | Smaller LLMs |
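The QLoRA row's "single-GPU" claim follows from simple weight-memory arithmetic. This is a rough, weight-only sketch: it ignores activations, optimizer state, adapter weights, and quantisation constants, and the 8-billion parameter count is a round-number assumption rather than a measured value. `weight_gib` is a hypothetical helper.

```python
# Rough weight-only memory estimate for a frozen base model.
BASE_PARAMS = 8_000_000_000  # assumed round figure for an "8B" model

# Storage cost per parameter for common base-model precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_gib(num_params, dtype):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gib(BASE_PARAMS, dtype):.1f} GiB")
```

Under these assumptions a 4-bit base needs a quarter of the weight memory of a bf16 base, which is what makes room for activations and adapter gradients on a single card.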
Canonical Usage#
```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM

# Load the base model in bfloat16, placed automatically across available devices.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    torch_dtype="bfloat16",
    device_map="auto",
)

# Rank-16 LoRA on every attention and MLP projection.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Wrap the base model; only the LoRA matrices remain trainable.
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# trainable params: ~41.9M (about 0.5% of the 8B base)
```
Integrations#
- Transformers Trainer: PEFT models drop into the standard training loop without changes to the loop itself.
- TRL: `SFTTrainer`, `DPOTrainer`, and `GRPOTrainer` all accept a `PeftConfig` directly through their `peft_config` argument.
- bitsandbytes: 4-bit and 8-bit quantised bases are loaded via `BitsAndBytesConfig`, and adapters attach to them as they would to a full-precision model.
- Accelerate: distributed and mixed-precision training works without adapter-specific code.
- vLLM and TGI: multi-LoRA serving consumes PEFT-format adapter directories directly.
- Hugging Face Hub: adapter weights are published as small repos that reference the base model by ID.
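The reference from an adapter to its base lives in the `adapter_config.json` file that sits alongside the adapter weights. The sketch below parses a hypothetical, abbreviated config; real files carry more keys, and `describe_adapter` is an illustrative helper rather than part of PEFT.

```python
import json

# Hypothetical, abbreviated adapter_config.json contents.
EXAMPLE_CONFIG = """{
  "peft_type": "LORA",
  "base_model_name_or_path": "meta-llama/Meta-Llama-3.1-8B",
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["q_proj", "v_proj"]
}"""


def describe_adapter(config_text):
    """Summarise which base model an adapter expects and how it is scaled."""
    cfg = json.loads(config_text)
    return {
        "technique": cfg["peft_type"],
        "base": cfg["base_model_name_or_path"],
        "scaling": cfg["lora_alpha"] / cfg["r"],
    }


print(describe_adapter(EXAMPLE_CONFIG))
```

A serving stack can use the `base` field to check that an adapter is being attached to the model it was trained against.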
For production runs, always pin both the PEFT version and the Transformers version. The two libraries co-evolve quickly and adapter checkpoints occasionally need format migrations between minor versions.
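One lightweight way to enforce such pins is a start-up check that compares installed versions against the ones a run was validated with. This is a minimal sketch using only the standard library; the `X.Y.Z` pins are placeholders to be replaced with real versions, and `find_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Placeholder pins: substitute the exact versions your run was validated with.
PINS = {"peft": "X.Y.Z", "transformers": "X.Y.Z"}


def find_mismatches(pins, get_version=version):
    """Return {package: (pinned, installed)} for every package off its pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `find_mismatches(PINS)` before training starts and failing fast on a non-empty result keeps a silent library upgrade from changing how an adapter checkpoint is read or written.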
When PEFT is the Right Choice#
PEFT is the right starting point whenever you are working inside the Hugging Face ecosystem and need parameter-efficient fine-tuning primitives. Higher-level frameworks (Axolotl, Unsloth, LLaMA-Factory) build on top of PEFT and add YAML-driven configuration, optimised kernels, or curated recipes. Pick those when you want opinionated workflows; pick PEFT directly when you want maximum control and minimal abstraction.