Hugging Face PEFT

TL;DR

PEFT (huggingface/peft) is the most widely used implementation of the major parameter-efficient fine-tuning techniques, and it integrates tightly with Transformers, Accelerate, and TRL.
It supports LoRA, QLoRA, DoRA, AdaLoRA, IA³, Llama-Adapter, prompt tuning, prefix tuning, P-tuning, and several others under a unified `PeftModel` wrapper.
It is Apache 2.0 licensed and supplies the adapter primitives that Axolotl, Unsloth, LLaMA-Factory, TRL, and many in-house training stacks build on.
It has first-class support for adapter merging, multi-adapter inference, adapter sharing through the Hugging Face Hub, and quantised base models via bitsandbytes.

What PEFT Provides

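PEFT is a thin, focused library that wraps a Hugging Face Transformers model with the plumbing required to fine-tune a small subset of its parameters. The user supplies a base model and a `PeftConfig` subclass (such as `LoraConfig`) that names the technique and the target modules, and PEFT returns a `PeftModel` in which the base weights are frozen and only the adapter parameters are trainable. Calling `save_pretrained` on that model writes only the adapter weights and config, not the base model, so checkpoints stay small.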

The library handles parameter freezing, gradient routing, adapter loading and saving, merge-and-unload to fold adapters into the base, multi-adapter inference, and compatibility with the broader Hugging Face training stack — Transformers Trainer, Accelerate for distributed training, and TRL for RLHF/DPO post-training.

Supported Techniques

Technique	Type	Headline use case
LoRA	Reparameterisation	Default PEFT recipe
QLoRA	Reparameterisation + 4-bit base	Single-GPU large-model fine-tunes
DoRA	Weight decomposition + LoRA	Narrowing the quality gap between LoRA and full fine-tuning
AdaLoRA	Adaptive-rank LoRA	Rank budget allocation
IA³	Activation rescaling	Tiny adapters (<0.1% params)
Llama-Adapter	Prompt insertion	Multimodal Llama adaptation
Prompt tuning	Soft prompts	Small task heads
Prefix tuning	Soft prompts in KV cache	Generation tasks
P-tuning v2	Deep soft prompts	Smaller LLMs

Canonical Usage

python

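from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM

# Load the frozen base model in bfloat16 and let Accelerate place it on devices.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    torch_dtype="bfloat16",
    device_map="auto",
)

# Rank-16 LoRA on every attention and MLP projection; the scaling is lora_alpha / r = 2.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Wrap the base: freezes its weights and injects trainable LoRA matrices.
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Reports about 41.9M trainable parameters, roughly 0.5% of the 8B base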
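The trainable-parameter figure can be checked by hand: LoRA adds r × (d_in + d_out) parameters to each adapted linear layer. The sketch below assumes the published Llama 3.1 8B shapes (hidden size 4096, MLP intermediate size 14336, 32 layers, 8 key/value heads of dimension 128); those constants are assumptions not stated in the text above, so verify them against the model's `config.json` before relying on the result.

```python
R = 16                 # LoRA rank from the config above
HIDDEN = 4096          # assumed hidden size
INTERMEDIATE = 14336   # assumed MLP intermediate size
KV_DIM = 8 * 128       # assumed: 8 KV heads x head dim 128 (grouped-query attention)
NUM_LAYERS = 32        # assumed number of decoder layers

# (d_in, d_out) of each targeted projection within one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in SHAPES.values())
total = per_layer * NUM_LAYERS
share = 100 * total / 8.0e9  # against a nominal 8B base

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
print(f"share:     {share:.2f}% of a nominal 8B base")
```

Under these assumptions the total scales linearly with the rank, so halving `r` to 8 halves the adapter size.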

Integrations

Transformers Trainer: PEFT models drop into the standard training loop with no code changes.
TRL: `SFTTrainer`, `DPOTrainer`, and `GRPOTrainer` each accept a `peft_config` argument and wrap the model themselves.
bitsandbytes: 4-bit and 8-bit quantised bases are loaded by passing a `BitsAndBytesConfig` to `from_pretrained`, and adapters attach to the quantised layers as usual.
Accelerate: distributed and mixed-precision training works without adapter-specific code.
vLLM and TGI: multi-LoRA serving consumes PEFT-format adapter directories directly.
Hugging Face Hub: adapter weights are published as small repos that reference the base model by ID.
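As an illustration of the bitsandbytes path, the sketch below loads a base model in 4-bit and attaches a LoRA adapter (the QLoRA setup). It is a sketch under stated assumptions: `load_qlora_model` and `QUANT_KWARGS` are hypothetical names, the quantisation settings are commonly used values rather than requirements, and actually calling the helper needs a CUDA GPU with `bitsandbytes` installed.

```python
# Commonly used QLoRA-style settings: NF4 weights, double quantisation, bf16 compute.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def load_qlora_model(model_id, lora_config):
    """Hypothetical helper: load `model_id` in 4-bit and wrap it with a LoRA adapter."""
    # Imported lazily so that defining this helper does not require the GPU stack.
    from peft import get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    base = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
        device_map="auto",
    )
    # Freezes the base, upcasts the remaining half-precision parameters to fp32,
    # and enables gradient checkpointing for k-bit training.
    base = prepare_model_for_kbit_training(base)
    return get_peft_model(base, lora_config)
```

The returned model can be passed to the Transformers `Trainer` like any other `PeftModel`.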

For production runs, always pin both the PEFT version and the Transformers version. The two libraries co-evolve quickly and adapter checkpoints occasionally need format migrations between minor versions.
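One way to capture the versions in use is to read them from the running environment and store them next to the adapter checkpoint. The helper below is a minimal sketch using only the standard library; `pinned_requirements` is a hypothetical name, and the package list is an assumption about what a typical stack contains.

```python
from importlib import metadata


def pinned_requirements(packages, lookup=metadata.version):
    """Return 'name==version' lines for each package that is installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment, so nothing to pin
    return lines


stack = ["peft", "transformers", "accelerate", "trl", "bitsandbytes"]
print("\n".join(pinned_requirements(stack)))
```

Writing the output to a requirements file alongside each checkpoint makes it possible to recreate the environment that produced the adapter.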

When PEFT is the Right Choice

PEFT is the right starting point whenever you are training inside the Hugging Face ecosystem and want direct access to the adapter primitives. Higher-level frameworks (Axolotl, Unsloth, LLaMA-Factory) build on top of PEFT and add YAML-driven configuration, optimised kernels, or curated recipes. Pick those when you want an opinionated workflow; pick PEFT directly when you want maximum control and minimal abstraction.

