TL;DR
- Open-source Python framework from Preferred Networks (Akiba et al., 2019, arXiv:1907.10902), MIT License.
- Defines hyperparameter search as a Python function (`objective(trial)`) rather than a config file; samplers (TPE, CMA-ES, random) and pruners (Hyperband, Median) are composed at runtime.
- Distributed optimization coordinates workers through a shared RDBMS storage backend; integrates with PyTorch Lightning, HuggingFace Trainer, MLflow, and Weights & Biases.
Overview
Optuna is one of the most widely used Python hyperparameter optimization (HPO) frameworks as of 2026. It diverges from earlier tools like Hyperopt and SMAC in two ways: it uses a define-by-run API (the search space is constructed inside the objective function by calling `trial.suggest_*`, not declared upfront), and it ships a portfolio of samplers and pruners that can be swapped without changing the objective function.
TPE (Tree-structured Parzen Estimator) is Optuna's default sampler. It works well for small-to-medium search spaces (≤ ~50 dimensions) and combines cleanly with pruning. For larger or more structured spaces, GPSampler (Gaussian Process) or CMA-ES are alternatives.
Mechanism
Each trial is one execution of the objective function. The sampler observes (params, value) pairs from completed trials and proposes the next set of parameters; the pruner watches intermediate values reported during the trial and can terminate underperforming trials early. Trials can run concurrently, coordinating through a shared storage backend — SQLite for single-node, PostgreSQL or MySQL for distributed.
```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # The search space is declared inline via trial.suggest_* calls.
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    # train_one_run is a user-supplied function that returns the metric to maximize.
    return train_one_run(lr=lr, batch_size=batch_size, weight_decay=weight_decay)


study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.HyperbandPruner(),
    storage="postgresql://optuna@db/optuna",  # shared backend for all workers
    study_name="llama-sft-search",
    load_if_exists=True,  # lets additional workers join the same study
)
study.optimize(objective, n_trials=200, n_jobs=8)
```
Performance Characteristics
- TPE is sample-efficient for ≤ 50 dimensions; degrades for very high-dimensional spaces.
- Hyperband pruning typically saves 50–80% of total compute compared to no pruning, with no loss in best-trial quality for well-behaved learning curves.
- Distributed parallel trials scale linearly until the storage backend or shared GPU pool becomes the bottleneck.
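To see where pruning savings of this order can come from, the arithmetic below counts training epochs under an idealised successive-halving schedule, the building block that Hyperband repeats over several brackets. It is a back-of-the-envelope sketch, not Optuna's implementation and not a measurement: the function name, the single-bracket simplification, and the example numbers (16 trials, rungs at 5, 10, and 20 epochs, halving at each rung) are all assumptions.

```python
def successive_halving_epochs(n_trials: int, min_epochs: int, max_epochs: int, eta: int) -> int:
    """Total epochs trained when only the top 1/eta of trials pass each rung."""
    total = 0
    alive = n_trials      # trials still running
    trained = 0           # epochs each surviving trial has trained so far
    rung = min_epochs     # epoch count at which the next cut happens
    while trained < max_epochs:
        rung = min(rung, max_epochs)
        total += alive * (rung - trained)   # train survivors up to this rung
        trained = rung
        alive = max(1, alive // eta)        # keep the best 1/eta, at least one
        rung *= eta
    return total


full_cost = 16 * 20  # every trial trained to completion
pruned_cost = successive_halving_epochs(n_trials=16, min_epochs=5, max_epochs=20, eta=2)
saving = 1.0 - pruned_cost / full_cost
print(f"{pruned_cost} of {full_cost} epochs, saving {saving:.0%}")
```

More aggressive settings (a larger `eta` or an earlier first rung) cut more compute, but they raise the risk of discarding a trial whose learning curve improves late.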
When to Use
Use Optuna for hyperparameter search on individual model training runs: fine-tuning learning rate, batch size, LoRA rank, and weight decay; tuning data-mixing weights; sweeping over architectural choices. For RL-style continuous optimisation, Ray Tune's RLlib-aware schedulers may be a better fit; for large-scale industrial HPO, Vizier and similar managed services compete on the same ground.
Pitfalls
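- Pruners need intermediate values reported via `trial.report()`, followed by a `trial.should_prune()` check that raises `optuna.TrialPruned`; objective functions that never report get no pruning benefit.
- Categorical hyperparameters break TPE assumptions slightly; large categorical spaces may work better with the random sampler. CMA-ES does not help here: Optuna's `CmaEsSampler` does not handle categorical parameters itself and falls back to its independent sampler (random by default) for them.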
- Optuna stores all trial parameters and values in the backend — runaway studies can balloon storage size.
- Distributed studies must share the storage URL; SQLite is single-node-only.
Software
- github.com/optuna/optuna: main repository, MIT License.
- Optuna Dashboard for live study visualisation.
- Integrations: PyTorch Lightning, HuggingFace Trainer (`Trainer.hyperparameter_search(backend="optuna")`), Ray Tune (`OptunaSearch`), MLflow, Weights & Biases, scikit-learn.
- Optuna Sweeper plugin for Hydra.
References
- Optuna: A Next-generation Hyperparameter Optimization Framework · arXiv (Akiba et al., 2019)
- Optuna documentation · Optuna
- Optuna on GitHub · GitHub