Hyperparameter Search with Optuna

TL;DR

Open-source Python framework from Preferred Networks (Akiba et al., 2019, arXiv:1907.10902), MIT license.
Defines hyperparameter search as a Python function (`objective(trial)`) rather than a config file; samplers (TPE, CMA-ES, random) and pruners (Hyperband, Median) are composed at runtime.
Distributed via a shared RDBMS store; integrates with PyTorch Lightning, HuggingFace Trainer, MLflow, and Weights & Biases.

Overview

Optuna is among the most widely used pure-Python hyperparameter optimization (HPO) frameworks as of 2026. It diverges from earlier tools like Hyperopt and SMAC in two ways: it uses a define-by-run API (the search space is constructed inside the objective function by calling `trial.suggest_*`, not declared upfront), and it ships a portfolio of samplers and pruners that can be swapped without changing the objective function.

The TPE (Tree-structured Parzen Estimator) sampler is Optuna's default. It works well for small-to-medium search spaces (≤ ~50 dimensions), handles categorical and conditional parameters, and combines naturally with pruning. GPSampler (Gaussian Process) and CMA-ES are alternatives: GPSampler tends to suit small trial budgets, while CMA-ES suits continuous spaces where many trials are affordable.

Mechanism

Each trial is one execution of the objective function. The sampler observes (params, value) pairs from completed trials and proposes the next set of parameters; the pruner watches intermediate values reported during the trial and can terminate underperforming trials early. Trials can run concurrently, coordinating through a shared storage backend — SQLite for single-node, PostgreSQL or MySQL for distributed.

python

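import optuna

def objective(trial: optuna.Trial) -> float:
    # Define-by-run: these suggest_* calls are what declares the search space.
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    # train_one_run is the user's own training function (not part of Optuna);
    # it returns the metric to maximize. As written, no intermediate values are
    # reported, so the pruner below never triggers; call trial.report() inside
    # the training loop to enable pruning.
    return train_one_run(lr=lr, batch_size=batch_size, weight_decay=weight_decay)

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.HyperbandPruner(),
    storage="postgresql://optuna@db/optuna",  # shared RDBMS that every worker connects to
    study_name="llama-sft-search",
    load_if_exists=True,  # join the existing study instead of raising an error
)
# n_jobs=8 runs trials in 8 threads of this process; running the same script on
# other machines with the same storage and study_name adds more workers.
study.optimize(objective, n_trials=200, n_jobs=8)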

Performance Characteristics

TPE is sample-efficient up to roughly 50 dimensions and degrades in very high-dimensional spaces.
Hyperband pruning can save roughly 50-80% of total compute compared to no pruning, usually with little or no loss in best-trial quality when learning curves are well-behaved (early performance predicts final performance).
Distributed parallel trials scale close to linearly until the storage backend or the shared GPU pool becomes the bottleneck.
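How much compute pruning saves depends on the schedule. As a rough illustration, the sketch below counts training epochs for one idealised successive-halving bracket (the building block of Hyperband), assuming exactly the top 1/eta of trials survive each rung. The function name and the numbers are hypothetical, and this is not how Optuna's `HyperbandPruner` is implemented internally.

```python
def successive_halving_cost(n_trials: int, min_resource: int,
                            max_resource: int, eta: int) -> int:
    """Total epochs spent when only the top 1/eta of trials pass each rung."""
    total = 0
    alive = n_trials      # trials still running at the current rung
    spent = 0             # epochs each surviving trial has already used
    rung = min_resource   # epoch count at which the next cut happens
    while alive > 0 and spent < max_resource:
        budget = min(rung, max_resource)
        total += alive * (budget - spent)  # only the extra epochs are new cost
        spent = budget
        if budget == max_resource:
            break
        alive = max(1, alive // eta)       # keep the best 1/eta of the trials
        rung *= eta
    return total


pruned_cost = successive_halving_cost(n_trials=27, min_resource=3,
                                      max_resource=27, eta=3)
full_cost = 27 * 27  # every trial trained for the full 27 epochs
saving = 1.0 - pruned_cost / full_cost
print(pruned_cost, full_cost, f"{saving:.0%}")  # 189 729 74%
```

With these settings the bracket spends 189 epochs instead of 729. Real savings vary with the rung spacing, the reduction factor, and how reliably early results predict final ones.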

When to Use

Use Optuna for hyperparameter search on individual model training runs: tuning learning rate, batch size, LoRA rank, and weight decay during fine-tuning; tuning data-mixing weights; sweeping over architectural choices. For RL-style continuous optimisation, Ray Tune's RLlib-aware schedulers may be a better fit; for large-scale industrial HPO, Vizier and similar managed services compete on the same ground.

Pitfalls

Pruners need intermediate values reported via `trial.report()`; an objective function that never reports gets no pruning benefit.
Categorical hyperparameters have no ordering for a sampler to exploit. TPE handles them natively, but CMA-ES does not support them (Optuna's `CmaEsSampler` falls back to independent sampling for categorical parameters), so large categorical spaces are better served by TPE or random sampling.
Optuna stores all trial parameters and values in the backend, so runaway studies can balloon storage size.
Distributed studies must share the storage URL; SQLite is single-node-only.

Software

github.com/optuna/optuna (main repository, MIT license).
Optuna Dashboard for live study visualisation.
Integrations: PyTorch Lightning, HuggingFace Trainer (`hyperparameter_search(backend="optuna")`), MLflow, Weights & Biases, scikit-learn.
Optuna Sweeper plugin for Hydra.

References

Optuna: A Next-generation Hyperparameter Optimization Framework · arXiv (Akiba et al., 2019)
Optuna documentation · Optuna
Optuna on GitHub · GitHub
