MLflow — Experiment and Model Tracking

TL;DR

Open-source platform created by Databricks (first released in 2018) under the Apache 2.0 licence. Tracks experiments, packages projects, registers models, and serves them.
Self-hostable on commodity infrastructure (server + RDBMS + object store) — popular default for teams that need on-prem ML observability.
Four products: Tracking (metrics), Projects (reproducible runs), Models (packaging), Model Registry (versioning + stage transitions).

Overview

MLflow is the open-source counterpart to Weights & Biases. It was created by Databricks in 2018 and is now governed under the Linux Foundation. The four MLflow products handle different stages of the ML lifecycle: Tracking for metric logging, Projects for reproducible run packaging, Models for model serialisation, and Model Registry for version control and stage transitions.

Self-hosting is the headline feature. An MLflow Tracking Server is a single Python process backed by a SQL database (Postgres, MySQL, SQLite) for metadata and an artefact store (S3, GCS, Azure Blob, local filesystem) for files. With so few moving parts, it is straightforward to deploy in air-gapped or sovereign environments.

What MLflow Provides

Tracking — `mlflow.log_metric`, `log_param`, `log_artifact` with a web UI for browsing runs.
Autologging — `mlflow.autolog()` instruments scikit-learn, PyTorch, TensorFlow, XGBoost, transformers automatically.
Projects — `MLproject` files declare reproducible runs (entrypoints, env, parameters).
Models — language-agnostic model packaging with flavours (PyFunc, sklearn, ONNX, etc.).
Model Registry — versioned model store with stages (None/Staging/Production/Archived) and stage transitions.
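
The tracking calls in the list above can be sketched as follows. This is a minimal example, assuming the `mlflow` package is installed. The experiment name, the metric name, and the `flatten_params` helper are illustrative and not part of MLflow.

```python
from typing import Any


def flatten_params(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested config into dotted keys (MLflow params are flat key/value pairs)."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_training_run(tracking_uri: str, config: dict[str, Any], losses: list[float]) -> str:
    """Log one run's parameters and per-step loss; returns the run ID."""
    import mlflow  # imported lazily so flatten_params works without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("demo-experiment")  # hypothetical experiment name
    with mlflow.start_run() as run:
        mlflow.log_params(flatten_params(config))
        for step, loss in enumerate(losses):
            mlflow.log_metric("loss", loss, step=step)
        return run.info.run_id
```

Calling `log_training_run("http://localhost:5000", {"lr": 0.1, "optim": {"name": "adam"}}, [0.9, 0.5])` would record the parameters `lr` and `optim.name` plus two `loss` points, assuming a Tracking Server is listening at that (hypothetical) address.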

Mechanism

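The MLflow client logs to a tracking URI, which is either an HTTP server or a local file path. By default each `log_metric` call is sent as its own request; callers reduce overhead by batching (`log_metrics`, `log_batch`) or by opting into asynchronous logging. Artefacts are uploaded either directly to the configured artefact store or, when the server proxies artefact access, through the Tracking Server. The web UI reads run metadata from the RDBMS via the server and fetches artefacts from the artefact store for downloads.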
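
Batched metric logging can be sketched with `MlflowClient.log_batch`. This is an illustration under assumptions rather than a recommended implementation: the `chunked` and `log_metric_history` helpers are hypothetical, and the default batch size of 1000 reflects the per-request metric limit in the MLflow REST API documentation, which is worth checking against your version.

```python
import time
from typing import Iterator, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def log_metric_history(run_id: str, key: str, values: Sequence[float],
                       batch_size: int = 1000) -> int:
    """Send a metric history in batched requests; returns the number of requests made."""
    from mlflow.entities import Metric
    from mlflow.tracking import MlflowClient

    client = MlflowClient()  # resolves the tracking URI from the environment or mlflow.set_tracking_uri
    now_ms = int(time.time() * 1000)
    metrics = [Metric(key, float(v), now_ms, step) for step, v in enumerate(values)]
    requests_sent = 0
    for batch in chunked(metrics, batch_size):
        client.log_batch(run_id, metrics=batch)
        requests_sent += 1
    return requests_sent
```

Chunking explicitly keeps the sketch independent of whether a given client version splits oversized batches on its own.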

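Model Registry adds versioning and a promotion workflow on top of logged models: version metadata lives in the backend database, while the model files stay in the artefact store. Stage transitions (Staging → Production) are recorded on the model version; downstream services (a serving endpoint, a CI job) can poll the registry for the current Production version.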
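
The polling pattern for a downstream service could look like the sketch below. It swaps in a registry alias instead of the `Production` stage, using `MlflowClient.get_model_version_by_alias`, which is available in MLflow releases that support aliases. The helper names, the polling interval, and any model or alias names passed in are hypothetical.

```python
import time
from typing import Callable, Optional


def fetch_alias_version(model_name: str, alias: str) -> str:
    """Return the version number that a registry alias currently points to."""
    from mlflow.tracking import MlflowClient

    return str(MlflowClient().get_model_version_by_alias(model_name, alias).version)


def poll_for_change(fetch: Callable[[], str], on_change: Callable[[str], None],
                    polls: int, interval_s: float = 30.0) -> Optional[str]:
    """Call `fetch` up to `polls` times and invoke `on_change` whenever the version differs."""
    current: Optional[str] = None
    for i in range(polls):
        version = fetch()
        if version != current:
            current = version
            on_change(version)
        if i < polls - 1:
            time.sleep(interval_s)  # wait before the next poll
    return current
```

A serving process might call `poll_for_change(lambda: fetch_alias_version("churn-model", "production"), reload_model, polls=100)`, where `churn-model`, the `production` alias, and `reload_model` are placeholders for the deployment's own names.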

When to Use

Use MLflow when self-hosting is a hard requirement (regulated industry, sovereign cloud, air-gapped lab), when you want a vendor-neutral OSS option, or when you already run Databricks (which integrates MLflow tightly). For teams without self-hosting constraints, W&B's polished UI and managed service often win — but MLflow is the open backstop.

For Yobitel-hosted sovereign training deployments, MLflow Tracking Server + S3-compatible object store + PostgreSQL is a standard pattern. It runs in one Helm chart and keeps experiment metadata inside the customer's data boundary.
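
As a rough sketch of the server + PostgreSQL + S3-compatible store pattern, the helper below assembles an `mlflow server` command line and the environment variables MLflow reads for a non-AWS S3 endpoint. The hostnames, bucket name, and credentials are placeholders, and the function names are hypothetical. A Helm chart would normally express the same settings as container arguments and secrets.

```python
from typing import Dict, List


def tracking_server_command(backend_store_uri: str, artifacts_destination: str,
                            host: str = "0.0.0.0", port: int = 5000) -> List[str]:
    """Build the argv for `mlflow server` with a SQL backend and proxied artefact storage."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,          # run and registry metadata
        "--artifacts-destination", artifacts_destination,  # where proxied artefacts are stored
        "--host", host,
        "--port", str(port),
    ]


def s3_compatible_env(endpoint_url: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Environment variables for the server process when using an S3-compatible store."""
    return {
        "MLFLOW_S3_ENDPOINT_URL": endpoint_url,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }


# Placeholder values only; substitute the deployment's own endpoints and secrets.
cmd = tracking_server_command(
    "postgresql://mlflow:CHANGE_ME@postgres.internal:5432/mlflow",
    "s3://mlflow-artifacts",
)
env = s3_compatible_env("https://objects.internal:9000", "ACCESS_KEY", "SECRET_KEY")
```

The resulting `cmd` and `env` could be passed to `subprocess.run(cmd, env={**os.environ, **env})` for a local trial before moving the same values into chart configuration.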

Pitfalls

UI is more utilitarian than W&B's; advanced visualisations (gradient histograms, custom panels) are less polished.
Scaling the Tracking Server to thousands of concurrent runs requires care — Postgres tuning, artefact-store sizing.
Autologging captures a lot by default — review what is logged for sensitive parameters.
Model Registry stages (`Staging`, `Production`, `Archived`) are deprecated since MLflow 2.9 in favour of model version aliases and tags; migrate new workflows to aliases.
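
One way to act on the sensitive-parameter pitfall is sketched below. It assumes that masking by key name is acceptable for your data. The regular expression, the mask string, and both helper names are illustrative. `log_models` and `log_input_examples` are existing `mlflow.autolog` options; which of them to switch off depends on what your runs may contain.

```python
import re
from typing import Any, Dict

# Hypothetical pattern; extend it to match the naming conventions in your codebase.
SENSITIVE_KEY = re.compile(r"(password|secret|token|api[_-]?key)", re.IGNORECASE)


def redact_params(params: Dict[str, Any], mask: str = "***") -> Dict[str, Any]:
    """Replace values whose key looks sensitive before they are logged."""
    return {k: (mask if SENSITIVE_KEY.search(k) else v) for k, v in params.items()}


def enable_conservative_autolog() -> None:
    """Turn on autologging without model files or input examples."""
    import mlflow  # imported lazily so redact_params works without MLflow installed

    mlflow.autolog(log_models=False, log_input_examples=False)
```

The redaction step only covers parameters logged manually, for example `mlflow.log_params(redact_params(config))`. Values that autologging captures on its own still need to be reviewed in the UI.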

Software

github.com/mlflow/mlflow — main repository, Apache 2.0.
MLflow Tracking Server — typically deployed via Helm or Docker.
Framework integrations: PyTorch, Lightning, HuggingFace, TensorFlow, scikit-learn, XGBoost, statsmodels.
Databricks-hosted MLflow (managed) for teams already on Databricks.

References

MLflow documentation · MLflow
MLflow on GitHub · GitHub
Accelerating the Machine Learning Lifecycle with MLflow · Databricks
