ScaNN

TL;DR

ScaNN (Scalable Nearest Neighbors) was released by Google Research in 2020 alongside the paper 'Accelerating Large-Scale Inference with Anisotropic Vector Quantization' (arXiv:1908.10396).
The core contribution is anisotropic quantisation: a quantisation loss that penalises error parallel to each datapoint (the component that most affects high-scoring queries) more heavily than perpendicular error, so the inner product is preserved even when the reconstructed vector is only approximate.
Combined with a tree-based partitioner and SIMD-optimised scoring, ScaNN regularly ranks at or near the top of ann-benchmarks at fixed memory budgets.
It is open source under Apache 2.0, available as a Python package, and integrated into Vertex AI Vector Search (formerly Matching Engine).

Why Anisotropic Quantisation

Standard Product Quantisation (PQ) minimises mean squared error when reconstructing the original vectors. ScaNN's insight is that for maximum-inner-product search (MIPS), not all reconstruction error is equal: the queries that score highly against a datapoint point roughly in its direction, so error parallel to the datapoint distorts those inner products far more than error perpendicular to it. The same quantisation budget yields much better MIPS recall if it is biased to preserve the parallel component.
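The argument can be made concrete with a short decomposition. The notation is introduced here for illustration and is an assumption of this sketch rather than a quotation from the paper: $x$ is a datapoint, $\tilde{x}$ its quantised reconstruction, $q$ a query, and $r$ the residual.

```latex
\begin{aligned}
r &= x - \tilde{x}, \qquad
r_{\parallel} = \frac{\langle r, x \rangle}{\lVert x \rVert^{2}}\, x, \qquad
r_{\perp} = r - r_{\parallel}, \\
\langle q, x \rangle - \langle q, \tilde{x} \rangle
  &= \langle q, r_{\parallel} \rangle + \langle q, r_{\perp} \rangle .
\end{aligned}
```

Because $r_{\perp}$ is orthogonal to $x$, a query that points exactly along $x$ sees no error from it at all, and a query that points roughly along $x$ sees very little. For the high-scoring pairs that decide a MIPS ranking, almost all of the inner-product error therefore comes from $r_{\parallel}$.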

The result is a quantisation loss function with a parameter that controls how much the parallel error is weighted relative to the perpendicular error. Tuned correctly, it pushes recall up several points at fixed bytes-per-vector compared to vanilla PQ.
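A minimal NumPy sketch of such a weighted loss is shown below. The function name `anisotropic_loss` and the weight `eta` are hypothetical names chosen for this example, not identifiers from the ScaNN library, and the 2-D vectors are toy values. The sketch compares two reconstructions with identical mean squared error, one wrong along the datapoint and one wrong perpendicular to it.

```python
import numpy as np

def anisotropic_loss(x, x_quantised, eta):
    """Weighted loss: eta * ||r_parallel||^2 + ||r_perpendicular||^2.

    `eta` (hypothetical name) is the relative weight on the parallel error;
    eta = 1 reduces to plain squared error.
    """
    r = x - x_quantised
    unit = x / np.linalg.norm(x)
    r_par = np.dot(r, unit) * unit   # residual component along the datapoint
    r_perp = r - r_par               # residual component perpendicular to it
    return eta * np.dot(r_par, r_par) + np.dot(r_perp, r_perp)

x = np.array([1.0, 0.0])
cand_par = np.array([0.9, 0.0])    # off by 0.1 along x
cand_perp = np.array([1.0, 0.1])   # off by 0.1 perpendicular to x

# Plain squared error cannot tell the two candidates apart.
mse_par = np.sum((x - cand_par) ** 2)
mse_perp = np.sum((x - cand_perp) ** 2)

# The weighted loss prefers the candidate whose error is perpendicular.
loss_par = anisotropic_loss(x, cand_par, eta=4.0)
loss_perp = anisotropic_loss(x, cand_perp, eta=4.0)

# For a query aligned with x, only the parallel error changes the score.
q = x
ip_err_par = abs(q @ x - q @ cand_par)
ip_err_perp = abs(q @ x - q @ cand_perp)

print(mse_par, mse_perp, loss_par, loss_perp, ip_err_par, ip_err_perp)
```

A codebook trained against this loss would pick `cand_perp` over `cand_par`, which is the candidate that leaves the inner product with an aligned query intact.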

Architecture

ScaNN's search pipeline has three stages, which are tuned together. The library ships configuration helpers that pick partitioner shape, codebook size, and rescoring depth based on the corpus size and the recall target.

Partitioning: a learned k-means tree splits the corpus into leaves of typically 100-1000 vectors each, and only the most promising leaves are searched for a given query.
Scoring: within each selected leaf, asymmetric distance computation using anisotropically quantised codes and SIMD-friendly lookups.
Rescoring: a small final pass over the top candidates with full-precision vectors to clean up the ranking.
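The three stages can be sketched end to end in plain NumPy. This is a toy illustration under stated assumptions, not ScaNN's implementation: a few flat k-means iterations stand in for the trained partitioner, a uniform int8 scalar quantiser stands in for anisotropic product-quantisation codes, and all names and sizes (`num_leaves`, `leaves_to_search`, `rescore_depth`) are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, k = 2000, 16, 5
data = rng.standard_normal((n, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

def assign_to_nearest(points, centroids):
    """Index of the closest centroid (squared Euclidean) for each point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(dists, axis=1)

# Stage 1, partitioning: flat k-means as a stand-in for the learned tree.
num_leaves, leaves_to_search = 20, 5
centroids = data[rng.choice(n, num_leaves, replace=False)].copy()
for _ in range(5):
    assign = assign_to_nearest(data, centroids)
    for leaf in range(num_leaves):
        members = data[assign == leaf]
        if len(members) > 0:
            centroids[leaf] = members.mean(axis=0)
assign = assign_to_nearest(data, centroids)

# Lossy codes: int8 scalar quantisation, NOT anisotropic PQ (assumption).
scale = np.abs(data).max() / 127.0
codes = np.round(data / scale).astype(np.int8)

# Stage 2, scoring: approximate inner products inside the best leaves only.
top_leaves = np.argsort(-(centroids @ query))[:leaves_to_search]
candidates = np.flatnonzero(np.isin(assign, top_leaves))
approx_scores = (codes[candidates].astype(np.float32) * scale) @ query

# Stage 3, rescoring: exact inner products on a short list of candidates.
rescore_depth = 50
shortlist = candidates[np.argsort(-approx_scores)[:rescore_depth]]
exact_scores = data[shortlist] @ query
result = shortlist[np.argsort(-exact_scores)[:k]]

# Compare against brute force to see what the shortcuts cost.
truth = np.argsort(-(data @ query))[:k]
recall = len(set(result.tolist()) & set(truth.tolist())) / k
print("searched", len(candidates), "of", n, "vectors; recall@5 =", recall)
```

The three knobs interact: searching fewer leaves or using coarser codes lowers cost but pushes more of the burden onto the rescoring depth, which is why the stages are tuned jointly rather than one at a time.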

Performance Profile

On the standard ann-benchmarks datasets (GloVe, SIFT, GIST, DEEP), ScaNN frequently dominates the recall-per-memory curve at high recall settings: it reaches 95%+ recall at a fraction of the bytes-per-vector that HNSW needs for the same recall. For pure latency at lower recall, HNSW often matches or beats it. The pragmatic distinction: ScaNN tends to be the better choice when memory is the constraint, HNSW when latency is.

If you are choosing between ScaNN and HNSW, plot the recall vs memory curve on your own corpus. Public benchmarks are useful but the crossover point shifts with vector dimensionality and distribution.
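A small harness for producing the points of such a curve might look like the sketch below. It is an assumption-laden illustration: the `quantise` function is a uniform scalar quantiser used only as a placeholder for a real index, the random data stands in for your corpus, and the helper names are hypothetical. In practice you would replace the placeholder with neighbour ids returned by ScaNN and HNSW indexes built at several settings, and record each index's actual memory footprint.

```python
import numpy as np

def exact_top_k(data, queries, k):
    """Brute-force top-k ids by inner product, one row per query."""
    scores = queries @ data.T
    return np.argsort(-scores, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    """Mean fraction of the exact top-k recovered per query."""
    hits = sum(len(set(a.tolist()) & set(e.tolist()))
               for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * len(exact_ids[0]))

def quantise(data, bits):
    """Placeholder 'index': uniform scalar quantisation to `bits` per value."""
    levels = 2 ** bits - 1
    lo, hi = data.min(), data.max()
    codes = np.round((data - lo) / (hi - lo) * levels)
    return codes / levels * (hi - lo) + lo

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32)).astype(np.float32)    # stand-in corpus
queries = rng.standard_normal((20, 32)).astype(np.float32)   # stand-in queries
k = 10
exact = exact_top_k(data, queries, k)

# One (bytes_per_vector, recall) point per configuration.
curve = []
for bits in (2, 4, 8):
    approx = exact_top_k(quantise(data, bits), queries, k)
    bytes_per_vector = data.shape[1] * bits / 8
    curve.append((bytes_per_vector, recall_at_k(approx, exact)))

for bytes_per_vector, recall in curve:
    print(f"{bytes_per_vector:6.1f} bytes/vector  recall@{k} = {recall:.3f}")
```

Plotting these pairs for each candidate index on the same axes shows where the curves cross for your data, which is the number the public benchmarks cannot give you.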

Where You Encounter ScaNN

Beyond the open-source library, ScaNN powers Vertex AI Vector Search (formerly Matching Engine), Google's managed ANN service. It is also the backend for several Google Cloud product retrieval and recommendation features. Outside Google, adoption is narrower than HNSW's because the tuning surface is more complex and the API is Python-only.

References

Accelerating Large-Scale Inference with Anisotropic Vector Quantization · arXiv (Guo et al., 2020)
google-research/google-research/scann · GitHub
Vertex AI Vector Search Documentation · Google Cloud Docs
