TL;DR
- IVF partitions the vector space into nlist Voronoi cells via k-means clustering. Each vector is stored in the inverted list of its nearest centroid.
- At query time, the index computes distances from the query to all centroids, picks the nprobe nearest cells, and does exact search inside those cells only.
- Trade-off is straightforward — higher nprobe finds more true neighbours but costs proportionally more. nlist is set at build time, typically as sqrt(N) to 4 x sqrt(N).
- Almost always combined with Product Quantisation (IVFPQ) for memory compression — the standard billion-scale recipe in FAISS.
The Coarse Quantiser#
IVF builds in two passes. First, a sample of vectors (or all of them) is clustered into nlist groups by k-means; the resulting centroids form a 'coarse quantiser'. Second, every vector in the corpus is assigned to the inverted list of its nearest centroid. The index is conceptually a map: centroid id to the list of vectors whose nearest centroid this is.
At query time the same coarse quantiser is used to find the nprobe nearest centroids to the query vector. Only the vectors in those nprobe inverted lists are scored. If nprobe is too small the true nearest neighbour may live in a cell that was never visited; if nprobe is too large the search degenerates toward brute force.
IVF + Product Quantisation#
Storing raw vectors in the inverted lists is fast but memory-hungry. Product Quantisation (PQ, Jegou et al., 2011) compresses each vector by splitting it into M sub-vectors and replacing each with the id of its nearest centroid in a learned codebook. A 768-dimensional float32 vector compresses to typically 32-96 bytes, with controlled loss.
IVFPQ — IVF for coarse partitioning, PQ for in-list compression — is the standard FAISS recipe for billion-scale search on a single machine. Variants include IVF_HNSW (use HNSW as the coarse quantiser to speed up centroid search), OPQ (rotate the space before PQ to balance sub-vector variance), and ScaNN's anisotropic loss correction.
IVFPQ trades exactness for footprint. Recall@10 of 70-90% at billion scale is the typical operating point; if you need >95% recall, plan to either raise nprobe substantially or re-rank with full-precision distances.
Tuning#
| Parameter | Effect of increasing | Typical value |
|---|---|---|
| nlist | Smaller cells, faster search per probe, slower training | sqrt(N) to 4 x sqrt(N) |
| nprobe | Higher recall, linearly higher latency | 1-128, tuned to recall target |
| M (PQ subquantisers) | Smaller compressed code, lower recall | 16, 32, 64, 96 |
| nbits (per PQ code) | Larger codebook, more recall, more memory | 8 (standard) or 4 |
IVF vs HNSW#
IVF wins when you have billions of vectors, need to compress aggressively, and are willing to accept 70-90% recall. HNSW wins when you have tens of millions of vectors, can keep them in RAM, and need 95%+ recall at sub-millisecond latency. The break-even shifts as hardware changes; in 2026 on H100/H200-class machines with 80-141 GB HBM, HNSW handles hundreds of millions of vectors comfortably, pushing the IVF threshold higher than it used to be.
References
- Product Quantization for Nearest Neighbor Search · Jegou, Douze, Schmid (PAMI 2011)
- Billion-scale similarity search with GPUs · arXiv (Johnson, Douze, Jegou — FAISS, 2017)
- FAISS Index Factory Reference · FAISS Wiki