Thesis - Hierarchical IVF with Probabilistic Graph Construction for k-ANNS
Designed a hierarchical generalization of the Inverted File Index (IVF) for approximate nearest neighbor search, independently deriving a structure equivalent to Hierarchical IVF (HIVF) and formalizing it within a probabilistic optimization framework.
In this project, I developed a hierarchical extension of the Inverted File Index (IVF) proposed by Jeff Johnson et al., integrating it with approximate nearest neighbor graph construction and greedy search. The goal was to reduce the O(n^2) cost of full k-NN graph construction while preserving cross-cluster navigability. Starting from partition-based indexing, I constructed a recursive hierarchical clustering scheme that generalizes flat IVF into a multi-level structure. At the time of development, I was unaware of existing Hierarchical IVF variants; the structure was derived independently through analysis of cluster assignment propagation and search-space expansion in high dimensions. The key contribution was introducing a probabilistic cluster-selection mechanism that replaces hard assignments with accumulated likelihood thresholds, allowing controlled exploration of multiple branches while bounding candidate growth. I further extended this into a Bayesian hierarchical formulation, propagating conditional probabilities across layers to avoid exponential cluster expansion in deep hierarchies. The resulting method reduces graph build complexity by restricting candidate neighborhoods via hierarchical partitioning while preserving inter-cluster escape paths through probabilistic multi-branch selection. It provides a tunable precision–efficiency trade-off through probability thresholds and effectively bridges partition-based search (IVF) with graph-based approximate nearest neighbor methods such as HNSW. The work emphasizes algorithmic structure, asymptotic complexity, probabilistic modeling of cluster assignment, and memory–compute trade-offs in large-scale similarity search.
Academic Context