← Back

Scalable Approximate k-Nearest Neighbor Search (k-ANNS)

Team project investigating computational and statistical trade-offs in high-dimensional approximate k-NN search. Implemented and benchmarked partition-based indexing (k-d trees and Random Projection trees) and candidate-selection algorithms, analyzing the recall–query time frontier across datasets.

In this team project, we studied scalable high-dimensional similarity search with a focus on algorithmic efficiency and statistical performance trade-offs. We implemented tree-based indexing structures (k-d trees and Random Projection trees) combined with three candidate-selection strategies: Lookup Search, Voting Search, and the Natural Classifier proposed by Ville Hyvönen. All methods were reproduced and benchmarked on Fashion-MNIST and GloVe. We evaluated performance through systematic grid search over forest size, leaf size, and threshold parameters, analyzing the recall–query time frontier under varying regimes. Results showed that the Natural Classifier dominates Voting Search in high-recall regimes, while lower-recall performance is highly parameter-sensitive. The study highlights the importance of joint optimization of indexing structure and selection strategy when operating under computational constraints.

Academic Context

MSc Software Design at IT University of Copenhagen

2024

Core Competences

Engineering Expertise

Probabilistic Graph Algorithms Algorithms and Optimization

Mathematical Foundation

Statistical Modeling

Implementation Stack

Python