Dr. McInnes is a research mathematician and data scientist at the Tutte Institute for Mathematics and Computing in Ottawa, Ontario. He is well known for creating, together with collaborators, the Accelerated HDBSCAN* algorithm and UMAP, a manifold learning technique for dimension reduction.
Finding nearest neighbors in a dataset is a fundamental problem for many machine learning tasks, particularly in unsupervised learning, and it becomes especially challenging for higher-dimensional data. We’ll look at some of the classical tree-based techniques and the difficulties they run into. Next we’ll explore Nearest Neighbor Descent and related graph-based algorithms, working from very simple intuitions to efficient implementations. These approaches provide a powerful and flexible framework for (approximate) nearest neighbor search and, as we will demonstrate, significantly outperform tree-based approaches on a number of real-world datasets.
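As a rough illustration of the kind of simple intuition behind Nearest Neighbor Descent (a neighbor of a neighbor is likely to be a neighbor too), here is a minimal pure-Python sketch. It is not the implementation presented in the talk: the function names `sq_dist`, `brute_force_knn`, and `nn_descent`, along with the parameters `max_iters` and `seed`, are assumptions made for illustration, and efficient implementations add candidate sampling and many other optimizations that are omitted here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(points, k):
    """Exact k-nearest neighbors by comparing every pair of points."""
    result = []
    for i, p in enumerate(points):
        others = sorted((sq_dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def nn_descent(points, k, max_iters=10, seed=0):
    """Approximate k-NN graph via a simplified Nearest Neighbor Descent."""
    rng = random.Random(seed)
    n = len(points)

    # Start from a random graph: every point gets k random neighbors,
    # stored as a list of (distance, index) sorted by distance.
    graph = []
    for i in range(n):
        picks = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((sq_dist(points[i], points[j]), j) for j in picks))

    for _ in range(max_iters):
        # Treat edges as undirected so information flows both ways.
        adjacent = [{j for _, j in graph[i]} for i in range(n)]
        for i in range(n):
            for _, j in graph[i]:
                adjacent[j].add(i)

        updates = 0
        for i in range(n):
            current = {j for _, j in graph[i]}
            # Candidates: reverse neighbors plus neighbors of neighbors.
            candidates = set(adjacent[i])
            for j in adjacent[i]:
                candidates |= adjacent[j]
            candidates -= current
            candidates.discard(i)

            for c in candidates:
                d = sq_dist(points[i], points[c])
                # Replace the current farthest neighbor if c is closer.
                if d < graph[i][-1][0]:
                    graph[i][-1] = (d, c)
                    graph[i].sort()
                    updates += 1

        # Stop once a full pass improves nothing.
        if updates == 0:
            break

    return [[j for _, j in nbrs] for nbrs in graph]
```

Comparing the output of `nn_descent` against `brute_force_knn` on a small random dataset gives a quick recall estimate. This sketch examines every neighbor-of-neighbor candidate, which is only practical for small inputs.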