How To Benchmark ANN Algorithms– An Investigation Into The Performance Of Various Approximate Nearest-Neighbor Algorithms

Blog post from partner:

Approximate Nearest-Neighbors for new voter party affiliation. Credit:


The field of data science is rapidly changing as new and exciting software and hardware breakthroughs are made every single day. Given the rapidly changing landscape, it is important to take the appropriate time to understand and investigate some of the underlying technology that has shaped and will shape, the data science world. As an undergraduate data scientist, I often wish more time was spent understanding the tools at our disposal, and when they should appropriately be used. One prime example is the variety of options to choose from when picking an implementation of a Nearest-Neighbor algorithm; a type of algorithm prevalent in pattern recognition. Whilst there are a range of different types of Nearest-Neighbor algorithms I specifically want to focus on Approximate Nearest Neighbor (ANN) and the overwhelming variety of implementations available in python.

My first project with my internship at GSI Technology explored the idea of benchmarking ANN algorithms to help understand how the choice of implementation can change depending on the type and size of the dataset. This task proved challenging yet rewarding, as to thoroughly benchmark a range of ANN algorithms we would have to use a variety of datasets and a lot of computation. This would all prove to provide some valuable results (as you will see further down) in addition to a few insights and clues as to which implementations and implementation strategies might become industry standard in the future.

What Is ANN?

Before we continue its important to lay out the foundations of what ANN is and why is it used. New students to the data science field might already be familiar with ANN’s brother, kNN (k-Nearest Neighbors) as it is a standard entry point in many early machine learning classes.


Red points are grouped with the five (K) closest points.

kNN works by classifying unclassified points based on “k” number of nearby points where distance is evaluated based on a range of different formulas such as Euclidean distance, Manhattan distance (Taxicab distance), Angular distance, and many more. ANN essentially functions as a faster classifier with a slight trade-off in accuracy, utilizing techniques such as locality sensitive hashing to better balance speed and precision. This trade-off becomes especially important with datasets in higher dimensions where algorithms like kNN can slow to a grueling pace.

Within the field of ANN algorithms, there are five different types of implementations with various advantages and disadvantages. For people unfamiliar with the field here is a quick crash course on each type of implementation:


  • Brute Force; whilst not technically an ANN algorithm it provides the most intuitive solution and a baseline to evaluate all other models. It calculates the distance between all points in the datasets before sorting to find the nearest neighbor for each point. Incredibly inefficient.
  • Hashing Based, sometimes referred to as LSH (locality sensitive hashing), involves a preprocessing stage where the data is filtered into a range of hash-tables in preparation for the querying process. Upon querying the algorithm iterates back over the hash-tables retrieving all points similarly hashed and then evaluates proximity to return a list of nearest neighbors.
  • Graph-Based, which also includes tree-based implementations, starts from a group of “seeds” (randomly picked points from the dataset) and generates a series of graphs before traversing the graphs using best-first search. Through using a visited vertex parameter from each neighbor the implementation is able to narrow down the “true” nearest neighbor.
  • Partition Based, similar to hashing, the implementation partitions the dataset into more and more identifiable subsets until eventually landing on the nearest neighbor.
  • Hybrid, as the name suggests, is some form of a combination of the above implementations.

Because of the limitations of kNN such as dataset size and dimensionality, algorithms such as ANN become vital to solving classification problems with these kinds of constraints. Examples of these problems include feature extraction in computer vision, machine learning, and many more. Because of the prominence of ANN, and the range of applications for the technique, it is important to gauge how different implementations of ANN compare under different conditions. This process is called “Benchmarking”. Much like a traditional experiment we keep all variables constant besides the ANN algorithms, then compare outcomes to evaluate the performance of each implementation. Furthermore, we can take this experiment and repeat it for a variety of datasets to help understand how these algorithms perform depending on the type and size of the input datasets. The results can often be valuable in helping developers and researchers decide which implementations are ideal for their conditions, it also clues the creators of the algorithms into possible directions for improvement.

Open Source to the Rescue


Utilizing the power of online collaboration we are able to pool many great ideas into effective solutions

Beginning the benchmarking task can seem daunting at first given the scope and variability of the task. Luckily for us, we are able to utilize work already done in the field of benchmarking ANN algorithms. Aumüller, Bernhardsson, and Faithfull’s paper ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms and corresponding GitHub repository provides an excellent starting point for the project.

Bernhardsson, who built the code with help from Aumüller and Faithfull, designed a python framework that downloads a selection of datasets with varying dimensionality (25 to nearly 28,000 dimensions) and size (few hundred megabytes to a few gigabytes). Then, using some of the most common ANN algorithms from libraries such as scikit-learn or the Non-Metric Space Library, they evaluated the relationship between queries-per-second and accuracy. Specifically, the accuracy was a measure of “recall”, which measures the ratio of the number of result points that are true nearest neighbors to the number of true nearest neighbors, or formulaically:


Intuitively recall is simply the correct predictions made by the algorithm, over the total number of correct predictions it could have made. So a recall of “1” means that the algorithm was correct in its predictions 100% of the time.

Using the project, which is available for replication and modification, I went about setting up the benchmark experiment. Given the range of different ANN implementations to test (24 to be exact), there are many packages that will need to be installed as well as a substantial amount of time required to build the docker environments. Assuming everything installs and builds as intended the environment should be ready for testing.



After three days of run time for the GloVe-25-angular dataset, we finally achieved presentable results. Three days of runtime was quite substantial for this primary dataset, however as we soon learned this process can be sped up considerably. The implementation of the benchmark experiment defaults to running benchmarks twice and averaging the results to better account for system interruptions or other anomalies that might impact the results. If this isn’t an issue, computation time could be halved by only performing the benchmark tests once each. In our case we wanted to match Bernhardsson’s results so we computed the benchmark with the default setting of two runs per algorithm which produced the following:

Our results (top) and Bernhardsson’s results (bottom):



My Results vs Bernhardsson’s Results

As you can see from the two side by side plots of algorithm accuracy vs algorithm query speed there are some differences between my results and Bernhardsson’s. In our case, there are 18 functions plotted as opposed to 15 in the other. This is likely because the project has since been updated to include more functions following Bernhardsson’s initial tests. Furthermore, the benchmarking was performed on a different machine to Bernhardsson’s which likely produced some additional variability.

What we do see which is quite impressive is that many of the same algorithms that performed well for Bernhardsson also performed well in our tests. This suggests that across multiple benchmarks there are some clearly well-performing ANN implementations. NTG-onng, hnsw(nmslib) and hnswlib all performed exceedingly well in both cases. Hnsw(nmslib) and hnswlib both belong to the Hierarchical Navigable Small World family, an example of a graph-based implementation for ANN. In fact, many of the algorithms tested, graph-based implementations seemed to perform the best. NTG-onng is also an example of a graph-based implementation for ANN search. This suggests that graph-based implementations of ANN algorithms for this type of dataset perform better than other competitors.

In contrast to the well-performing graph-based implementations, we can see BallTree(nmslib) and rpforest both of which in comparison are quite underwhelming. BallTree and rpforest are examples of tree-based ANN algorithms (a more rudimentary form of a graph-based algorithm). BallTree specifically is a hybrid tree-partition algorithm combining the two methods for the ANN process. It is likely a series of reasons that cause these two ANN algorithms to perform poorly when compared to HNSW or NTG-onng. However, the main reason seems to be that tree-based implementations execute slower under the conditions of this dataset.

Although graph-based implementations outperform other competitors it is worth noting that graph-based implementations suffer from a long preprocessing phase. This phase is required to construct the data structures necessary for the computation of the dataset. Hence using graph-based implementations might not be ideal under conditions where the preprocessing stage would have to be repeated.

One advantage our benchmark experiment had over Bernhardsson’s is our tests were run on a more powerful machine. Our machine (see appendix for full specifications) utilized the power of 2 Intel Xeon Gold 5115’s, an extra 32 GBs of DDR4 RAM totaling 64 GBs, and 960 GBs of solid-state disk storage which differs from Bernhardsson’s. This difference likely cut down on computation time considerably, allowing for faster benchmarking.

A higher resolution copy of my results can be found in the appendix.

Conclusion and Future Work


Further benchmarking for larger deep learning datasets would be a great next step.

Overall, my first experience with benchmarking ANN algorithms has been an insightful and appreciated learning opportunity. As we discussed above there are some clear advantages to using NTG-onng and hnsw(nmslib) on low dimensional smaller datasets such as the glove-25-angular dataset included with Erik Bernhardsson’s project. These findings, whilst coming at an immense computational cost, are none the less useful for data scientists aiming to tailor their use of ANN algorithms to the dataset they are utilizing.

Whilst the glove-25-angular dataset was a great place to start I would like to explore how these algorithms perform on even larger datasets such as the notorious deep1b (deep one billion) dataset which includes one billion 96 dimension points in its base set. Deep1b is an incredibly large file that would highlight some of the limitations as well as the advantages of various ANN implementations and how they trade-off between query speed and accuracy. Thanks to the hardware provided by GSI Technology this experiment will be the topic of our next blog.


  1. Computer specifications: 1U GPU Server 1 2 Intel CD8067303535601 Xeon® Gold 5115 2 3 Kingston KSM26RD8/16HAI 16GB 2666MHz DDR4 ECC Reg CL19 DIMM 2Rx8 Hynix A IDT 4 4 Intel SSDSC2KG960G801 S4610 960GB 2.5″ SSD.
  2. Full resolution view of my results:


  1. Aumüller, Martin, Erik Bernhardsson, and Alexander Faithfull. “ANN-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms.” International Conference on Similarity Search and Applications. Springer, Cham, 2017.
  2. Liu, Ting, et al. “An investigation of practical approximate nearest neighbor algorithms.” Advances in neural information processing systems. 2005.
  3. Malkov, Yury, et al. “Approximate nearest neighbor algorithm based on navigable small-world graphs.” Information Systems 45 (2014): 61–68.
  4. Laarhoven, Thijs. “Graph-based time-space trade-offs for approximate near neighbors.” arXiv preprint arXiv:1712.03158 (2017).

LIDAR – Shaping the future of automotive

LIDAR plays a major role in automotive, as vehicles perform tasks with less and less human supervision and intervention. As a leader in VCSEL, ams is helping to shape this revolution.

LIDAR (Light Detection and Ranging) is an optical sensing technology that measures the distance to other objects. It is currently known for many diverse applications in industrial, surveying, and aerospace, but is a true enabler for autonomous driving. As the automotive manufacturers continue their push to design and release high-complexity autonomous systems, we likewise develop the technology that will enable this. That is why ams continues to bring our high-power VCSELs to the automotive market and to test the limits on peak power, shorter pulses, and additional scanning features which enable our customers to improve their LIDAR systems.

In 2019, ams together with ZF and Ibeo announced a hybrid solution called True Solid State where, like flash technology, no moving parts are needed to capture the full scene around the vehicle. By sequentially powering a portion of the laser, a scanning pattern can be generated, combining the advantages of flash and scan systems.

Making sense of the LIDAR landscape

At ams, we classify LIDAR systems on seven elements: ranging principle, wavelength, beam steering principle, emitter technology and layout, and receiver technology and layout. Here we discuss the first five.

The most dominant implementation to measure distance (ranging) is Direct Time of Flight (DTOF): a short (few nanoseconds) laser pulse is emitted, reflected by an object and returned to a receiver. The time difference between sending and receiving can be converted into a distance measurement. Moreover, with duty cycles of <1% this system takes thousands of distance measurements per second. The laser pulse is typically in the 850-940nm rage, components are readily available and most affordable. However, systems can also be using 1300 or 1550nm, the big advantage is eye safety regulations allow more energy to be used here, and in theory, this provides more range. The downside is that components are expensive.

To scan the complete surroundings (or field of view) of a vehicle, the system needs to be able to shoot pulses in all directions. This is the beam steering principle. Classical systems used rotating sensor heads and mirrors to scan the field of view section by section. As these systems are bulky, they are being replaced by static systems with internal moving mirrors. MEMS mirrors are also about to enter the market. Another approach is flash, where no moving parts are needed at all. The light source illuminates the complete field of view, and the sensor captures that same field in a single frame like a photo. As the full scene is illuminated, and to remain eye safe, this means the range must be limited.

On the emitter side, edge emitters continue to be frequently used, based on earlier developments. They have a high-power density, making them suitable in combination with MEMS mirrors. Where first iterations were single emitters, meanwhile 2-4-8-16 emitters are being integrated in a single bar. Fiber lasers are another interesting technology. They offer even higher power density, and typically are used in 1550nm wavelength and come typically as a single emitter source.

ams is a leading supplier in the VCSEL emitter technology. Our high power VCSELs can differentiate in scan and flash applications as they are very stable over temperature, are less sensitive to individual emitter failures, and are easy to integrate. However, the best characteristic of VCSELs are their ability to form emitter arrays. This makes VCSELs easy to scale. It also allows for addressability, or powering selective zones of the die. This enables True Solid State topology, which we consider to be the most all-rounded LIDAR solution.

LIDAR enables Autonomous Driving

The most commonly accepted way to classify vehicles on their level of autonomy is by the definitions of the Society of Automotive Engineers (SAE). At SAE Level 3 and above, the vehicle takes over responsibility from the driver and assistance turns into autonomy. This means the vehicle should be able to perform its task without human supervision and intervention. This requires a step function in required system performance. Where Level 1 and Level 2 vehicles assist the driver and typically rely on camera or radar, or a combination, there are shortcomings in these technologies for 3D object detection. LIDAR technology addresses this, and there is wide consensus in the industry that from Level 3 onwards, LIDAR is needed for 3D object detection.

When 3D LIDAR is combined or fused with camera and radar, a high-resolution map of the vehicle’s surroundings can be constructed and allow the vehicle to safely fulfil its mission. The automotive industry started with more straightforward driver-assist use cases used in Level 1 and Level 2. As sensors and data processing gets more advanced, further more difficult use cases can be covered, such as Highway Pilot or City Pilot.

Ultimately, when every conceivable use case can be fulfilled by the system we define this as a Level 5 vehicle – fully autonomous and the holy grail of autonomous driving. This is expected to still be quite a number of years out from today. Moreover, there will be huge pressure to bring down cost and rationalize content per vehicle – to make autonomous driving available to the mass market.

Interested to learn more?

Let us know if you would like to discuss how you could be using ams technology to support your potential LIDAR applications!
Contact ams sensor experts