Home

Awesome

EFANNA_graph: an Extremely Fast Approximate Nearest Neighbor graph construction Algorithm framework

EFANNA is a flexible and efficient library for approximate nearest neighbor search (ANN search) on large scale data. It implements the algorithms of our paper EFANNA : Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph.
This project (efanna_graph) contains only the approximate nearest neighbor graph construction part in our EFANNA paper. The reasons are as follows:

How To Complie

Case 1: If the dataset is very large (more than 4M points), you may want to use faiss to build a graph for initilization. Then you need to downlad the faiss package and compile with it.

cd efanna_graph/extern_libraries/
git clone https://github.com/facebookresearch/faiss.git
cd faiss/

Please see the documentation of faiss for how to compile the faiss.

Case 2: If you don't need the faiss for initilization, you can compile efanna_graph without the faiss support.

cd efanna_graph/
rm include/efanna2e/index_pq.h src/index_pq.cpp
comment the last two lines in tests/CMakeLists.txt

Then go to the root directory of efanna_graph and make.

cd efanna_graph/
cmake .
make

How To Use

Meaning of the parameters:

**data_file** is the path of the origin data.
**save_graph** is the path of the kNN graph to be saved.
**K** is the 'K' of kNN graph.
**L** is the parameter controlling the graph quality, larger is more accurate but slower, no smaller than K.
**iter** is the parameter controlling the iteration times, iter usually < 30.
**S** is the parameter contollling the graph quality, larger is more accurate but slower.
**R** is the parameter controlling the graph quality, larger is more accurate but slower.

Meaning of the parameters:

**nTrees** is the number of trees used to build the graph (larger is more accurate but slower)
**mLevel** conquer-to-depeth (smaller is more accurate but slower) 
**K** is the 'K' of kNN graph.
**saveing_graph** is the path of the kNN graph to be saved.
**init_graph** is the graph built by test_kdtree_graph, same with **saveing\_graph** here.

Meaning of the parameters(from left to right):

**IndexKey** is the parameter in faiss to build the index.
**SearchKey** is the parameter in faiss to search with the index.

Please see the documentation of faiss for how to set these two parameters.

Output format

Suppose the database has N points, and numbered from 0 to N-1. You want to build an approximate kNN graph. The graph can be regarded as a N * k Matrix. The saved kNN graph binary file saves the matrix by row. The first 4 bytes of each row saves the int value of k, next 4 bytes saves the value of M and next 4 bytes saves the float value of the norm of the point. Then it follows k*4 bytes, saving the indices of the k nearest neighbors of respective point. The N rows are saved continuously without seperating characters.

Input of EFANNA

Because there is no unified format for input data, users may need to write input function to read your own data. You may imitate the input function in our sample code (sample/efanna_efanna_index_buildgraph.cc) to load the data into our matrix. To use SIMD instruction optimization, you should pay attention to the data alignment problem of SSE / AVX instruction.

Benchmark data set

Acknowledgment

The implemnetation of nndescent is taken from Kgraph. They proposed the nndescent algorithm. Many thanks to them for inspiration.