Home

Awesome

HNSW.jl

Approximate Nearest Neighbor Searches using the "Hierarchical Navigable Small World" (HNSW) algorithm as described in https://arxiv.org/abs/1603.09320 .

Highlights

Creating an Index

An Index in this library is a struct of type HierarchicalNSW which can be constructed using:

hnsw = HierarchicalNSW(data; metric, M, efConstruction)

Once the HierarchicalNSW struct is initialized the search graph can be built by calling

add_to_graph!(hnsw [, indices])

which iteratively inserts all points from data into the graph. Optionally one may provide indices a subset of all the indices in data to partially to construct the graph.

Searching

Given an initialized HierarchicalNSW one can search for approximate nearest neighbors using

idxs, dists = knn_search(hnsw, query, k)

where query may either be a single point of type eltype(data) or a vector of such points.

A simple example:

using HNSW

dim = 10
num_elements = 10000
data = [rand(dim) for i=1:num_elements]

#Intialize HNSW struct
hnsw = HierarchicalNSW(data; efConstruction=100, M=16, ef=50)

#Add all data points into the graph
#Optionally pass a subset of the indices in data to partially construct the graph
add_to_graph!(hnsw)

# optionally with a progress notification:
# step = (num_elements) ÷ 100
# add_to_graph!(hnsw) do i
#   if iszero(i % step)
#     @info "Processed: $(i ÷ step)%"
#   end
# end

queries = [rand(dim) for i=1:1000]

k = 10
# Find k (approximate) nearest neighbors for each of the queries
idxs, dists = knn_search(hnsw, queries, k)

Multi-Threading

A multi-threaded version of this algorithm is available. To use it, checkout the branch multi-threaded and start the indexing with:

 add_to_graph!(hnsw; multithreading=true)

For multi-threaded searches add multithreading=true as a keyword argument to knn_search.