Home

Awesome

NearestNeighbors.jl

Build Status codecov.io DOI

NearestNeighbors.jl is a Julia package for performing nearest neighbor searches.

Creating a Tree

There are currently three types of trees available:

These trees can be created using the following syntax:

KDTree(data, metric; leafsize, reorder)
BallTree(data, metric; leafsize, reorder)
BruteTree(data, metric; leafsize, reorder) # leafsize and reorder are unused for BruteTree

All trees in NearestNeighbors.jl are static, meaning points cannot be added or removed after creation. Note that this package is not suitable for very high dimensional points due to high compilation time and inefficient queries on the trees.

Example of creating trees:

using NearestNeighbors
data = rand(3, 10^4)

# Create trees
kdtree = KDTree(data; leafsize = 25)
balltree = BallTree(data, Minkowski(3.5); reorder = false)
brutetree = BruteTree(data)

k-Nearest Neighbor (kNN) Searches

A kNN search finds the k nearest neighbors to a given point or points. This is done with the methods:

knn(tree, point[s], k, skip = always_false) -> idxs, dists
knn!(idxs, dists, tree, point, k, skip = always_false)

For the single closest neighbor, you can use nn:

nn(tree, points, skip = always_false) -> idxs, dists

Examples:

using NearestNeighbors
data = rand(3, 10^4)
k = 3
point = rand(3)

kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)

idxs
# 3-element Array{Int64,1}:
#  4683
#  6119
#  3278

dists
# 3-element Array{Float64,1}:
#  0.039032201026256215
#  0.04134193711411951
#  0.042974090446474184

# Multiple points
points = rand(3, 4)
idxs, dists = knn(kdtree, points, k, true)

idxs
# 4-element Array{Array{Int64,1},1}:
#  [3330, 4072, 2696]
#  [1825, 7799, 8358]
#  [3497, 2169, 3737]
#  [1845, 9796, 2908]

# dists
# 4-element Array{Array{Float64,1},1}:
#  [0.0298932, 0.0327349, 0.0365979]
#  [0.0348751, 0.0498355, 0.0506802]
#  [0.0318547, 0.037291, 0.0421208]
#  [0.03321, 0.0360935, 0.0411951]

# Static vectors
using StaticArrays
v = @SVector[0.5, 0.3, 0.2];

idxs, dists = knn(kdtree, v, k, true)

idxs
# 3-element Array{Int64,1}:
#   842
#  3075
#  3046

dists
# 3-element Array{Float64,1}:
#  0.04178677766255837
#  0.04556078331418939
#  0.049967238112417205

# Preallocating input results
idxs, dists = zeros(Int32, k), zeros(Float32, k)
knn!(idxs, dists, kdtree, v, k)

Range Searches

A range search finds all neighbors within the range r of given point(s). This is done with the methods:

inrange(tree, points, r) -> idxs
inrange!(idxs, tree, point, r)

Distances are not returned.

Example:

using NearestNeighbors
data = rand(3, 10^4)
r = 0.05
point = rand(3)

balltree = BallTree(data)
idxs = inrange(balltree, point, r)

# 4-element Array{Int64,1}:
#  317
#  983
# 4577
# 8675

# Updates `idxs`
idxs = Int32[]
inrange!(idxs, balltree, point, r)

# counts points without allocating index arrays
neighborscount = inrangecount(balltree, point, r)

Using On-Disk Data Sets

By default, trees store a copy of the data provided during construction. For data sets larger than available memory, DataFreeTree can be used to strip a tree of its data field and re-link it later.

Example with a large on-disk data set:

using Mmap
ndim = 2
ndata = 10_000_000_000
data = Mmap.mmap(datafilename, Matrix{Float32}, (ndim, ndata))
data[:] = rand(Float32, ndim, ndata) # create example data
dftree = DataFreeTree(KDTree, data)

dftree stores the indexing data structures. To perform look-ups, re-link the tree to the data:

tree = injectdata(dftree, data) # yields a KDTree
knn(tree, data[:,1], 3) # perform operations as usual