Home

Awesome

Introduction

The k-nearest neighbor algorithm (k-NN) is a widely used machine learning algorithm used for both classification and regression. k-NN algorithms are used in many research and industrial domains such as 3-dimensional object rendering, content-based image retrieval, statistics (estimation of entropies and divergences), biology (gene classification), etc. The processing time of the kNN search still remains the bottleneck in many application domains, especially in high dimensional spaces. In this work, we try to address this processing time issue by performing the kNN search using the GPU.

Authors

Description

Inputs values of the k-NN algorithm:

For each query point, the k-NN algorithm locates the k closest points (k nearest neighbors) among the reference points set. The algorithm returns (1) the indexes (positions) of the k nearest points in the reference points set and (2) the k associated Euclidean distances.

We provide 3 CUDA implementations for this algorithm:

  1. knn_cuda_global computes the k-NN using the GPU global memory for storing reference and query points, distances and indexes.

  2. knn_cuda_texture computes the k-NN using the GPU texture memory for storing the reference points and the GPU global memory for storing other arrays. Using a texture usually speeds-up the computations compared to the first approach. However, due to a size constraint of the texture structures in CUDA, this implementation might not an option for some high dimensional problems.

  3. knn_cublas computes the k-NN using a different technique involving the use of CUBLAS (CUDA implementation of BLAS). The computation of the distances are split into several sub-problems better suited to a GPU acceleration. As a consequence, on some specific problems, this implementation may lead to a much faster processing time compared to the first two approaches.

For more information about this project, please refer to the original papers listed in the bibliography section bellow.

Compilation and testing

We provide a test code with the purpose of verifying the correct compilation and execution of the code as well as evaluating the processing time. This code performs the following tasks:

One can change the number of points, the point dimension and the parameter k by editing the test.cpp file. However, with some parameters value, the provided code might generate errors. For instance, if the number of points is too big, the GPU memory might not be big enough to store the different arrays allocated by the program. Consequently, an allocation error should be displayed.

Before attempting to compile this code, one should verify the following points:

Compile the project using the Makefile:

Reference compilation / execution

On August 1st, 2018, the code was successfully compiled and executed:

PARAMETERS
- Number reference points : 16384
- Number query points     : 4096
- Dimension of points     : 128
- Number of neighbors     : 16

Ground truth computation in progress...

TESTS
- knn_c             : PASSED in 90.46776 seconds (averaged over   2 iterations)
- knn_cuda_global   : PASSED in  0.04166 seconds (averaged over 100 iterations)
- knn_cuda_texture  : PASSED in  0.06296 seconds (averaged over 100 iterations)
- knn_cublas        : PASSED in  0.03112 seconds (averaged over 100 iterations)

Specifications of the platform used:

Data storage and representation

The storage and representation of a set of points used in this project is similar to the one found with Deep Neural Network projects. Let us consider the following set of 3 points of dimension 2:

p1 = (1, 2)
p2 = (3, 4)
p3 = (5, 6)

We represent these points as a 2D array A of size 2 x 3:

A = | 1 3 5 |
    | 2 4 6 |

Internally, this array is actually stored as a linear array (row-major order):

A = (1, 3, 5, 2, 4, 6)

Bibliography

Important note

The provided code was written in the context of a Ph.D. thesis. The code was intended to provide a fast and generic solution to the challenging k-NN search problem. However, authors are well aware that for a very specific application (e.g. small number of points, small dimension, only one query point, etc.) there is often a better implementation either in terms of computation time or memory footprint. It is known that a GPGPU implementation should be adapted to the problem and to the specifications of the GPU used. This code will help one to identify if the GPGPU solution is viable for the given problem. Following the terms of the license below, one is allowed and should adapt the code.§

Legacy code

The original code written in 2008 is still available in the legacy folder.

Changes made in the 2018 update: