Awesome

dj_fft: Header-only FFT library

Details

This repository provides a header-only library to compute fourier transforms in 1D, 2D, and 3D. Its goal is to provide a fast and easy-to-use fast fourier transform algorithm.

Cloning

Clone the repository and all its submodules using the following command:

git clone --recursive git@github.com:jdupuy/dj_fft.git

If you accidentally omitted the --recursive flag when cloning the repository you can retrieve the submodules like so:

git submodule update --init --recursive

Usage

The 1D, 2D, and 3D FFT routines return an std::vector<std::complex<T>>, given another std::vector<std::complex<T>> as input, which specifies the data that must be transformed, as well as an enum class dj::fft_dir, which specifies in which direction the FFT must be computed (specify dj::fft_dir::DIR_FWD for the forward direction and dj::fft_dir::DIR_BWD for the backward direction).

Note that the input vector is expected to be of size N for 1D FFT, NxN for a 2D FFT, and NxNxN for a 3D FFT, where N must be a power of two. Note that the 2D and 3D vectors are expected to be arranged in a flat row-major fashion, i.e., the 2D and 3D elements (i, j) and (i, j, k) are respectively located at index i + N * j and i + N * (j + N * k) in memory.

Below is a C++ pseudocode for computing a 2D FFT in forward direction:

#define DJ_FFT_IMPLEMENTATION // define this in exactly *one* .cpp file
#include "dj_fft.h"

some_function()
{
  int N = size_of_your_input; // input size
  auto myData = std::vector<std::complex<T>>(N * N); // input data

  // prepare data
  for (int j = 0; j < N; ++j) {
    for (int i = 0; i < N; ++i) {
      myData[i + N * j] = some_value; // set element (i, j)
    }
  }

  // compute forward 2D FFT
  auto fftData = dj::fft2d(myData, dj::fft_dir::DIR_FWD);

  // print the data
  for (int j = 0; j < N; ++j) {
    for (int i = 0; i < N; ++i) {
      printf("{%f, %f} ", fftData[i + N * j].real(), fftData[i + N * j].imag());
    }
    printf("\n");
  }
}

To see examples that compile, see the examples/ directory.

GPU Acceleration

Additionally, the library provides GPU accelerated 1D, 2D, and 3D FFTs for std::vector<std::complex<float>> inputs. GPU acceleration is especially relevant for large 2D and 3D datasets. For instance:

for an input of size 4096x4096, a regular 2D FFT completes in roughly 18 seconds on an intel i7-8086k, and 0.9 seconds on an NVidia RTX 2080
for an input of size 512x512x512, a regular 3D FFT completes in roughly 131 seconds on an intel i7-8086k, and 8.2 seconds on an NVidia RTX 2080

The following table provides a more comprehensive set of measurements for 2D FFTs:

2D FFT Resolution	256²	512²	1024²	2048²	4096²	8192²
CPU (i7-8086k)	0.05s	0.22s	0.99s	4.32s	18.85s	81.96s
GPU (RTX 2080)	0.01s	0.02s	0.07s	0.24s	0.94s	3.68s
GPU speed-up	x5	x11	x14	x18	x20	x22

The following table provides a more comprehensive set of measurements for 3D FFTs:

3D FFT Resolution	64³	128³	256³	512³
CPU (i7-8086k)	0.19s	1.72s	15.70s	141.18s
GPU (RTX 2080)	0.04s	0.15s	1.03s	8.10s
GPU speed-up	x5	x11	x15	x17

Below is a C++ pseudocode for computing a 1D FFT in backward direction on the GPU:

#define DJ_FFT_IMPLEMENTATION // define this in exactly *one* .cpp file
#include "dj_fft.h"

some_function()
{
  int N = size_of_your_input; // input size
  auto myData = std::vector<std::complex<float>>(N); // input data

  // prepare data
  for (int i = 0; i < N; ++i) {
    myData[i] = some_float_value; // set element (i)
  }

  // compute backward 1D FFT
  auto fftData = dj::fft1d_gpu(myData, dj::fft_dir::FFT_BWD);

  // print the data
  for (int i = 0; i < N; ++i) {
    printf("{%f, %f}\n", fftData[i].real(), fftData[i].imag());
  }
}

Note that the return values of a GPU FFT may differ slightly from that of a regular FFT, due to the way floating point arithmetic is implemented.

For a complete example that compiles, see the examples/ directory.

GPU Acceleration (Advanced)

By default, the GPU accelerated routines run on the primary GPU. Users who want to run the FFT on a secondary GPU will have to create an OpenGL context themselves and use the fftNd_gpu_glready functions. You can create a custom OpenGL context with a cross-platform windowing library such as GLFW (https://www.glfw.org/), and an OpenGL function loader such as glad (https://glad.dav1d.de/). I'll probably add a sample at some point.

License

This library is in the public domain. You can do anything you want with them. You have no legal obligation to do anything else, although I appreciate attribution.

It is also licensed under the MIT open source license, if you have lawyers who are unhappy with public domain. The dj_fft.h source file includes an explicit dual-license for you to choose from.