ff-gpu

Finite Field Operations on GPGPU

Background

Lately I've been interested in finite field operations, so I decided to implement a few fields in SYCL DPC++, targeting accelerators (specifically GPGPUs).

This repository currently holds implementations of arithmetic operations for two finite fields, accompanied by relevant benchmarks on both CPU and GPGPU.
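For readers new to the topic, here is a minimal host-side sketch of what prime field arithmetic looks like. It is plain C++ rather than SYCL, and the modulus (the Mersenne prime 2^31 - 1) and all function names are assumptions chosen for illustration; the fields in this repository may use different moduli and reduction strategies.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Illustrative modulus only: the Mersenne prime 2^31 - 1.
// The fields implemented in this repository may differ.
constexpr u64 P = 2147483647ULL;

// (a + b) mod P, for operands already reduced to [0, P)
constexpr u64 ff_add(u64 a, u64 b) {
  return (a + b) >= P ? (a + b) - P : (a + b);  // a + b < 2P, no overflow
}

// (a - b) mod P, for operands already reduced to [0, P)
constexpr u64 ff_sub(u64 a, u64 b) {
  return a >= b ? a - b : a + P - b;
}

// (a * b) mod P; both operands are below 2^31, so the product fits in 64 bits
constexpr u64 ff_mul(u64 a, u64 b) {
  return (a * b) % P;
}

// a^e mod P by square-and-multiply
constexpr u64 ff_pow(u64 a, u64 e) {
  u64 r = 1;
  while (e > 0) {
    if (e & 1) r = ff_mul(r, a);
    a = ff_mul(a, a);
    e >>= 1;
  }
  return r;
}

// Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P
inline u64 ff_inv(u64 a) {
  assert(a != 0 && "zero has no multiplicative inverse");
  return ff_pow(a, P - 2);
}
```

On an accelerator, functions like these would typically be applied to many independent elements in parallel, which is where a GPGPU can pay off.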

I've also written the following implementations, along with benchmark results on CPU and GPU.

Prerequisites

$ lsb_release -d
Description:    Ubuntu 20.04.3 LTS

$ dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.0.0 (2022.0.0.20211123)
Target: x86_64-unknown-linux-gnu
Thread model: posix

or

$ clang++ --version
clang version 14.0.0 (https://github.com/intel/llvm dc9bd3fafdeacd28528eb4b1fef3ad9b76ef3b92)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Benchmarks

make # JIT kernel compilation on *default* device, for AOT read below
./run

DEVICE=cpu make   # still JIT, but in runtime use CPU
DEVICE=gpu make   # still JIT, but in runtime use GPU
DEVICE=host make  # still JIT, but in runtime use HOST

make clean
make format

lscpu | grep -i avx
DEVICE=cpu make aot_cpu
DEVICE=gpu make aot_gpu

If you have different hardware, take a look at the AOT compilation guidelines and make the necessary changes in the Makefile.


Targeting Nvidia GPU with CUDA backend:

To target an Nvidia GPU, run DEVICE=gpu make cuda so that the benchmark suite is compiled for the CUDA backend. If you haven't set up your machine with an Nvidia GPU yet, I suggest you read this first.


I run the benchmark suite on Intel CPU/GPU and Nvidia GPU, keeping results 👇

Tests

You can run basic test cases using

# set DEVICE to the runtime target device: cpu, gpu or host
DEVICE=cpu make test

There's another set of randomised test cases, which checks results obtained from my prime field implementation against galois, a finite field arithmetic module written in Python.
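The underlying idea, checking one implementation against an independent one on random inputs, can be sketched in plain C++ as well. In this sketch a slow double-and-add multiplication stands in for the reference; it is not how the repository's tests work (those use galois), and the modulus and function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

using u64 = std::uint64_t;

// Illustrative modulus only: the Mersenne prime 2^31 - 1.
constexpr u64 P = 2147483647ULL;

// Implementation under test: one 64-bit multiply, then reduce.
u64 mul_fast(u64 a, u64 b) { return (a * b) % P; }

// Independent reference: double-and-add, never forms the full product.
u64 mul_ref(u64 a, u64 b) {
  assert(a < P && b < P);  // operands are expected to be reduced
  u64 acc = 0;
  while (b > 0) {
    if (b & 1) acc = (acc + a) % P;
    a = (a + a) % P;
    b >>= 1;
  }
  return acc;
}

// Returns how many of `rounds` random operand pairs disagree.
unsigned count_mismatches(unsigned rounds, u64 seed) {
  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<u64> dist(0, P - 1);
  unsigned bad = 0;
  for (unsigned i = 0; i < rounds; ++i) {
    const u64 a = dist(gen);
    const u64 b = dist(gen);
    if (mul_fast(a, b) != mul_ref(a, b)) ++bad;
  }
  return bad;
}
```

A fixed seed keeps any failure reproducible; a mismatch count of zero is the expected outcome for a correct implementation.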

To run those, first compile the shared object using

# set DEVICE to the runtime target device: cpu, gpu or host
DEVICE=cpu make genlib

After that, you can follow the next steps here.
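For context, a shared object that is meant to be called from Python usually exports functions with C linkage so that a foreign-function loader can look them up by name. The sketch below shows that general pattern only; the symbol names ff_mul_example and ff_mul_many_example and the modulus are hypothetical, not this repository's actual exports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using u64 = std::uint64_t;

// Illustrative modulus only: the Mersenne prime 2^31 - 1.
constexpr u64 P = 2147483647ULL;

// C linkage keeps symbol names unmangled, so they can be resolved by name
// when the shared object is loaded from another language.
extern "C" {

// Hypothetical export: product of two field elements.
u64 ff_mul_example(u64 a, u64 b) {
  return ((a % P) * (b % P)) % P;
}

// Hypothetical export: element-wise product of two arrays of length n.
void ff_mul_many_example(const u64* a, const u64* b, u64* out, std::size_t n) {
  assert(a != nullptr && b != nullptr && out != nullptr);
  for (std::size_t i = 0; i < n; ++i) out[i] = ff_mul_example(a[i], b[i]);
}

}  // extern "C"
```

A Python test can then load such a library, pass the same random operands to it and to the reference module, and compare the outputs.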