Home

Awesome

introduction

A simple, high-performance CUDA implementation of GEMM, block-sparse GEMM, and non-uniform quantized GEMM. All three compute the standard GEMM update:

C = alpha * A * B + beta * C
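
As a rough illustration of the formula above, here is a minimal CPU reference sketch in C++, the kind of routine one might use to check GPU output. It is not taken from this repository: the name gemm_reference, the row-major layout, and the float element type are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM (not part of this repository):
//   C = alpha * A * B + beta * C
// A is M x K, B is K x N, C is M x N.
// Row-major storage is an assumption of this sketch.
void gemm_reference(std::size_t M, std::size_t N, std::size_t K,
                    float alpha,
                    const std::vector<float>& A,
                    const std::vector<float>& B,
                    float beta,
                    std::vector<float>& C) {
    // Check that the buffers match the stated dimensions.
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);

    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            // Scale the product and blend in the previous value of C.
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

With beta set to 0 this reduces to a scaled matrix product; with alpha = 1 and beta = 1 it accumulates A * B into the existing contents of C.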

algorithm

The kernel implementations are located in src/cuda/.

experiments

The experiments are located in benchmark/.

TODO

run

mkdir builds
make benchmark_[experiment name]
bash scripts/benchmark_[experiment name].sh

Note