Home

Awesome

openmlsys-cuda

Examples for beginners to write your own high-performance AI operators. We introduced optimizations tricks like using shared memory and pipeline rearrangement to maximize the throughput. We also provided an example for using CUTLASS to implement an FC + ReLU fused operator.

Dependencies

Installation Hints

Compilation

Once you have installed the dependencies, you can use the following instruction to compile the project:

git clone git@github.com:openmlsys/openmlsys-cuda.git
cd openmlsys-cuda
git submodule init && git submodule sync
mkdir build && cd build
cmake ..
make -j4

Examples