# OpenSource GPU

Build an open-source GPU, targeting ASIC tape-out, for machine learning ("ML"). Hopefully, we can get it to work with the PyTorch deep learning framework.

# Vision

Create an open-source GPU for machine learning.

I don't actually intend to tape this out myself, but I intend to do what I can to verify that a tape-out would work: that timings are met, and so on.

The intention is to implement a HIP API that is compatible with the PyTorch machine learning framework. Other APIs, such as SYCL or NVIDIA® CUDA™, could be provided as well.
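For orientation, here is a minimal sketch of the kind of host-side calls such a runtime would have to serve. The program itself is made up for this page, but every `hip*` call in it is part of the public HIP runtime API.

```cpp
// Illustrative only: a made-up host program, but every hip* call in it is part
// of the public HIP runtime API that this project's runtime would need to serve.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    int deviceCount = 0;
    hipGetDeviceCount(&deviceCount);   // device discovery
    printf("devices found: %d\n", deviceCount);
    if (deviceCount == 0) {
        return 1;
    }
    hipSetDevice(0);                   // select a device

    hipStream_t stream;
    hipStreamCreate(&stream);          // command stream (queue)

    const size_t n = 1024;
    std::vector<float> host(n, 1.0f);

    float *device = nullptr;
    hipMalloc((void **)&device, n * sizeof(float));       // allocate device memory
    hipMemcpyAsync(device, host.data(), n * sizeof(float),
                   hipMemcpyHostToDevice, stream);        // queue a host-to-device copy
    hipStreamSynchronize(stream);                         // wait for the stream to drain

    hipFree(device);
    hipStreamDestroy(stream);
    return 0;
}
```

Kernel launches sit on top of this surface; the Simulation section below shows what the single-source kernel side looks like.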

The internal GPU core ISA is loosely compliant with the RISC-V ISA. Where RISC-V conflicts with designing for a GPU setting, we break with RISC-V.
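As a rough illustration of the starting point, the sketch below decodes the field layout of a standard RISC-V R-type instruction. It shows the textbook RV32I encoding, not this project's actual decoder, and the GPU core may diverge from it as noted above.

```cpp
#include <cstdint>
#include <cstdio>

// Field layout of a standard RISC-V R-type instruction (e.g. ADD rd, rs1, rs2).
// Illustrative only: this project's core may break with this where a GPU
// setting calls for it.
struct RTypeFields {
    uint32_t opcode;  // bits [6:0]
    uint32_t rd;      // bits [11:7]
    uint32_t funct3;  // bits [14:12]
    uint32_t rs1;     // bits [19:15]
    uint32_t rs2;     // bits [24:20]
    uint32_t funct7;  // bits [31:25]
};

RTypeFields decodeRType(uint32_t instr) {
    return RTypeFields{
        instr & 0x7f,
        (instr >> 7) & 0x1f,
        (instr >> 12) & 0x7,
        (instr >> 15) & 0x1f,
        (instr >> 20) & 0x1f,
        (instr >> 25) & 0x7f,
    };
}

int main() {
    // 0x003100b3 encodes `add x1, x2, x3` in standard RV32I.
    RTypeFields f = decodeRType(0x003100b3);
    printf("opcode=0x%02x rd=x%u rs1=x%u rs2=x%u\n", f.opcode, f.rd, f.rs1, f.rs2);
    return 0;
}
```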

The cores are kept tightly focused on ML. For example, brain floating point ("BF16") is used throughout, to keep core die area, and hence per-core cost, low. Similarly, only the few floating-point operations critical to ML are implemented, such as exp, log, tanh, and sqrt.
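For context, BF16 is the top 16 bits of an IEEE-754 float32: same sign bit and 8-bit exponent, with the mantissa truncated to 7 bits. That is what makes BF16 arithmetic units much smaller than FP32 ones. A host-side sketch of the conversion (illustrative; truncation shown for simplicity, hardware may round-to-nearest):

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// BF16 keeps the sign and 8-bit exponent of float32 and truncates the mantissa
// to 7 bits, so converting is just taking the upper 16 bits of the float32 bits.
uint16_t floatToBf16(float value) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));            // type-pun without UB
    return static_cast<uint16_t>(bits >> 16);
}

float bf16ToFloat(uint16_t bf16) {
    uint32_t bits = static_cast<uint32_t>(bf16) << 16;   // lower mantissa bits become 0
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

int main() {
    float x = 3.14159265f;
    uint16_t b = floatToBf16(x);
    printf("%f -> 0x%04x -> %f\n", x, b, bf16ToFloat(b)); // round-trip loses precision
    return 0;
}
```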

# Architecture

Big Picture:

*(image: Big Picture)*

GPU Die Architecture:

*(image: GPU Die Architecture)*

Single Core:

*(image: Single Core)*

Single-source compilation and runtime:

*(image: End-to-end Architecture)*

# Simulation

<!-- ![toy proc workflow](/docs/img/toy_proc_workflow.png) --> <!-- ![Example output](/docs/img/example_output.png) -->

## Single-source C++

Single-source C++:

*(image: Single-source C++)*
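To make "single-source" concrete: the device kernel and the host code that drives it live in one C++ file, and the compiler splits them into a device part and a host part. Below is a hedged sketch of such a file; the kernel name `sum_ints` and the values are invented for this page, not taken from the project's own examples, while the `hip*` calls and the `<<<...>>>` launch syntax are standard HIP.

```cpp
// A made-up single-source example: kernel and host driver in one translation unit.
#include <hip/hip_runtime.h>
#include <iostream>

__global__ void sum_ints(unsigned int *in, unsigned int *out, unsigned int n) {
    // Single-threaded reduction, just to keep the sketch short.
    unsigned int total = 0;
    for (unsigned int i = 0; i < n; i++) {
        total += in[i];
    }
    *out = total;
}

int main() {
    const unsigned int n = 16;
    unsigned int host_in[n];
    for (unsigned int i = 0; i < n; i++) {
        host_in[i] = i;
    }

    unsigned int *dev_in = nullptr;
    unsigned int *dev_out = nullptr;
    hipMalloc((void **)&dev_in, n * sizeof(unsigned int));
    hipMalloc((void **)&dev_out, sizeof(unsigned int));
    hipMemcpy(dev_in, host_in, n * sizeof(unsigned int), hipMemcpyHostToDevice);

    sum_ints<<<dim3(1), dim3(1)>>>(dev_in, dev_out, n);   // launch on the device
    hipDeviceSynchronize();

    unsigned int result = 0;
    hipMemcpy(&result, dev_out, sizeof(unsigned int), hipMemcpyDeviceToHost);
    std::cout << "sum: " << result << std::endl;          // 0 + 1 + ... + 15 = 120

    hipFree(dev_in);
    hipFree(dev_out);
    return 0;
}
```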

Compile the GPU and runtime:

*(image: Compile GPU and runtime)*

Compile the single-source C++, and run:

*(image: Run single-source example)*

# Planning

What direction are we thinking of going in? What works already? See:

# Tech details

Our assembly language implementation and progress, and the design of GPU memory, registers, and so on. See:

# Verification

If we want to tape out, we need solid verification. Read more at:

# Metrics

We want the GPU to run quickly and to use minimal die area. Read how we measure timings and area at:

<!--
# What can I do with VeriGPU?

Well, aside from taping it out, which is going to be very expensive, you can:

- run simulations of a single core
    - Experiment with various core designs
    - Measure clock cycles for various operations under each design
    - Measure die area (as a ratio of number of NAND gates equivalent) for different designs
    - Measure maximum propagation delay (as a ratio of number of NAND gates equivalent) for different designs
- run simulations on a single compute unit (coming soon!)
    - similar experiments as for a single core, but on an entire compute unit, containing multiple cores
- run simulations on an entire GPU die, including gpu controller (using supplied global memory simulator)
-->

<!--
# Why work on something that we might never be able to make for real?

Well, it's not certain that it can never be built. If we actually create a plausibly verified and working GPU design, there is a bunch of VC around to tape it out.

But, in the mean-time... there are a number of things that are hard or extremely expensive to run for real, such as:

- plasmas (in a fusion reactor for example)
- space rockets
- mars landers
- ... and also VLSI ASICs, such as GPUs

In all cases, one of the main approaches to the problem is to create high-quality simulations.

In the case of plasmas for fusion reactors, this is pretty challenging, since we cannot even 'see' the ground-truth. Light is just another particle, and it interacts with the plasma. Ultimately we just see how much energy is created, and some of the particles emitted. The plasma simulations are used to test various hypotheses about what is happening, to 'reverse engineer' the plasma.

In the case of GPUs, simulation is relatively straightforward. CMOS circuitry is relatively deterministic, at least at the cell level, and there are a number of high quality simulators available, such as [iverilog](http://iverilog.icarus.com/) and [verilator](https://www.veripool.org/verilator/). We can use [yosys](https://github.com/YosysHQ/yosys) to synthesize down to gate-level cells, and then we can run simulations on that.

We can run the GPU in these simulators, and tweak things to our heart's content. Want a GPU with only BF16? Tweak the code. Actually I intend to make it pure FP16 anyway, but it's just an example. Want to change the number of cores per multiprocessor, or the trade-off between clock frequency and instruction latency? Tweak the code :)

I feel that being able to work on projects in the absence of being able to 'just try things out' for real is plausibly a useful and valuable skill.

(to do: I need to write some instructions on how to quickly get stuck into running the simulations :) )
-->

<!--
# Why not target/test on FPGA?

In my previous experience on OpenCL, i.e. [DeepCL](https://github.com/hughperkins/DeepCL), [cltorch](https://github.com/hughperkins/cltorch), [coriander](https://github.com/hughperkins/coriander), where ironically I only had access to an NVIDIA® GPU to run them :P, I found that everything I did became optimized in various subtle ways for NVIDIA® GPUs, and when I finally got a brief access to an AMD® GPU, performance was terrible.

The difference between an FPGA and an ASIC is considerable. For example, FPGAs contain their own built-in routing architecture, flip-flops work slightly differently, resets work slightly differently, FPGAs can have 'initial' blocks, and memory is laid out differently in an FPGA. I feel that even touching an FPGA will 'taint' the design in various subtle ways, that will be hard to detect.

In machine learning parlance, I feel we will 'over-fit' against the FPGA, and fail to generalize correctly to ASIC. So, simulation is the way forward I feel. And we need to make sure the simulations are as solid, accurate, and complete as possible.
-->