Caten
This repository is still in the early stages of development and includes many experimental approaches. Please treat it as a place for me to experiment with ideas, and do not use it in production under any circumstances.
Caten = Compile+AbstracTENsor
Caten is an experimental deep learning compiler. Our goal is to create a solution that’s as simple as tinygrad yet as flexible as TVM—all while extending the possibilities of interactive programming into the realm of AI.
We're looking for collaborators! Please join our Discord and let me know if you'd like to contribute!
Showcases
Caten is still under development, but it aims to support a wide range of models in the future—from image processing to text generation, and vision language models! Some models are already up and running.
Examples
We have two doc files that explain how the Caten compilation pipeline works:
- End-to-End Example: walks through the end-to-end compilation pipeline.
- Getting Started: an introduction to Caten.
Running LLMs
$ JIT=1 PARALLEL=8 ./roswell/caten.ros llm-example --model "gpt2" --prompt "Hello" --max-length 100
Give the GPT2 demo a try! You can pass compilation settings through environment variables. For example, setting `JIT=1` enables JIT compilation, while `JIT_DEBUG >= 2` allows you to view the schedule and the generated kernels. Setting `PARALLEL=8` divides the ScheduleGraph and compiles it in parallel.
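For example, to inspect the generated kernels while compiling in parallel, these variables can be combined in a single invocation (reusing the command above):
$ JIT=1 JIT_DEBUG=2 PARALLEL=8 ./roswell/caten.ros llm-example --model "gpt2" --prompt "Hello" --max-length 100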
You may still find the token/ms rate slow; we're not yet at the stage of implementing an AutoScheduler (or GPU support) to accelerate kernel performance. Once our IR matures enough to handle a wide range of deep learning models, we plan to focus on speeding things up!
Lazy Evaluation
Caten is capable of generating all the kernels it needs on its own! Instead of relying on OpenBLAS bindings or hand-optimized CUDA kernels, Caten avoids abstractions that would tie us to specific libraries.
Let’s take Matmul+Activation fusion as an example to illustrate this approach:
(in-package :caten-user)
(pprint-graph
 (tensor-graph (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
When you set `JIT=1`, the graph is compiled to an external language. You can view the generated code by specifying `JIT_DEBUG >= 2`.
Give it a try in your REPL!
(in-package :caten-user)
;; (setf (ctx:getenv :JIT) 1) to set globally
(ctx:with-contextvar (:JIT 1 :JIT_DEBUG 4)
  (caten (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
We’ve adopted a RISC-style architecture. Ultimately, everything in Caten boils down to just 25 composable primitive ops.
When you replace `tensor-graph` with `tensor-lowered-graph`, you’ll see exactly what we mean! And by using `->dot` instead of `pprint-graph`, you can visualize that graph right in your browser!
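As a minimal sketch, reusing the Matmul+Activation example from above:
(in-package :caten-user)
;; The same graph as before, lowered to the composable primitive ops:
(pprint-graph
 (tensor-lowered-graph (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))
;; Visualize the same lowered graph in your browser:
(->dot
 (tensor-lowered-graph (!relu (!matmul (make-tensor `(a b)) (make-tensor `(b c))))))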
Finally, our lazy evaluation doesn’t make debugging any harder. If you want to check an intermediate result, just insert `proceed` at any point; it won’t break the computation graph!
;; These two are equivalent:
(proceed (!sin (!cos (ax+b `(3 3) 1 0))))
(proceed (!sin (proceed (!cos (ax+b `(3 3) 1 0)))))
Training Models (Experimental)
(in-package :caten-user)
(defsequence MLP (in-features hidden-dim out-features &key (activation #'!relu))
  (Linear in-features hidden-dim)
  (asnode activation)
  (Linear hidden-dim hidden-dim)
  (asnode activation)
  (Linear hidden-dim out-features))

(defun build-mlp-model ()
  (let* ((model (MLP 64 32 16))
         (outputs (call model (make-tensor `(b 64) :from :x)))
         (loss (!cross-entropy (!softmax outputs) (make-tensor `(b 16) :from :y)))
         (runner (caten loss)))
    (values runner (hook-optimizers runner (SGD :lr 1e-3)))))

(defun train ()
  (multiple-value-bind (runner optimizers) (build-mlp-model)
    (dotimes (i 10)
      (forward runner `(:x . ,(rand `(10 64))) `(:y . ,(rand `(10 16))) `(b . 10)) ;; replace with an MNIST dataloader
      (backward runner)
      (mapc #'step-optimizer optimizers)
      (mapc #'zero-grad optimizers))))
Though our focus is still on inference, we will support training models as well. (Still experimental and unstable; we are not yet sure our backward scheduler can scale to larger and more complicated graphs.)
Getting Started
- Install Roswell and a suitable IDE. (If unsure, Emacs or Lem is recommended.)
- Install ISL (Integer Set Library) for fast kernel generation.
- Install Qlot
- Check out getting-started.lisp
$ git clone git@github.com:hikettei/Caten.git
$ cd Caten
$ qlot install
$ qlot exec ros run
> (ql:quickload :caten)
> (in-package :caten-user)
> (proceed (!randn `(3 3)))
Get Involved
- Join our Discord Server.
- Check out our roadmap.
- Create a PR.
Caten is a project that started only a few months ago. We are currently in the stage of building a solid foundational library. Here’s what we’re looking for:
- Feature additions with tests (e.g., new activations, unimplemented matrix operations)
- Bug reports and additional tests
- Refactoring of the core compiler components
- Improving the documentation

etc...
Before contributing, please note that there is no linter here. Make an effort to adhere to the Google Common Lisp Style Guide; changes that do not follow it may be rejected during review.
Roadmap
Supported Models
- Generative AI
  - GPT2
  - Llama3
  - TinyLLAMA
  - StableDiffusion
  - QwenVL2
- Classification
  - MobileNetV2
  - MobileNetV3
  - ResNet18/ResNet34/ResNet50
  - VIT_B_16
- Segmentation
  - CenterNet
- Detection
  - YOLOv3
  - YOLOv7
Supported Formats
- Common Lisp Frontend (caten/apis)
- ONNX (caten/onnx)
- GGUF (caten/gguf)
Quantization
- Support Dequantization from GGUF
- Support QOPs
Training
- Autodiff
- Fast Autodiff
- Support Training (But still limited)
- Distributed Training
Accelerators
- LISP VM
- CLANG JIT
- CLANG with Auto Scheduler
- METAL
- WebGPU
- CUDA
- AutoScheduler
Running tests
You should install Python, NumPy, and PyTorch before running the test suite, by using `make install_extra`. If no version is specified, the latest one is installed.
$ make install_extra # extra dependencies for running tests
$ make test