A Scala 2 port of Andrej Karpathy's llama2.c

This is a Scala port of Andrej Karpathy's llama2.c, a bare-bones implementation for running inference on models with a Llama-like, transformer-based LLM architecture.

The code expects tokenizer.bin and stories15M.bin in the current directory.
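As a rough illustration of what loading such a model file involves, the sketch below reads the header of a legacy llama2.c checkpoint, which starts with seven little-endian 32-bit integers. The names `Config`, `fromBuffer`, and `fromFile` are made up for this example and are not taken from this repository's code.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Hyperparameters stored at the start of a legacy llama2.c checkpoint.
// (Hypothetical case class for illustration only.)
final case class Config(
    dim: Int,
    hiddenDim: Int,
    nLayers: Int,
    nHeads: Int,
    nKvHeads: Int,
    vocabSize: Int, // llama2.c uses a negative value to flag unshared classifier weights
    seqLen: Int)

object Config {
  val HeaderBytes: Int = 7 * 4 // seven 32-bit integers

  // Parses the header from a buffer positioned at the start of the file.
  def fromBuffer(buf: ByteBuffer): Config = {
    // duplicate() resets the byte order, so set it explicitly afterwards
    val b = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN)
    Config(b.getInt(), b.getInt(), b.getInt(), b.getInt(), b.getInt(), b.getInt(), b.getInt())
  }

  // Reads only the header bytes from the given file.
  def fromFile(path: String): Config = {
    val ch = FileChannel.open(Paths.get(path), StandardOpenOption.READ)
    try {
      val buf = ByteBuffer.allocate(HeaderBytes)
      while (buf.hasRemaining && ch.read(buf) >= 0) {}
      require(!buf.hasRemaining, s"$path is too short to contain a header")
      buf.flip()
      fromBuffer(buf)
    } finally ch.close()
  }
}
```

With `stories15M.bin` in the current directory, `Config.fromFile("stories15M.bin")` should return the model's dimensions, from which the sizes of the weight tensors that follow the header can be derived.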

This started as a port of the original code to pure Scala. Later, higher-level abstractions were added, along with low-level C kernels that use AVX2 intrinsics to speed up matrix multiplication.
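For orientation, the operation that dominates inference time is a matrix-vector product. The following is a minimal pure-Scala sketch of it, assuming row-major weights in a flat `Float` array as in llama2.c; `MatMul.matVec` is a hypothetical name and not necessarily how this repository structures its code. The inner loop is the part a native AVX2 kernel would process several floats at a time.

```scala
object MatMul {
  // Computes out = W * x, where W is a (rows x cols) matrix stored
  // row-major in a flat array.
  def matVec(out: Array[Float], w: Array[Float], x: Array[Float], rows: Int, cols: Int): Unit = {
    require(w.length == rows * cols, "weight array has the wrong size")
    require(x.length == cols && out.length == rows, "vector sizes do not match")
    var i = 0
    while (i < rows) {
      val base = i * cols // offset of row i in the flat array
      var sum = 0.0f
      var j = 0
      while (j < cols) { // dot product of row i with x
        sum += w(base + j) * x(j)
        j += 1
      }
      out(i) = sum
      i += 1
    }
  }
}
```

`while` loops are used instead of `for` comprehensions or collection operations to keep the hot loop free of boxing and closure allocation on the JVM.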

Features:

Performance

Current numbers were measured with version 08c65d04 on my AMD Ryzen 7 4800H laptop with GraalVM JDK 17.

Implementations:

Notes:

| Model | Quantization | Implementation | Threads | tok / s |
|---|---|---|---|---|
| stories15M.bin | Q4 | native-avx2 | 1 | 494 |
| stories15M.bin | Q4 | native-avx2 | 6 | 931 |
| stories15M.bin | Q4 | Scala | 1 | 65 |
| stories15M.bin | Q8 | native-avx2 | 1 | 533 |
| stories15M.bin | Q8 | native-avx2 | 6 | 800 |
| stories15M.bin | Q8 | Scala | 1 | 57 |
| stories15M.bin | none | native-avx2 | 1 | 374 |
| stories15M.bin | none | native-avx2 | 6 | 677 |
| stories15M.bin | none | Scala | 1 | 66 |
| stories15M.bin | none | scala-native vanilla | 1 | 14 |
| stories15M.bin | none | scala-native (native mmaps) | 1 | 50 |
| stories42M.bin | Q4 | native-avx2 | 1 | 223 |
| stories42M.bin | Q4 | native-avx2 | 6 | 497 |
| stories42M.bin | Q4 | Scala | 1 | 24 |
| stories42M.bin | Q8 | native-avx2 | 1 | 229 |
| stories42M.bin | Q8 | native-avx2 | 6 | 407 |
| stories42M.bin | Q8 | Scala | 1 | 22 |
| stories42M.bin | none | native-avx2 | 1 | 137 |
| stories42M.bin | none | native-avx2 | 6 | 243 |
| stories42M.bin | none | Scala | 1 | 24 |
| stories42M.bin | none | llama2.c / run | 1 | 21 |
| stories42M.bin | none | llama2.c / runfast | 1 | 69 |
| stories42M.bin | none | llama2.c / runomp | 1 | 98 |
| stories42M.bin | none | llama2.c / runomp | 6 | 195 |
| stories110M.bin | Q4 | native-avx2 | 1 | 95 |
| stories110M.bin | Q4 | native-avx2 | 6 | 239 |
| stories110M.bin | Q4 | Scala | 1 | 9.6 |
| stories110M.bin | Q8 | native-avx2 | 1 | 99 |
| stories110M.bin | Q8 | native-avx2 | 6 | 183 |
| stories110M.bin | Q8 | Scala | 1 | 8.4 |
| stories110M.bin | none | native-avx2 | 1 | 50 |
| stories110M.bin | none | native-avx2 | 6 | 85 |
| stories110M.bin | none | Scala | 1 | 8.9 |
| stories110M.bin | none | llama2.c / runomp | 6 | 77 |
| llama2_7b.bin | Q4 | native-avx2 | 1 | 2.0 |
| llama2_7b.bin | Q4 | native-avx2 | 6 | 6.5 |
| llama2_7b.bin | Q4 | Scala | 1 | 0.16 |
| llama2_7b.bin | Q8 | native-avx2 | 1 | 1.9 |
| llama2_7b.bin | Q8 | native-avx2 | 6 | 4.46 |
| llama2_7b.bin | Q8 | Scala | 1 | 0.14 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | native-avx2 | 1 | 1.66 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | native-avx2 | 6 | 6.71 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | Scala | 1 | 0.13 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | llama.cpp | 1 | 2.0 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | llama.cpp | 6 | 8.1 |
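
One way to read the 1-thread and 6-thread rows together is to compute the speedup and the fraction of ideal linear scaling they imply. The helper below is a small sketch for that arithmetic; `Scaling` and its methods are illustrative names, and the inputs are the stories15M.bin and llama2_7b.bin Q4 native-avx2 figures from the table.

```scala
object Scaling {
  // Speedup of a multi-threaded run relative to the single-threaded run.
  def speedup(tokPerSec1: Double, tokPerSecN: Double): Double =
    tokPerSecN / tokPerSec1

  // Fraction of ideal linear scaling achieved with `threads` threads.
  def efficiency(tokPerSec1: Double, tokPerSecN: Double, threads: Int): Double =
    speedup(tokPerSec1, tokPerSecN) / threads
}

object ScalingDemo {
  def main(args: Array[String]): Unit = {
    // (label, tok/s with 1 thread, tok/s with 6 threads), taken from the table
    val rows = Seq(
      ("stories15M.bin Q4 native-avx2", 494.0, 931.0),
      ("llama2_7b.bin Q4 native-avx2", 2.0, 6.5))
    for ((label, one, six) <- rows) {
      val s = Scaling.speedup(one, six)
      val e = Scaling.efficiency(one, six, 6) * 100
      println(f"$label: speedup $s%.2fx, efficiency $e%.0f%%")
    }
  }
}
```

On these figures, the large model gains more from extra threads than the small one, which would be consistent with per-token threading overhead mattering more when each matrix multiplication is small.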

License

MIT