pecca-rs

Pecca is starting as a Rust port of @karpathy's excellent llama2.c, which is itself a minimalist adaptation of llama.cpp.

Compared to other Rust ports, Pecca leverages ndarray, which has several advantages:

Going forward, Pecca will use Rust and its ecosystem wherever it makes sense, rather than avoiding dependencies at all costs as llama.cpp does.

Usage

git clone https://github.com/rahoua/pecca-rs.git
cd pecca-rs
wget -P ./models/stories/  https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
cargo run --release generate ./models/stories/stories15M.bin

Pecca can be run the same way with the larger TinyStories models (such as the 110M one) or with the Llama 2 models (only 7B is recommended so far). For a full list of command-line options, run:

pecca-rs --help

To get the Llama 2 models, follow the instructions for llama2.c; Pecca supports the same model format. Because Pecca does not memory-map the model file, loading the model and quantizing it on the fly can take some time. To speed this up, a quantized copy of the model can be saved with the -f / --write-model <path> command-line switch.
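To illustrate what "quantizing on the fly" involves, here is a minimal sketch of symmetric, group-wise int8 quantization. It follows the scheme used by llama2.c's `runq.c`: each group of weights shares one scale, and the group's largest magnitude maps to ±127. This is an assumption for illustration only. Pecca's actual quantization format, group size, and the names `QuantizedTensor`, `quantize` and `dequantize` are hypothetical here.

```rust
/// Hypothetical container for an int8-quantized weight tensor
/// (modeled on llama2.c's runq.c; Pecca's real layout may differ).
#[derive(Debug)]
struct QuantizedTensor {
    q: Vec<i8>,       // quantized values, one per weight
    scales: Vec<f32>, // one scale factor per group of `group_size` weights
    group_size: usize,
}

/// Symmetric group-wise quantization: within each group, the largest
/// absolute value maps to +/-127 and every weight is rounded to the nearest step.
fn quantize(w: &[f32], group_size: usize) -> QuantizedTensor {
    assert!(group_size > 0 && w.len() % group_size == 0,
        "tensor length must be a multiple of group_size");
    let mut q = Vec::with_capacity(w.len());
    let mut scales = Vec::with_capacity(w.len() / group_size);
    for group in w.chunks(group_size) {
        let wmax = group.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        // Guard against an all-zero group, which would otherwise divide by zero.
        let scale = if wmax == 0.0 { 1.0 } else { wmax / 127.0 };
        scales.push(scale);
        for &x in group {
            // |x / scale| <= 127, so the rounded value always fits in an i8.
            q.push((x / scale).round() as i8);
        }
    }
    QuantizedTensor { q, scales, group_size }
}

/// Reconstructs approximate f32 weights by multiplying each value by its group's scale.
fn dequantize(t: &QuantizedTensor) -> Vec<f32> {
    t.q.chunks(t.group_size)
        .zip(&t.scales)
        .flat_map(|(group, &s)| group.iter().map(move |&v| v as f32 * s))
        .collect()
}
```

For example, `quantize(&[0.5, -1.0, 0.25, 0.0], 4)` produces a single scale of `1.0 / 127.0`, and `dequantize` recovers each weight to within half a quantization step. Doing this for every tensor of a 7B model at load time explains why saving an already-quantized file pays off.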

For CodeLlama, the instructions are similar, except that the tokenizer is slightly different. To make this easier, the updated tokenizer is provided in the repository. To override the default tokenizer, run Pecca with the -k command-line option:
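Why a tokenizer file has to match its model becomes clearer from the file layout. The sketch below reads a `tokenizer.bin` in the format written by llama2.c, assuming Pecca uses the same layout. That layout is an `i32` `max_token_length` followed by `vocab_size` entries, each holding an `f32` score, an `i32` byte length and the token's raw bytes. The vocabulary size is not stored in the file; it comes from the model header. CodeLlama's vocabulary has 32016 entries versus 32000 for Llama 2, which is presumably why a separate file is needed. The `Tokenizer`/`Token` types and `read_tokenizer` function are hypothetical, and the code assumes little-endian data as produced on common x86 and ARM machines.

```rust
use std::convert::TryInto;

/// One vocabulary entry: the token's raw bytes and its BPE merge score.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    bytes: Vec<u8>,
    score: f32,
}

/// Hypothetical in-memory tokenizer, loaded from a llama2.c-style tokenizer.bin.
#[derive(Debug)]
struct Tokenizer {
    max_token_length: u32,
    vocab: Vec<Token>,
}

/// Reads a 4-byte field at `*pos` and advances the cursor past it.
fn take4(buf: &[u8], pos: &mut usize) -> Result<[u8; 4], String> {
    let end = *pos + 4;
    let bytes = buf.get(*pos..end).ok_or("unexpected end of tokenizer file")?;
    *pos = end;
    Ok(bytes.try_into().expect("slice has exactly 4 bytes"))
}

/// Parses `vocab_size` entries; `vocab_size` must come from the model's header.
fn read_tokenizer(buf: &[u8], vocab_size: usize) -> Result<Tokenizer, String> {
    let mut pos = 0;
    let max_token_length = u32::from_le_bytes(take4(buf, &mut pos)?);
    let mut vocab = Vec::with_capacity(vocab_size);
    for _ in 0..vocab_size {
        let score = f32::from_le_bytes(take4(buf, &mut pos)?);
        let len = i32::from_le_bytes(take4(buf, &mut pos)?);
        if len < 0 {
            return Err(format!("negative token length {}", len));
        }
        let len = len as usize;
        let bytes = buf
            .get(pos..pos + len)
            .ok_or("token bytes truncated")?
            .to_vec();
        pos += len;
        vocab.push(Token { bytes, score });
    }
    Ok(Tokenizer { max_token_length, vocab })
}
```

Passing a `vocab_size` larger than the file actually contains makes `read_tokenizer` return an error rather than reading past the buffer. A mismatched tokenizer is therefore caught at load time instead of producing garbled output.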

./target/release/pecca-rs generate /path/to/codellama-instr-7b.bin -k "./models/tokenizer-code.bin"

Performance

There is no formal benchmark at the moment; the figures below are rough estimates meant to give a ballpark of overall performance.

Llama 2 7B model on a MacBook Pro M2 Max:

Future Directions

A list of possible future developments for the project: