llama2.go

<p align="center"> <img src="web/cute-llama-to-go.png" width="300" height="300" alt="Cute Llama"> </p>

This is a Go port of llama2.c.

Setup

  1. Download a model:

  2. Download tokenizer.bin

  3. go install github.com/saracen/llama2.go/cmd/llama2go@latest

  4. Do things:

    ./llama2go --help
    llama2go: <checkpoint>
      -cpuprofile string
             write cpu profile to file
      -prompt string
             prompt
      -steps int
             max number of steps to run for, 0: use seq_len (default 256)
      -temperature float
             temperature for sampling (default 0.9)
    
    ./llama2go -prompt "Cute llamas are" -steps 38 --temperature 0 stories110M.bin
    <s>
    Cute llamas are two friends who love to play together. They have a special game that they play every day. They pretend to be superheroes and save the world.
    achieved tok/s: 43.268528
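
The `-temperature` flag in the help output above controls how the next token is chosen. The following is a minimal sketch of temperature sampling as it is commonly done in llama2.c-style samplers. It is not taken from the llama2.go source: the names `argmax` and `sampleToken` are hypothetical, and it assumes the port follows the usual convention that a temperature of 0 means greedy (deterministic) decoding, which would explain why the example run above uses `--temperature 0` for a repeatable result.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// argmax returns the index of the largest logit (greedy decoding).
// Assumes logits is non-empty.
func argmax(logits []float32) int {
	best := 0
	for i, v := range logits {
		if v > logits[best] {
			best = i
		}
	}
	return best
}

// sampleToken picks a token index from raw logits.
// u is a uniform random number in [0, 1) supplied by the caller.
// temperature == 0 is treated as greedy decoding (an assumption, see above).
func sampleToken(logits []float32, temperature float32, u float64) int {
	if temperature == 0 {
		return argmax(logits)
	}
	// Scale logits by 1/temperature and exponentiate; subtracting the
	// maximum first keeps math.Exp from overflowing.
	maxLogit := logits[argmax(logits)]
	weights := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		weights[i] = math.Exp(float64((v - maxLogit) / temperature))
		sum += weights[i]
	}
	// Walk the cumulative distribution until it passes the random threshold.
	threshold := u * sum
	var cdf float64
	for i, w := range weights {
		cdf += w
		if threshold < cdf {
			return i
		}
	}
	return len(logits) - 1 // guard against rounding error
}

func main() {
	logits := []float32{1.0, 3.5, 0.2, 2.9}
	fmt.Println("greedy: ", sampleToken(logits, 0, rand.Float64()))
	fmt.Println("sampled:", sampleToken(logits, 0.9, rand.Float64()))
}
```

Lower temperatures concentrate probability on the highest-scoring token, while higher values flatten the distribution and make the output more varied.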
    

Performance

Tokens per second:

| system | model | llama2.c | llama2.go (no cgo) | llama2.go (cgo) |
| --- | --- | --- | --- | --- |
| M1 Max, 10-Core, 32 GB | stories15M | 676.392573 | 246.885611 | 473.840849 |
| M1 Max, 10-Core, 32 GB | stories42M | 267.295597 | 98.165245 | 151.396638 |
| M1 Max, 10-Core, 32 GB | stories110M | 100.671141 | 42.592345 | 69.804907 |
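
To put the figures above in perspective, it can help to express each Go result as a percentage of the llama2.c throughput for the same model. The sketch below does that arithmetic using the numbers from the table; the `result` type and the `relative` helper are names made up for this illustration and are not part of llama2.go.

```go
package main

import "fmt"

// result holds one row of the benchmark table (tokens per second).
type result struct {
	model  string
	c      float64 // llama2.c
	goPure float64 // llama2.go without cgo
	goCgo  float64 // llama2.go with cgo
}

// relative returns x as a percentage of base.
func relative(x, base float64) float64 {
	return 100 * x / base
}

func main() {
	rows := []result{
		{"stories15M", 676.392573, 246.885611, 473.840849},
		{"stories42M", 267.295597, 98.165245, 151.396638},
		{"stories110M", 100.671141, 42.592345, 69.804907},
	}
	for _, r := range rows {
		fmt.Printf("%-12s no cgo: %5.1f%%  cgo: %5.1f%% of llama2.c\n",
			r.model, relative(r.goPure, r.c), relative(r.goCgo, r.c))
	}
}
```

On these measurements the pure Go build reaches very roughly 36 to 42 percent of llama2.c's speed, and the cgo build roughly 57 to 70 percent.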