Awesome

Rust Candle Demo

An interactive command line tool to demonstrate how to use HuggingFace's rust Candle ML framework to execute LLM.

This demo uses the quantized version of LLM openchat: https://huggingface.co/TheBloke/openchat_3.5-GGUF by default.

Prepare

Make sure you have installed the huggingface cli, if not, do it:

pip install -U "huggingface_hub[cli]"

And then you should download this model file associated with the original openchat tokenizer.json file:

mkdir hf_hub
HF_HUB_ENABLE_HF_TRANSFER=1 HF_ENDPOINT=https://hf-mirror.com huggingface-cli download TheBloke/openchat_3.5-GGUF openchat_3.5.Q8_0.gguf  --local-dir hf_hub
HF_HUB_ENABLE_HF_TRANSFER=1 HF_ENDPOINT=https://hf-mirror.com huggingface-cli download openchat/openchat_3.5 tokenizer.json --local-dir hf_hub

Run

There are two examples here:

simple: all parameters are hardcoded into code to make everything simplest, but you need to modify the model and tokenizer.json file by yourself, and run by:

cargo run --release --bin simple

cli: you can use this cli program to pass parameters from command line.

cargo run --release --bin cli -- --model=xxxxxxx --tokenizer=xxxx

You can use --help to show what parameters could be configured.

$ cargo run --release --bin cli -- --help
    Finished release [optimized] target(s) in 0.04s
     Running `target/release/cli --help`
avx: false, neon: false, simd128: false, f16c: false
Usage: cli [OPTIONS]

Options:
      --tokenizer <TOKENIZER>            [default: ../hf_hub/openchat_3.5_tokenizer.json]
      --model <MODEL>                    [default: ../hf_hub/openchat_3.5.Q8_0.gguf]
  -n, --sample-len <SAMPLE_LEN>          [default: 1000]
      --temperature <TEMPERATURE>        [default: 0.8]
      --seed <SEED>                      [default: 299792458]
      --repeat-penalty <REPEAT_PENALTY>  [default: 1.1]
      --repeat-last-n <REPEAT_LAST_N>    [default: 64]
      --gqa <GQA>                        [default: 8]
  -h, --help                             Print help
  -V, --version                          Print version

License

None.

Feedback

Feel free to submit issues to this repository.