---
title: Lora Cerebras Gpt2.7b Alpaca Shortprompt
emoji: 🐨
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 3.23.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🦙🐕🧠 Cerebras-GPT2.7B LoRA Alpaca ShortPrompt
Scripts to finetune Cerebras GPT2.7B on the Alpaca dataset, as well as inference demos.
- It is the fastest model in the west!
- The model with the LoRA weights merged in is available at [lxe/Cerebras-GPT-2.7B-Alpaca-SP](https://huggingface.co/lxe/Cerebras-GPT-2.7B-Alpaca-SP).
- The LoRA weights alone are available at [lxe/lora-cerebras-gpt2.7b-alpaca-shortprompt](https://huggingface.co/lxe/lora-cerebras-gpt2.7b-alpaca-shortprompt).
- A ggml version of the model is available at [lxe/ggml-cerebras-gpt2.7b-alpaca-shortprompt](https://huggingface.co/lxe/ggml-cerebras-gpt2.7b-alpaca-shortprompt). You can run it without a GPU, and it's much faster than the original model.
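
For scripted use outside the Gradio demo, the merged model loads with plain Hugging Face Transformers. A minimal sketch; the prompt and generation settings are illustrative, not the demo's exact defaults, and `device_map="auto"` additionally requires the `accelerate` package:

```python
# Minimal sketch: load the merged model (LoRA weights already merged in).
# Generation settings are illustrative, not the demo's defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lxe/Cerebras-GPT-2.7B-Alpaca-SP"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to fit in ~12 GB of VRAM
    device_map="auto",          # requires the `accelerate` package
)

prompt = "How do I bake chocolate chip cookies?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```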
## 📈 Warnings
The model tends to be pretty coherent, but it also hallucinates a lot of factually incorrect responses. Avoid using it for anything that requires factual accuracy.
## 📚 Instructions
- Be on a machine with an NVIDIA card with 12-24 GB of VRAM.

- Get the environment ready (a quick GPU sanity check is sketched after this list):

  ```bash
  conda create -n cerberas-lora python=3.10
  conda activate cerberas-lora
  conda install -y cuda -c nvidia/label/cuda-11.7.0
  conda install -y pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch
  ```
- Clone the repo and install requirements:

  ```bash
  git clone https://github.com/lxe/cerebras-lora-alpaca.git && cd cerebras-lora-alpaca
  pip install -r requirements.txt
  ```

- Run the inference demo:

  ```bash
  python app.py
  ```
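
Optionally, before launching the demo, a quick check (not part of the original instructions) that the environment you just created actually sees your GPU:

```python
# Quick sanity check (not from the original instructions): confirm that
# PyTorch was installed with CUDA support and can see the GPU.
import torch

print(torch.__version__)          # expect 1.13.1
print(torch.cuda.is_available())  # expect True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```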
To reproduce the finetuning results, do the following:
- Install Jupyter and run it:

  ```bash
  pip install jupyter
  jupyter notebook
  ```

- Navigate to the `inference.ipynb` notebook and test out the inference demo.

- Navigate to the `finetune.ipynb` notebook and reproduce the finetuning results.
  - It takes about 5 hours with the default settings.
  - Adjust the batch size and gradient accumulation steps to fit your GPU (see the sketch after this list).
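
As a rough guide to that last point: lowering the per-device batch size while raising gradient accumulation steps keeps the effective batch size constant while using less VRAM. A hedged sketch using Hugging Face `TrainingArguments`; the notebook's actual parameter names and values may differ:

```python
# Hedged sketch: keep the effective batch size constant while fitting VRAM.
# Parameter names follow transformers.TrainingArguments; the notebook's
# actual settings may differ, so treat the numbers as illustrative.
from transformers import TrainingArguments

EFFECTIVE_BATCH_SIZE = 32          # illustrative target
per_device_batch = 4               # lower this if you hit CUDA out-of-memory
accum_steps = EFFECTIVE_BATCH_SIZE // per_device_batch

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=per_device_batch,
    gradient_accumulation_steps=accum_steps,  # 4 * 8 = 32 samples per optimizer step
)
```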
## 📝 License
Apache 2.0