---
title: Lora Cerebras Gpt2.7b Alpaca Shortprompt
emoji: 🐨
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 3.23.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🦙🐕🧠 Cerebras-GPT2.7B LoRA Alpaca ShortPrompt
Scripts to finetune Cerebras GPT2.7B on the Alpaca dataset, as well as inference demos.
- It is the fastest model in the west!
- The model with the LoRA weights merged in is available at [lxe/Cerebras-GPT-2.7B-Alpaca-SP](https://huggingface.co/lxe/Cerebras-GPT-2.7B-Alpaca-SP).
- The LoRA weights alone are available at [lxe/lora-cerebras-gpt2.7b-alpaca-shortprompt](https://huggingface.co/lxe/lora-cerebras-gpt2.7b-alpaca-shortprompt).
- A ggml version of the model is available at [lxe/ggml-cerebras-gpt2.7b-alpaca-shortprompt](https://huggingface.co/lxe/ggml-cerebras-gpt2.7b-alpaca-shortprompt). You can run it without a GPU, and it's much faster than the original model.
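
For scripted use outside the Gradio demo, the merged model loads with plain Hugging Face Transformers. A minimal sketch; the prompt and generation settings are illustrative, not the demo's exact defaults, and `device_map="auto"` additionally requires the `accelerate` package:

```python
# Minimal sketch: load the merged model (LoRA weights already merged in).
# Generation settings are illustrative, not the demo's defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lxe/Cerebras-GPT-2.7B-Alpaca-SP"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to fit in ~12 GB of VRAM
    device_map="auto",          # requires the `accelerate` package
)

prompt = "How do I bake chocolate chip cookies?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```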
## 📈 Warnings
The model tends to be pretty coherent, but it also hallucinates a lot of factually incorrect responses. Avoid using it for anything that requires factual accuracy.
## 📚 Instructions
- Be on a machine with an NVIDIA card with 12-24 GB of VRAM.

- Get the environment ready (a quick GPU sanity check is sketched after this list):

  ```bash
  conda create -n cerberas-lora python=3.10
  conda activate cerberas-lora
  conda install -y cuda -c nvidia/label/cuda-11.7.0
  conda install -y pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch
  ```
- Clone the repo and install requirements:

  ```bash
  git clone https://github.com/lxe/cerebras-lora-alpaca.git && cd cerebras-lora-alpaca
  pip install -r requirements.txt
  ```

- Run the inference demo:

  ```bash
  python app.py
  ```
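
Optionally, before launching the demo, a quick check (not part of the original instructions) that the environment you just created actually sees your GPU:

```python
# Quick sanity check (not from the original instructions): confirm that
# PyTorch was installed with CUDA support and can see the GPU.
import torch

print(torch.__version__)          # expect 1.13.1
print(torch.cuda.is_available())  # expect True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```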
To reproduce the finetuning results, do the following:
- Install Jupyter and run it:

  ```bash
  pip install jupyter
  jupyter notebook
  ```

- Navigate to the `inference.ipynb` notebook and test out the inference demo.

- Navigate to the `finetune.ipynb` notebook and reproduce the finetuning results.
  - It takes about 5 hours with the default settings.
  - Adjust the batch size and gradient accumulation steps to fit your GPU (see the sketch after this list).
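
As a rough guide to that last point: lowering the per-device batch size while raising gradient accumulation steps keeps the effective batch size constant while using less VRAM. A hedged sketch using Hugging Face `TrainingArguments`; the notebook's actual parameter names and values may differ:

```python
# Hedged sketch: keep the effective batch size constant while fitting VRAM.
# Parameter names follow transformers.TrainingArguments; the notebook's
# actual settings may differ, so treat the numbers as illustrative.
from transformers import TrainingArguments

EFFECTIVE_BATCH_SIZE = 32          # illustrative target
per_device_batch = 4               # lower this if you hit CUDA out-of-memory
accum_steps = EFFECTIVE_BATCH_SIZE // per_device_batch

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=per_device_batch,
    gradient_accumulation_steps=accum_steps,  # 4 * 8 = 32 samples per optimizer step
)
```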
## 📝 License
Apache 2.0