<h1 align="center">AlpaGasus: Training a Better Alpaca Model with Fewer Data</h1> The unofficial implementation of "AlpaGasus: Training a Better Alpaca with Fewer Data". Trained models are available on Hugging Face, and we will keep updating the filtered data.

Project page | Paper | Huggingface

This repo contains the code for rating, data filtering, training, and evaluation.

Note: thanks to the community for providing useful feedback, which really motivates us to build a better open-source release.

<p align="center"> <img src="./figures/alpagasus.jpeg" width="30%"> <br> Our model "AlpaGasus" is pronounced as "/ˈælpəˈɡeɪsəs/" or "/ˈælpəˈɡəsəs/". The logo is generated by <a href="https://www.midjourney.com/app/">Midjourney</a>. </p>

News

Setup

pip install -r requirement.txt

Rating

Rate each (instruction, input, output) tuple in Alpaca's 52k training set.

# Use ChatGPT as the response quality evaluator
export OPENAI_API_KEY=<your_openai_api_key>
# Use Claude as the response quality evaluator
export CLAUDE_API_KEY=<your_claude_api_key>
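As a rough sketch of the rating step, the judge is given each (instruction, input, output) tuple and asked for a numeric quality score. The template below is a simplified paraphrase of the paper's grading setup, not the exact prompt used by the scripts in `rating/`:

```python
# Minimal sketch of building a judge prompt for one (instruction, input, output)
# tuple. The wording is an assumption; the real prompt lives in the rating/ scripts.

def build_rating_prompt(instruction: str, input_text: str, output: str) -> str:
    """Assemble a judge prompt asking for a 0-5 score of the response."""
    context = f"Instruction: {instruction}\n"
    if input_text:
        context += f"Input: {input_text}\n"
    context += f"Response: {output}\n"
    return (
        "We would like to request your feedback on the performance of an AI "
        "assistant on the instruction displayed below.\n\n"
        + context
        + "\nPlease rate the accuracy and helpfulness of the response with a "
        "score from 0 to 5, and output the score on the first line."
    )

if __name__ == "__main__":
    prompt = build_rating_prompt(
        "Name three primary colors.", "", "Red, blue, and yellow."
    )
    print(prompt)
    # The prompt is then sent to ChatGPT or Claude via their APIs,
    # authenticated with the keys exported above.
```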

After the rating, use rating/filter.py and rating/get_scores.py to process the reviews obtained from ChatGPT/Claude.
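A hedged sketch of the score-extraction step, in the spirit of rating/get_scores.py. It assumes each review embeds a numeric score somewhere in its free text; the exact format depends on the rating prompt:

```python
import re

# Extract a numeric score from a judge's free-text review. The "Score: 4.5"
# style shown below is an assumption; adapt the pattern to the real reviews.
SCORE_RE = re.compile(r"(\d+(?:\.\d+)?)")

def parse_score(review: str):
    """Return the first number found in the review, or None if absent."""
    m = SCORE_RE.search(review)
    return float(m.group(1)) if m else None

reviews = [
    "Score: 4.5. The response is accurate and helpful.",
    "2.0 - the answer is partially wrong.",
    "No score given.",
]
scores = [parse_score(r) for r in reviews]
print(scores)  # [4.5, 2.0, None]
```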

Data Release

Score distribution

We provide the score distribution of the ChatGPT rating here:

<p align="center"> <img src="./figures/scores.jpg" width="50%"> <br> We use the same prompt as the AlpaGasus paper and obtain the score distribution shown above. We then select 9k examples by applying a threshold of 4.5, yielding "chatgpt_9k.json". </p> <br>
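The thresholding described above can be sketched as follows, in the spirit of rating/filter.py. The "score" field name is an assumption about the intermediate file format:

```python
# Keep only the examples rated at or above the threshold (4.5 in the README),
# analogous to how "chatgpt_9k.json" is produced from the rated 52k set.

def filter_by_score(examples, threshold=4.5):
    """Keep the (instruction, input, output) tuples rated >= threshold."""
    return [ex for ex in examples if ex.get("score", 0) >= threshold]

rated = [
    {"instruction": "Name three primary colors.", "input": "",
     "output": "Red, blue, and yellow.", "score": 5.0},
    {"instruction": "Explain gravity.", "input": "",
     "output": "It pulls.", "score": 3.0},
]
kept = filter_by_score(rated)
print(len(kept))  # 1

# In the real pipeline, the kept subset would then be written out as JSON,
# e.g. with json.dump(kept, f, indent=2).
```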

Training

# install DeepSpeed (required for multi-GPU training)
pip install deepspeed
# train with the provided scripts
sh training/train_7b.sh
sh training/train_13b.sh
# or launch training manually:
torchrun --nproc_per_node=4 --master_port=<your_random_port> ./training/train_alpaca.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./data/filtered/dolly_3k.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 32 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --deepspeed "./config/ds_config_13b.json" \
    --tf32 True
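As a sanity check on the flags above: with 4 GPUs (`--nproc_per_node=4`), a per-device batch size of 1, and 32 gradient-accumulation steps, the effective global batch size works out as:

```python
# Effective global batch size implied by the torchrun command above:
# per-device batch size * gradient accumulation steps * number of GPUs.
per_device_train_batch_size = 1
gradient_accumulation_steps = 32
nproc_per_node = 4  # GPUs per node, as in --nproc_per_node=4

effective_batch = (per_device_train_batch_size
                   * gradient_accumulation_steps
                   * nproc_per_node)
print(effective_batch)  # 128
```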

Evaluation

<p align="center"> <img src="./figures/main-results.jpg" width="100%"> <br> We evaluate the models on four test sets: Koala, Vicuna, WizardLM, and Self-Instruct. Our AlpaGasus is significantly better than the baseline models. </p>

We also provide the code and scripts for evaluating the models on these four test sets:

export OPENAI_API_KEY=<your_openai_api_key>
cd evaluation/
sh run_eval.sh
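Once the judge has scored both models on each test prompt, the pairwise results can be tallied into win/tie/lose counts. A minimal sketch, assuming each result holds a judge score for our model and for the baseline (the field names are hypothetical):

```python
from collections import Counter

# Tally pairwise judge scores into win/tie/lose counts for our model
# versus a baseline. The {"ours": ..., "baseline": ...} record format
# is an assumption for illustration.

def tally(results):
    """Count wins/ties/losses of `ours` vs `baseline` across test prompts."""
    counts = Counter()
    for r in results:
        if r["ours"] > r["baseline"]:
            counts["win"] += 1
        elif r["ours"] < r["baseline"]:
            counts["lose"] += 1
        else:
            counts["tie"] += 1
    return counts

results = [{"ours": 8.0, "baseline": 7.0},
           {"ours": 6.5, "baseline": 6.5},
           {"ours": 7.0, "baseline": 9.0}]
t = tally(results)
print(t["win"], t["tie"], t["lose"])  # 1 1 1
```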

References

Citation

If you find this repo useful, please cite the paper:

@misc{chen2023alpagasus,
      title={AlpaGasus: Training A Better Alpaca with Fewer Data}, 
      author={Lichang Chen and Shiyang Li and Jun Yan and Hai Wang and Kalpa Gunaratna and Vikas Yadav and Zheng Tang and Vijay Srinivasan and Tianyi Zhou and Heng Huang and Hongxia Jin},
      year={2023},
      eprint={2307.08701},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}