<p align="center" width="100%"> <a href="https://crfm.stanford.edu/alpaca/" target="_blank"><img src="assets/logo.png" alt="Stanford-Alpaca" style="width: 50%; min-width: 300px; display: block; margin: auto;"></a> </p>

Stanford Alpaca Trainer - Updated for Training Replit's Code Model


This repo began as the Stanford Alpaca project and has been edited into a trainer that fine-tunes Replit's 3B code model on Alpaca-format datasets.

Overview

A trainer for Replit's 3B-parameter code model.

Dataset Format

Alpaca-format datasets are JSON arrays of objects, each with three string fields: instruction, input (which may be empty), and output. Here is an example of a dataset:

[
  {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night."
  },
  {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow. These colors are called primary because they cannot be created by mixing other colors and all other colors can be made by combining them in various proportions. In the additive color system, used for light, the primary colors are red, green, and blue (RGB)."
  }
]
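
For a quick sanity check before training, a minimal Python sketch that loads a dataset and verifies each record has the expected fields might look like this (the file name dataset.json is a placeholder):

import json

# Load the Alpaca-format dataset (path is a placeholder).
with open("dataset.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Each record must carry instruction, input, and output string fields.
for i, rec in enumerate(records):
    for field in ("instruction", "input", "output"):
        assert field in rec, f"record {i} is missing '{field}'"
        assert isinstance(rec[field], str), f"record {i}: '{field}' must be a string"

print(f"OK: {len(records)} records")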

We used the following prompts for fine-tuning the Replit model. For examples with a non-empty input field:

### Instruction:
{instruction}

### Input:
{input}

### Response:

For examples with an empty input field:

### Instruction:
{instruction}

### Response:
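
A minimal sketch of how an example could be rendered into one of these prompts (the function name format_prompt is illustrative, not part of the training script):

# Templates matching the two prompt formats above.
PROMPT_WITH_INPUT = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def format_prompt(example: dict) -> str:
    # Choose the template based on whether the input field is empty.
    if example.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(**example)
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])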

Fine-tuning

To fine-tune Replit's model, first install the requirements:

pip install -r requirements.txt

The train.py script defaults to 2000 sequence length for training. It runs in small batch size at this sequence length on an a100 80gb. You will save a significant amount of vram, and thus, can train faster, with a smaller sequence length. Training on 2x a100 80gb with what is possible with 2000 token sequence length takes about 2.5 hours, with 512 token length, only 45~ minutes.
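
If the fork keeps Alpaca's model_max_length training argument (an assumption; check train.py for the actual flag name), a shorter sequence length can be requested by adding one flag to the command below, for example:

    --model_max_length 512 \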

Below is a command that fine-tunes Replit-3B with an Alpaca-formatted dataset on a machine with 2 A100 80GB GPUs at a 2000-token sequence length.

Replace <your_random_port> with a port of your own, <path_to_replit_model> with the path to your converted checkpoint and tokenizer (or leave the default to use Replit's base code model), and <your_output_dir> with where you want to store your outputs.

torchrun --nproc_per_node=2 --master_port=<your_random_port> train.py \
    --model_name_or_path <path_to_replit_model> \
    --data_path ./<your_dataset>.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1

Note that the given training script is meant to be simple and easy to use, and is not particularly optimized. To run on more GPUs, you may prefer to turn down gradient_accumulation_steps to keep the global batch size constant (global batch size = number of GPUs x per_device_train_batch_size x gradient_accumulation_steps; the command above gives 2 x 1 x 4 = 8). The global batch size has not been tested for optimality.
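
As a rule of thumb, the accumulation steps needed to hold a target global batch size can be computed as follows (a hypothetical helper, not part of train.py):

def grad_accum_steps(target_global_batch: int, num_gpus: int, per_device_batch: int) -> int:
    # global batch size = num_gpus * per_device_batch * grad_accum_steps
    steps, rem = divmod(target_global_batch, num_gpus * per_device_batch)
    assert rem == 0, "target must divide evenly across GPUs and per-device batch"
    return steps

# Moving the command above from 2 to 4 GPUs while keeping a global batch of 8:
print(grad_accum_steps(8, num_gpus=4, per_device_batch=1))  # -> 2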

Addressing OOM

Naively, full fine-tuning with Adam stores about 4 bytes per parameter for each of the weights, gradients, and two optimizer moments, so Replit's 3B model requires roughly 3 x 4 x 4 = 48 GB of VRAM (a 7B model would need about 7 x 4 x 4 = 112 GB). With parameter sharding enabled (e.g., FSDP or DeepSpeed ZeRO), no redundant model copy is stored on any GPU. If you'd like to further reduce the memory footprint, options include a shorter sequence length, gradient checkpointing, and CPU offloading of optimizer state.
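
A back-of-the-envelope sketch of that estimate (plain arithmetic, not tied to any library):

def naive_vram_gb(num_params: float) -> float:
    # weights + gradients + Adam first and second moments, 4 bytes each in fp32
    return num_params * 4 * 4 / 1e9

print(naive_vram_gb(3e9))   # ~48 GB for the 3B model
print(naive_vram_gb(7e9))   # ~112 GB for a 7B model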

Original Authors of the Alpaca paper

All grad students listed as authors in the citation below contributed equally; the order was determined by random draw.

All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.

Citation

Please cite the original Stanford Alpaca repo if you use its data or code:

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].

Acknowledgements

We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot. We thank Yifan Mai for helpful support, and members of the Stanford NLP Group as well as the Center for Research on Foundation Models (CRFM) for their helpful feedback.