
Instruction Tuning with GPT-4

Baolin Peng*, Chunyuan Li*, Pengcheng He*, Michel Galley, Jianfeng Gao (*Equal Contribution)

[Project Page] [Paper]

<p align="center"> <img src="https://instruction-tuning-with-gpt-4.github.io/images/gpt4llama_logo.png" width="50%"> <br> Pronounced as "GPT-4-LLM" or "GPT-for-LLM", image is generated by <a href="https://gligen.github.io/">GLIGEN</a> </p>


This is the repo for GPT-4-LLM, which aims to share data generated by GPT-4 for building instruction-following LLMs with supervised learning and reinforcement learning.

Usage and License Notices: The data is intended and licensed for research use only. The dataset is CC BY-NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used for purposes other than research.

:fire: News


Overview

Large Language Models (LLMs) have shown impressive generalization capabilities such as in-context learning and chain-of-thought reasoning. To enable LLMs to follow natural language instructions and complete real-world tasks, researchers have been exploring instruction-tuning methods for LLMs. To advance the state of the art of instruction tuning, we present the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning.

Data Release
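The released instruction-following examples are plain JSON. As a minimal sketch, assuming the files follow the Alpaca schema of `instruction`/`input`/`output` records (the inline record and the `to_prompt` helper below are illustrative, not code from this repo), the data can be loaded and turned into training prompts like this:

```python
import json

# We assume the Alpaca schema: a JSON list of records with
# "instruction", "input", and "output" keys, where "input" may be empty.
sample = json.loads("""
[{"instruction": "Give three tips for staying healthy.",
  "input": "",
  "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well."}]
""")

def to_prompt(example):
    # Alpaca-style prompt construction: include the input section only
    # when the example actually provides one (illustrative helper).
    if example["input"]:
        return (f"Instruction: {example['instruction']}\n"
                f"Input: {example['input']}\n"
                f"Response:")
    return f"Instruction: {example['instruction']}\nResponse:"

prompt = to_prompt(sample[0])
```

The corresponding `output` field serves as the supervision target during fine-tuning.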

How Good is the Data

Human evaluation was performed on model generation results using Amazon Mechanical Turk, following the Helpfulness, Honesty, and Harmlessness criteria proposed by Anthropic. The results are summarized as follows:

* LLaMA-GPT4 vs Alpaca (i.e., LLaMA-GPT3)
* LLaMA-GPT4 vs GPT-4
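The pairwise judgments behind such comparisons reduce to simple win/tie/loss rates per model pair. A minimal sketch of that aggregation (the helper and verdict labels are illustrative, not the released evaluation code):

```python
from collections import Counter

def summarize_pairwise(annotations):
    """Aggregate pairwise human verdicts ("win"/"tie"/"loss" for model A
    vs. model B) into percentage rates. Illustrative only."""
    counts = Counter(annotations)
    total = len(annotations)
    return {verdict: 100.0 * counts[verdict] / total
            for verdict in ("win", "tie", "loss")}

# Eight hypothetical Turker verdicts for one model pair.
rates = summarize_pairwise(
    ["win", "win", "win", "win", "tie", "tie", "loss", "loss"])
```

With multiple annotators per prompt, one would typically aggregate per-prompt first (e.g., by majority vote) before computing the overall rates.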

Fine-tuning with the data

We follow the same recipe as Alpaca to fine-tune LLaMA, using standard Hugging Face training code.

To reproduce our results with LLaMA 7B, first set up the Alpaca repo and run the following command:

```bash
# cmd we used to train LLaMA on 16*V100
torchrun --nproc_per_node=16 \
    --master_port=12345 train.py \
    --model_name_or_path PATH/TO/LLaMA \
    --data_path ./data/alpaca_gpt4_data.json \
    --output_dir PATH/TO/SAVE \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --deepspeed configs/ds_config.json
```
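For reference, the batch-size flags above imply an effective global batch size of 64 (per-device batch × gradient-accumulation steps × number of GPUs):

```python
# Effective global batch size implied by the command above.
per_device_train_batch_size = 1   # --per_device_train_batch_size
gradient_accumulation_steps = 4   # --gradient_accumulation_steps
num_gpus = 16                     # --nproc_per_node=16 on 16*V100

effective_batch = (per_device_train_batch_size
                   * gradient_accumulation_steps
                   * num_gpus)
print(effective_batch)  # 64
```

If you train on fewer GPUs, scale `--gradient_accumulation_steps` up accordingly to keep the effective batch size the same.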

To evaluate the results, we highly recommend that users refer to Vicuna, as it provides excellent serving scripts and evaluation pipelines.

Collect results and reproduce figure plots

The results can be plotted using the included IPython notebook plots/main_plots.ipynb. Start the IPython Notebook server:

```bash
$ cd plots
$ ipython notebook
```

Select the main_plots.ipynb notebook and execute the included code. Note that the notebook already contains our extracted results, so running it without modification will output the figures in the paper. Some related data for the plots is provided in data, and the generated plots are saved in plots/output. If you've run your own training and wish to plot the results, you'll have to organize them in the same format.

Shortcut: to skip all the work and just see the results, take a look at this notebook with cached plots.

Citation

```bibtex
@article{peng2023instruction,
  title={Instruction Tuning with GPT-4},
  author={Peng, Baolin and Li, Chunyuan and He, Pengcheng and Galley, Michel and Gao, Jianfeng},
  journal={arXiv preprint arXiv:2304.03277},
  year={2023}
}
```

Related Projects

Acknowledgement

This repo benefits from LLaMA, Alpaca, and Vicuna. Thanks for their wonderful work.