# From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models
Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models".
We systematically study how to enhance the ability of LLMs to follow complex instructions, addressing the following research questions:
- What training data is effective in enhancing complex constraint-following abilities?
  - Training with compositional data (instructions with multiple constraints) can generally enhance models’ ability to follow complex instructions.
  - Training with atomic data (mostly with 1 constraint) can generally decrease performance, compared to the backbone model, on instructions with more than 1 constraint.
  - Training with compositional data can better generalize to lower-level complex instructions (instructions with fewer constraints).
  - Training with compositional data can even generalize to compositions of out-of-domain constraints.
- How to obtain high-quality compositional data?
  - Outputs produced by weaker LLMs and then refined by advanced LLMs (Discrimination) significantly outperform outputs generated directly by advanced LLMs (Generation).
- How to effectively utilize the data obtained through the discrimination-based method?
  - We introduce a reinforcement learning fine-tuning (RLFT) based method that leverages both positive and negative samples to improve complex instruction following.

We conduct extensive experiments to demonstrate the effectiveness of our methods in terms of overall performance, training efficiency, and generalization abilities under four settings.
## 🔥Updates
- 2024/6/18: We posted the second version of our paper.
- 2024/4/24: We posted the first version of our paper.
- 2024/4/22: We released the data and code of FCS.
## ⚙️How to Use the Code

### Install Dependencies
```bash
conda create -n fcs python=3.10.9
conda activate fcs
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
```
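As an optional sanity check (not part of the original setup steps), you can confirm that the PyTorch build installed above can see your GPU:

```python
# Optional sanity check: confirm the PyTorch install from the steps above.
import torch

print(torch.__version__)          # expected: 1.13.1
print(torch.cuda.is_available())  # expected: True on a machine with CUDA 11.7 drivers
```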
### Obtain Complex Instructions

#### Complex Instruction Synthesis
To obtain complex instructions:
- First, we collect seed instructions from three widely used instruction-tuning datasets.
- Then, we rewrite the instructions to incorporate multiple constraints.
Here, you can complete the whole procedure by running the script `gen_inst.sh`:
```bash
python ../get_data/gen_inst.py \
    --seed_path=../get_data/data/seed_data.jsonl \
    --data_path=../get_data/data/data.jsonl \
    --api_key=YOUR_API_KEY_TO_ACCESS_GPT4
```
An example of a complex instruction is shown below; it contains 3 different constraints.
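For illustration, a hypothetical instruction of this kind (our own example, not taken from the released data) might compose three verifiable constraints:

```
Write a short biography of Marie Curie.
Your answer must (1) contain fewer than 150 words, (2) be formatted as
valid JSON, and (3) end with the exact sentence "That's all I know."
```

Here, (1) is a length constraint, (2) a format constraint, and (3) a keyword constraint, each of which can be checked automatically by a script.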
#### Get Model Outputs
You need to run inference with your model to get the responses to the complex instructions. Here, we provide a script to run inference for LLaMA via `do_inference.sh`:
```bash
CUDA_VISIBLE_DEVICES=YOUR_CUDA_DEVICES python ../get_data/do_inference.py \
    --data_path=../get_data/data/data.jsonl \
    --res_path=../get_data/data/res_llama2.jsonl \
    --model_path=PATH_TO_YOUR_MODEL \
    --lora_path=PATH_TO_YOUR_LORA_WEIGHT  # only needed if you use LoRA
```
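If you are wondering how the optional LoRA weights come into play, the following is a minimal sketch of the usual pattern with `transformers` and `peft` (an illustration of the general approach; the actual logic in `do_inference.py` may differ):

```python
# Minimal sketch of LLaMA inference with an optional LoRA adapter.
# Illustrative only; not the repository's actual do_inference.py code.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_path = "PATH_TO_YOUR_MODEL"       # base LLaMA checkpoint
lora_path = "PATH_TO_YOUR_LORA_WEIGHT"  # set to None if you do not use LoRA

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
if lora_path:
    model = PeftModel.from_pretrained(model, lora_path)  # attach the adapter

inputs = tokenizer("Your complex instruction here.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```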
#### Teacher Correction
We propose a discrimination-based approach for obtaining outputs, which we show to be more effective than directly generating outputs with advanced LLMs.
First, we utilize the test scripts from IFEval to identify the constraints the model failed to follow, since these constraints are objective and automatically verifiable. Simply run the script `check.sh`:
```bash
python ../get_data/check.py \
    --input_data=../get_data/data/data.jsonl \
    --input_response_data=../get_data/data/res_llama2.jsonl \
    --output_dir=../get_data/data/ \
    --output_file_name=checked_res_llama2
```
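To make "objective and automatically verifiable" concrete: IFEval-style checks are simple programmatic tests over the response text. The functions below are simplified illustrations of our own, not the actual IFEval test code:

```python
# Simplified illustrations of IFEval-style verifiable constraint checks.
import json

def check_max_words(response: str, limit: int) -> bool:
    """Length constraint: the response contains at most `limit` words."""
    return len(response.split()) <= limit

def check_json_format(response: str) -> bool:
    """Format constraint: the response parses as valid JSON."""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False

def check_end_phrase(response: str, phrase: str) -> bool:
    """Keyword constraint: the response ends with the exact phrase."""
    return response.strip().endswith(phrase)
```

Each failed check tells us exactly which constraint the teacher model needs to correct in the next step.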
Then, we adopt an advanced LLM (GPT-3.5-turbo) as the teacher model to correct the failed constraints one by one. You can correct the responses to simultaneously obtain data for IFT and DPO with the script `correct.sh`:
```bash
python ../get_data/correct.py \
    --res_path=../get_data/data/res_llama2.jsonl \
    --ift_data_path=../dpo_train/data/ift_train.jsonl \
    --dpo_data_path=../dpo_train/data/dpo_train.jsonl \
    --api_key=YOUR_API_KEY_TO_ACCESS_GPT4
```
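Each line of `dpo_train.jsonl` pairs an instruction with a preferred (corrected) response and a dispreferred (original, failed) one. The field names below are an assumption based on the common DPO data layout; check the released files for the exact schema:

```json
{"prompt": "Write a short biography of Marie Curie. Your answer must ...", "chosen": "<corrected response satisfying all constraints>", "rejected": "<original response that failed at least one constraint>"}
```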
### Contrastive Method (Go for DPO Training)
Slight changes in an instruction (e.g., `json` to `xml`) can cause substantial output differences. Hence, negative samples that fail to meet certain constraints also offer valuable supervision signals. We leverage the positive and negative samples through reinforcement learning fine-tuning.
Here, we provide a revised implementation of an advanced DPO in `dpo_train`. You can set your `model_path` and `data_path` in `dpo_train/dpo_train.py`. Then, you can train the model with the script `train_dpo.sh`:
```bash
CUDA_VISIBLE_DEVICES=YOUR_CUDA_DEVICES accelerate launch \
    --config_file ../dpo_train/deepspeed_zero1.yaml dpo_train.py \
    --output_dir=PATH_TO_SAVE_MODEL
```
## Citation
```bibtex
@misc{he2024complex,
      title={From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models},
      author={Qianyu He and Jie Zeng and Qianxi He and Jiaqing Liang and Yanghua Xiao},
      year={2024},
      eprint={2404.15846},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```