
FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)

We introduce FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for systematically and precisely evaluating the instruction-following capability of LLMs. FollowBench covers five fine-grained constraint types (content, situation, style, format, and example), and each difficulty level adds one more constraint to the initial instruction.

<p align="center"> <br> <img src="figures/overview.png" width="1200"/> <br> </p>


🔍 Table of Contents

- [🖥️ Leaderboard](#leaderboard)
- [📄 Data of FollowBench](#data-of-followbench)
- [⚙️ How to Evaluate on FollowBench](#how-to-evaluate-on-followbench)
- [📝 Citation](#citation)

<a name="leaderboard"></a>

🖥️ Leaderboard

Metrics

- **Hard Satisfaction Rate (HSR):** the fraction of instructions whose constraints are all satisfied.
- **Soft Satisfaction Rate (SSR):** the average fraction of satisfied constraints across all instructions.
- **Consistent Satisfaction Levels (CSL):** the number of consecutive levels, counting from level 1, at which a model keeps satisfying all constraints.
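As a quick illustration (this is not the repo's evaluation code), HSR and SSR can be computed from per-instruction lists of boolean constraint judgments:

```python
# Illustrative computation of HSR and SSR (not the repo's evaluation code).
# judgments[i][j] is True iff constraint j of instruction i is satisfied.
judgments = [
    [True, True, True],   # all constraints satisfied
    [True, False, True],  # one constraint violated
]

# HSR: fraction of instructions with ALL constraints satisfied.
hsr = sum(all(c) for c in judgments) / len(judgments)

# SSR: mean fraction of satisfied constraints per instruction.
ssr = sum(sum(c) / len(c) for c in judgments) / len(judgments)

print(f"HSR = {hsr:.2f}, SSR = {ssr:.2f}")  # HSR = 0.50, SSR = 0.83
```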

Level-categorized Results

English

<p align="center"> <br> <img src="figures/Level.png" width="800"/> <br> </p>

Chinese

<p align="center"> <br> <img src="figures/Level_zh.png" width="800"/> <br> </p>

Constraint-categorized Results

English

<p align="center"> <br> <img src="figures/Category.png" width="500"/> <br> </p>

Chinese

<p align="center"> <br> <img src="figures/Category_zh.png" width="500"/> <br> </p>

<a name="data-of-followbench"></a>

📄 Data of FollowBench

The data of FollowBench can be found in `data/`.

We also provide a Chinese version of FollowBench in `data_zh/`.
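To take a quick look at an instance, the sketch below assumes the files are in JSON Lines format; the filename `example_constraint.jsonl` is a hypothetical placeholder, so substitute an actual file from `data/`:

```python
# Minimal sketch for inspecting a FollowBench data file.
# ASSUMPTION: files under data/ are JSON Lines; the filename below is a
# hypothetical placeholder -- replace it with an actual file from data/.
import json

with open("data/example_constraint.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(len(records), "records loaded")
print(records[0])  # inspect the fields of a single instance
```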

<a name="how-to-evaluate-on-followbench"></a>

⚙️ How to Evaluate on FollowBench

Install Dependencies

```bash
conda create -n followbench python=3.10
conda activate followbench
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
```
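Optionally, verify that the pinned PyTorch build can see your GPU:

```python
# Optional sanity check for the environment created above.
import torch

print(torch.__version__)          # expected: 1.13.1
print(torch.cuda.is_available())  # should be True on a CUDA 11.7 machine
```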

Model Inference

```bash
cd FollowBench/
python code/model_inference.py --model_path <model_name_or_path>
```
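Here `<model_name_or_path>` is the checkpoint the script loads (e.g., a local directory or a Hugging Face model ID). As a rough sketch of the kind of generation loop such a script runs, assuming Hugging Face `transformers` (this is not the repo's actual code, and the prompt is illustrative):

```python
# Rough sketch of a generation loop; NOT the repo's actual inference code.
# Requires `accelerate` for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "your-model-name-or-path"  # same value as --model_path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# An instruction with a single (level-1) constraint, for illustration.
prompt = "Describe the ocean in exactly two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```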

LLM-based Evaluation

```bash
cd FollowBench/
python code/llm_eval.py --model_path <model_name_or_path> --api_key <your_own_gpt4_api_key>
```
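The LLM-based evaluation uses GPT-4 as a judge of constraint satisfaction, which is why an API key is required. A minimal sketch of such a judge call follows (using the pre-1.0 `openai` SDK interface; the prompt is illustrative, not the repo's actual template):

```python
# Minimal sketch of an LLM-as-judge call; NOT the repo's actual prompt/code.
# Uses the pre-1.0 `openai` SDK interface; adapt for openai>=1.0 if needed.
import openai

openai.api_key = "your_own_gpt4_api_key"  # same value as --api_key

judge_prompt = (
    "Instruction: ...\n"
    "Constraint: the answer must be under 50 words.\n"
    "Model answer: ...\n"
    "Does the answer satisfy the constraint? Reply YES or NO."
)
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": judge_prompt}],
)
print(resp["choices"][0]["message"]["content"])
```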

Merge Evaluation and Save Results

Finally, we run the rule-based evaluation and merge its results with the LLM-based evaluation results using the following script:

```bash
cd FollowBench/
python code/eval.py --model_paths <a_list_of_evaluated_models>
```
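Rule-based evaluation covers constraints that can be verified programmatically (for example, length limits), while the GPT-4 judgments from the previous step cover the rest. A toy example of such a check (illustrative only, not the repo's code):

```python
# Toy rule-based check for a word-limit constraint (illustrative only).
def satisfies_word_limit(response: str, max_words: int) -> bool:
    return len(response.split()) <= max_words

print(satisfies_word_limit("Short and sweet.", 50))  # True
```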

The final results will be saved in the `evaluation_result` folder.

<a name="citation"></a>

📝 Citation

Please cite our paper if you use the data or code in this repo.

```bibtex
@inproceedings{jiang-etal-2024-followbench,
    title = "{F}ollow{B}ench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models",
    author = "Jiang, Yuxin  and
      Wang, Yufei  and
      Zeng, Xingshan  and
      Zhong, Wanjun  and
      Li, Liangyou  and
      Mi, Fei  and
      Shang, Lifeng  and
      Jiang, Xin  and
      Liu, Qun  and
      Wang, Wei",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.257",
    pages = "4667--4688",
}
```