<!-- # Language Imbalance Driven Rewarding -->

# Language Imbalance Driven Rewarding for Multilingual Self-improving

<div align="center"> <br> <a>Wen Yang</a><sup><span>1,2*</span></sup>, <a href="https://scholar.google.com/citations?user=Ci4l4yQAAAAJ&hl=zh-CN">Junhong Wu</a><sup><span>1,2*</span></sup>, <a href="https://scholar.google.com/citations?user=FgrrqlAAAAAJ&hl=zh-CN">Chen Wang</a><sup><span>1,2</span></sup>, <a href="https://scholar.google.com/citations?user=l8lvKOQAAAAJ&hl=zh-CN">Chengqing Zong</a><sup><span>1,2</span></sup>, <a href="https://scholar.google.com/citations?user=93zngeYAAAAJ&hl=zh-CN">Jiajun Zhang</a><sup><span>1,2,3,4🌟</span></sup> <br>

* Equal contribution 🌟 Corresponding author

<sup>1</sup> School of Artificial Intelligence, University of Chinese Academy of Sciences<br> <sup>2</sup> Institute of Automation, Chinese Academy of Sciences<br> <sup>3</sup> Wuhan AI Research <sup>4</sup> Shanghai Artificial Intelligence Laboratory, Shanghai, China<br>

Multilingual-Self-Improving <a href='https://arxiv.org/pdf/2410.08964'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

</div> <p align="center"> <img src="assets/outline.png" width="100%" height="100%"> </p>

<font size=5><div align='center' > [📖 arXiv Paper](https://arxiv.org/abs/2410.08964) </div></font>

## Overview

We introduce Language Imbalance Driven Rewarding, a novel approach that leverages the inherent capability imbalance across different languages in large language models (LLMs) as a reward signal for iterative self-improvement. By applying iterative DPO training, our approach not only enhances the performance of non-dominant languages but also improves outcomes in dominant languages.
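
At a high level, the imbalance is converted into preferences: for each prompt, the response obtained from the dominant language (e.g., via translation) is treated as the chosen response $y_w$ and the model's own non-dominant-language response as the rejected response $y_l$; see the paper for the exact pair-construction procedure. Each iteration then optimizes the standard DPO objective on these pairs (notation follows the original DPO formulation):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]
$$

where $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is a frozen reference model (in an iterative setup, typically the previous iteration's model), and $\beta$ controls how far the policy may drift from the reference.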

Our goal with this approach is to offer a new perspective to the multilingual LLM community by challenging the view that language imbalance is solely a problem to be mitigated. We hope this work inspires further exploration of multilingual self-improvement in LLMs, broadening the horizon for more balanced and capable language models.

## 🔥 Update

## 👀 Contents

## 📷 Setup

Please follow the instructions below to install the required packages.

1. Clone this repository

   ```bash
   git clone https://github.com/ZNLP/Language-Imbalance-Driven-Rewarding.git
   ```

2. Install the required packages

   ```bash
   cd Language-Imbalance-Driven-Rewarding
   conda create -n mdpo python=3.10 -y
   conda activate mdpo
   pip install -r requirements.txt
   ```

## 💡 Preparation

```bash
bash ./scripts/batch_inference.sh
bash ./scripts/batch_translate.sh
```
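
Below is a minimal sketch of how the outputs of these two steps can be assembled into DPO preference pairs. The file layout, field names (`prompt`, `response`), and language choice are assumptions made for illustration, not the actual interface of the scripts above:

```python
import json

# Hypothetical inputs: responses the model generated natively in a
# non-dominant language, and dominant-language (e.g., English) responses to
# the same prompts that batch_translate.sh has translated into that language.
# File names and field names are illustrative, not the repo's real format.
def build_dpo_pairs(native_path, translated_path, out_path):
    with open(native_path, encoding="utf-8") as f:
        native = [json.loads(line) for line in f]
    with open(translated_path, encoding="utf-8") as f:
        translated = [json.loads(line) for line in f]

    pairs = []
    for nat, trans in zip(native, translated):
        assert nat["prompt"] == trans["prompt"]
        pairs.append({
            "prompt": nat["prompt"],
            # Language imbalance as the reward signal: the translated
            # dominant-language response is treated as preferred, and the
            # model's own non-dominant-language response as dispreferred.
            "chosen": trans["response"],
            "rejected": nat["response"],
        })

    with open(out_path, "w", encoding="utf-8") as f:
        for p in pairs:
            f.write(json.dumps(p, ensure_ascii=False) + "\n")

build_dpo_pairs("outputs/native_es.jsonl", "outputs/translated_en2es.jsonl",
                "data/dpo_pairs_es.jsonl")
```

In the iterative setup, the model fine-tuned on such pairs generates and translates a fresh batch of responses, which yields the preference data for the next round.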

## 📈 Train

Our training is mostly performed with the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) code base. Please refer to that repo for more details.
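
For a self-contained reference point, the sketch below runs DPO on `prompt`/`chosen`/`rejected` pairs with Hugging Face `trl`. This is only an illustrative stand-in for the LLaMA-Factory setup; the model name, data path, and hyperparameters are placeholders, and `trl` argument names differ slightly across versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # example base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs produced in the Preparation step (illustrative path).
dataset = load_dataset("json", data_files="data/dpo_pairs_es.jsonl", split="train")

args = DPOConfig(
    output_dir="checkpoints/mdpo-iter1",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
    beta=0.1,  # DPO temperature
)

trainer = DPOTrainer(
    model=model,            # reference model is created internally if omitted
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions use `tokenizer=` instead
)
trainer.train()
```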

## 📈 Evaluation

```bash
bash scripts/batch_inference_for_eval.sh
```

## 👀 Experiments

We provide some results in this section. More detailed results can be found in our paper.

### General Instruction Following

<div align=center> <img width="90%" src="assets/head_to_head.png"/> </div> <div align=center> <img width="90%" src="assets/x-alpacaeval.png"/> </div> <div align='center'> <details> <summary>Click to expand more examples</summary> <p align="center"> <img src="assets/mt_bench.png" width="60%" height="60%"> <p align="center">The Multilingual MT-Bench Benchmark</p> <img src="assets/multilingual_NLP_tasks.png" width="60%" height="60%"> <p align="center">The Multilingual NLP Benchmarks</p> </p> </details> </div>

### Arithmetic Reasoning

<div align=center> <img width="90%" src="assets/mgsm.png"/> </div>

## Schedule

## Citation

If you find this repo useful for your research, please consider citing our paper:

```bibtex
@article{yang2024language,
  title={Language Imbalance Driven Rewarding for Multilingual Self-improving},
  author={Yang, Wen and Wu, Junhong and Wang, Chen and Zong, Chengqing and Zhang, Jiajun},
  journal={arXiv preprint arXiv:2410.08964},
  year={2024}
}
```

## Acknowledgement

We would like to thank the following repos for their great work:

- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), which we use for DPO training.

## License

This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.