rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

This repository contains the necessary scripts to run rStar's generator and discriminator.

Link to paper: https://huggingface.co/papers/2408.06195, https://arxiv.org/abs/2408.06195

Intro

We propose rStar, a Self-play muTuAl Reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or assistance from superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process.

<p align="center"> <img src="assets/teaser.png"> </p>

First, a target SLM augments Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. The mutually agreed reasoning trajectories are considered mutually consistent and are thus more likely to be correct.

rStar decomposes reasoning into solution generation and mutual verification. For solution generation, we introduce a richer set of human-like reasoning actions that allows thorough exploration of the solution space across diverse reasoning tasks. For mutual verification, we use another SLM as a discriminator to augment the MCTS process, mutually verifying the correctness of each trajectory with the generator SLM.
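The generation-then-mutual-verification idea above can be sketched in a few lines of Python. This is a toy illustration, not the repository's actual code: `discriminator_completes`, the trajectory dictionaries, and the prefix-masking heuristic are all invented for this sketch. The real system runs MCTS with an SLM on both sides.

```python
def discriminator_completes(question, partial_steps):
    # Stand-in for the discriminator SLM: given the question and a prefix
    # of a candidate trajectory's reasoning steps, it independently
    # completes the reasoning and returns its own final answer.
    # Here we hard-code an answer purely for illustration.
    return 42

def mutually_consistent(question, trajectories):
    # A trajectory (a dict with "steps" and "answer") counts as mutually
    # consistent when the discriminator, shown only a prefix of its steps,
    # independently arrives at the same final answer.
    agreed = []
    for traj in trajectories:
        prefix = traj["steps"][: len(traj["steps"]) // 2]
        if discriminator_completes(question, prefix) == traj["answer"]:
            agreed.append(traj)
    return agreed
```

In rStar, the agreed-upon trajectories are the ones trusted for the final answer; disagreements are treated as a signal that a trajectory is likely flawed.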

Prerequisites

Usage

rStar Generator

Here is an example to run rStar generator:

bash scripts/run_gsm8k_generator.sh

The script run_gsm8k_generator.sh includes several configurable parameters:

Make sure to adjust these parameters according to your requirements.

Evaluate rStar Generator

Here is an example to evaluate the results of the rStar generator:

python eval_src/do_eval.py --dataset_name GSM8K --exp_dir_path <generator_output_folder>
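Conceptually, the evaluation compares each extracted final answer against the gold answer and reports accuracy. The sketch below assumes a simple convention of taking the last number in a solution string as the final answer; the actual input format and answer-extraction logic of `eval_src/do_eval.py` are repo-specific and may differ.

```python
import re

def extract_final_number(text):
    # Take the last number in the text as the final answer
    # (an assumed convention for this sketch; GSM8K answers are numeric).
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(nums[-1]) if nums else None

def accuracy(predictions, golds):
    # Fraction of examples whose extracted answer matches the gold answer.
    correct = sum(
        1 for pred, gold in zip(predictions, golds)
        if extract_final_number(pred) == extract_final_number(gold)
    )
    return correct / len(golds)
```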

rStar Discriminator

Here is an example to run rStar discriminator:

bash scripts/run_gsm8k_discriminator.sh

The script run_gsm8k_discriminator.sh includes several configurable parameters:

Results

Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems. rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct.

<p align="center"> <img src="assets/result.png" width="600px"> </p>

Citation

If you find our work helpful, please consider citing it:

@misc{qi2024mutual,
    title={Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers},
    author={Zhenting Qi and Mingyuan Ma and Jiahang Xu and Li Lyna Zhang and Fan Yang and Mao Yang},
    year={2024},
    eprint={2408.06195},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Read More

rStar has been recommended as a key technique in Awesome LLM Strawberry (OpenAI o1). See that list for more related papers.