
REFINER: Reasoning Feedback on Intermediate Representations :rocket: (EACL 2024)

Python 3.9 | MIT License | arXiv

Official implementation of 📖 REFINER: Reasoning Feedback on Intermediate Representations. 🔗 Blog Post


🔍 Contents

- Overview
- Method
- Dependencies
- Setup
- Data
- Models
- Citation

Overview

This repo contains the official implementation of REFINER, an interaction-based framework for natural language reasoning tasks 🔥. REFINER refines the reasoning capabilities of language models (LMs) through structured feedback. Our work is the first to investigate how interacting with fine-grained feedback on intermediate reasoning steps affects the performance of LMs on reasoning tasks.

Method

We solve these tasks by having the model generate intermediate hypotheses (z) and improve them via structured feedback. REFINER is an interactive framework made of two separate models: (a) a CRITIC model trained to provide structured feedback on intermediate reasoning steps and (b) a GENERATOR model trained to solve the reasoning task by first generating those intermediate reasoning steps. The core idea is to exploit the interaction between the two: the generator's intermediate reasoning steps are iteratively improved via structured feedback from the critic, as sketched below.
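A minimal sketch of that loop, assuming hypothetical `generate` helpers around the two fine-tuned seq2seq models; the prompt format, stop signal, and turn budget are illustrative, not the repo's actual code:

```python
# Conceptual REFINER loop; `generator` and `critic` are hypothetical
# wrappers around the two fine-tuned models.
def refine(generator, critic, question, max_turns=4):
    context = question
    hypothesis = None
    for _ in range(max_turns):
        # Generator proposes intermediate reasoning steps z.
        hypothesis = generator.generate(context)
        # Critic returns structured feedback on those steps.
        feedback = critic.generate(f"{question} reasoning: {hypothesis}")
        if feedback == "No error.":  # assumed "all correct" signal
            break
        # Condition the next turn on the hypothesis and its feedback.
        context = f"{question} hypothesis: {hypothesis} feedback: {feedback}"
    return hypothesis
```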

Dependencies

All Python dependencies are listed in requirements.txt and installed in the Setup steps below.

Setup

Start by cloning the repository:

git clone git@github.com:debjitpaul/refiner.git

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Install all the required packages:

$ pip install -r requirements.txt

Data

| Data | Reference | Output | Description |
|---|---|---|---|
| Math Word Problem | 📖, 🗂️, 🔗 | Math equations (z) and answers (y) | Generate an equation given a math word problem. |
| Synthetic Natural Language Reasoning | 📖, 🗂️, 🔗 | Reasoning steps (z) and conclusion (y) | The model must perform deductive reasoning, generating intermediate reasoning steps z and a conclusion y using closed-world rules and facts. |
| Moral Stories | 📖, 🗂️, 🔗 | Moral norm (z) and moral action (y) | Given a context x consisting of a situation, an intention, and an immoral action, the model must generate the moral norm z and the moral action y. |
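For concreteness, here is what a single math-word-problem record could look like; the field names are hypothetical, and the actual schema of files such as data/mwp/critique_train.json may differ:

```python
import json

# Hypothetical MWP record; key names are illustrative, not the repo's schema.
record = {
    "question": "John has 3 apples and buys 2 more. How many apples does he have?",
    "equation": "3 + 2",  # intermediate representation z
    "answer": "5",        # final answer y
}
print(json.dumps(record, indent=2))
```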

Models

Baseline

Train a baseline model using PPO.

Paper: 📖 | Code: 🔗
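For orientation only, a single PPO step could look like the following with the trl library's pre-1.0 PPOTrainer API; this is not the linked baseline code, and the reward is a stand-in:

```python
# Illustrative PPO step using trl's pre-1.0 API; not the linked baseline code.
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForSeq2SeqLMWithValueHead

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(model_name)

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1),
                         model, ref_model, tokenizer)

query = tokenizer("Solve: 3 + 2 * 4 = ?", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=16)[0]

# Stand-in reward: 1.0 if the final answer is correct, else 0.0.
ppo_trainer.step([query], [response], [torch.tensor(1.0)])
```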

REFINER

Train Generator

python3 src/scripts/finetune.py --training-file path_train_data --validation-file path_val_data --language-model google/flan-t5-base --model-dir flan_t5_large_model  --epochs 10 --batch-size 8
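Under the hood this is standard sequence-to-sequence fine-tuning. A generic sketch with transformers' Seq2SeqTrainer follows; the actual src/scripts/finetune.py may differ, and the "input"/"target" field names are assumptions:

```python
# Generic seq2seq fine-tuning sketch; src/scripts/finetune.py may differ.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

data = load_dataset("json", data_files={"train": "path_train_data",
                                        "validation": "path_val_data"})

def preprocess(batch):
    # Assumed field names: "input" (question) and "target" (reasoning + answer).
    enc = tokenizer(batch["input"], truncation=True)
    enc["labels"] = tokenizer(text_target=batch["target"], truncation=True)["input_ids"]
    return enc

tokenized = data.map(preprocess, batched=True,
                     remove_columns=data["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="flan_t5_large_model",
                                  num_train_epochs=10,
                                  per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```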

Train Critic

The critic is trained with the same fine-tuning script; point --training-file and --validation-file at the critique data (intermediate reasoning steps paired with structured feedback) and give the critic its own model directory, which the REFINER steps below reference as output_critique.

python3 src/scripts/finetune.py --training-file path_train_data --validation-file path_val_data --language-model google/flan-t5-base --model-dir output_critique --epochs 10 --batch-size 8

Train REFINER

python3 src/scripts/train_refiner.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique  --epochs 10 --batch-size 8 --number_turn 4

REFINER Inference

python3 src/scripts/test_predict.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique  --epochs 10 --batch-size 8 --number_turn 4
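To sanity-check a trained generator outside these scripts, a plain transformers generation call is enough; the checkpoint path and prompt below are illustrative:

```python
# Quick generation check with a fine-tuned checkpoint; illustrative only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("flan_t5_large_model")  # --model-dir

question = "John has 3 apples and buys 2 more. How many apples does he have?"
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```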

Train REFINER with LoRA

python3 src/scripts/train_refiner.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique --lora True --epochs 10 --batch-size 8 --number_turn 4
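The --lora flag presumably switches on parameter-efficient fine-tuning. With Hugging Face peft, wrapping the base model looks like the following; the rank and other hyperparameters are illustrative, and the repo's flag may configure things differently:

```python
# Illustrative LoRA wrapping with peft; the repo's --lora flag may differ.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
lora = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM,
                  r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the small LoRA adapters train
```

Only the adapter weights are updated, which cuts memory use substantially compared to full fine-tuning.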

Citation

@misc{paul2023refiner,
  title={REFINER: Reasoning Feedback on Intermediate Representations},
  author={Paul, Debjit and Ismayilzada, Mete and Peyrard, Maxime and Borges, Beatriz and Bosselut, Antoine and West, Robert and Faltings, Boi},
  year={2023},
  eprint={2304.01904},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2304.01904}
}