
REFINER: Reasoning Feedback on Intermediate Representations :rocket: (EACL 2024)

Official implementation of 📖 REFINER: Reasoning Feedback on Intermediate Representations 🔗 Blog Post

🔍 Contents

🌟 Overview
🌟 Method
🔥 Dependencies
🔥 Setup
🔥 Data
🔥 Models
🚩 Citation

Overview

This repo contains REFINER, an interaction-based framework for natural language reasoning tasks 🔥. REFINER refines the reasoning capabilities of language models (LMs) through feedback. Our work is the first to investigate how interacting with fine-grained feedback on intermediate reasoning steps affects the performance of LMs on reasoning tasks.

Method

We propose to solve natural language reasoning tasks by having the model generate intermediate hypotheses (z) and then improving them via structured feedback. We introduce an interactive framework named REFINER, made of two separate models: (a) a CRITIC model trained to provide structured feedback on intermediate reasoning steps and (b) a GENERATOR model trained to solve the reasoning task by first generating intermediate reasoning steps. The core idea of REFINER is to exploit the interaction between the generator and the critic: the generator’s intermediate reasoning steps are improved via structured feedback from the critic.
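The interaction described above can be sketched as a short loop. This is a minimal illustration, not the repository's implementation: `refine`, `toy_generator`, `toy_critic`, and `max_turns` are hypothetical names, and both roles are replaced by rule-based stand-ins so the example runs without any trained model.

```python
from typing import Callable, Optional

# Hypothetical interfaces: in REFINER both roles are fine-tuned language models.
Generator = Callable[[str, Optional[str], Optional[str]], str]
Critic = Callable[[str, str], Optional[str]]


def refine(question: str, generator: Generator, critic: Critic, max_turns: int = 4) -> str:
    """Alternate generation and critique until the critic is satisfied or turns run out."""
    hypothesis = generator(question, None, None)  # initial intermediate steps z
    for _ in range(max_turns):
        feedback = critic(question, hypothesis)
        if feedback is None:  # the critic has nothing left to correct
            break
        hypothesis = generator(question, hypothesis, feedback)
    return hypothesis


def toy_generator(question: str, previous: Optional[str], feedback: Optional[str]) -> str:
    # The first attempt uses the wrong operator; a revision follows the critic's hint.
    if feedback is None:
        return "add(number0, number1)"
    if "operator" in feedback:
        return "subtract(number0, number1)"
    return previous


def toy_critic(question: str, hypothesis: str) -> Optional[str]:
    # Rule-based stand-in that knows the reference equation for this one question.
    if not hypothesis.startswith("subtract"):
        return "The operator in the first step is incorrect."
    return None


question = "Sam had number0 apples and gave away number1. How many are left?"
result = refine(question, toy_generator, toy_critic)
print(result)
```

Here the critic returns `None` to signal that no further correction is needed, which is one possible stopping convention chosen for this sketch.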

Dependencies

Compatible with Python 3.8.
Dependencies can be installed from requirements.txt.
The codebase is built around the Hugging Face ecosystem and wandb (for monitoring and experiment management).

Setup

Start by cloning the repository:

git clone git@github.com:debjitpaul/refiner.git

Install VirtualEnv using the following (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Install all the required packages:

$ pip install -r requirements.txt

Data

Data	Reference	Output	Description
Math Word Problem	📖 , 🗂️, 🔗	Math Equations (z) and Answers (y)	Generate an equation given a math word problem question
Synthetic Natural Language Reasoning	📖 , 🗂️, 🔗	Reasoning steps (z) and Conclusion (y)	This task requires the model to perform deductive reasoning and generate intermediate reasoning steps z and conclusions y using closed-world rules and facts.
Moral Stories	📖 , 🗂️, 🔗	Moral Norm (z) and Moral Action (y)	Given a context x consisting of a situation, an intention, and an immoral action, the model needs to generate the moral norm z and the moral action y
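To make the (x, z, y) notation in the table concrete, here is a small sketch of how one math word problem instance might be represented. The class name, field names, and sample problem are invented for illustration and are not the schema of the repository's data files.

```python
import operator
from dataclasses import dataclass


@dataclass
class ReasoningExample:
    """Hypothetical container for one task instance."""
    context: str       # x: the input, e.g. a math word problem
    intermediate: str  # z: the intermediate representation, e.g. an equation
    answer: str        # y: the final output


# Invented sample, not taken from any of the datasets listed above.
example = ReasoningExample(
    context="A baker made 12 muffins and sold 5. How many muffins are left?",
    intermediate="12 - 5",
    answer="7",
)

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(equation: str) -> float:
    """Evaluate a single 'a op b' equation with space-separated tokens."""
    left, op, right = equation.split()
    return OPS[op](float(left), float(right))


# The answer y should follow from the intermediate equation z.
consistent = evaluate(example.intermediate) == float(example.answer)
print(consistent)
```

Checking that y follows from z in this way is possible for the math task because the equation can be executed; the other two tasks have free-text intermediate steps.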

Models

Baseline

Train a baseline model using PPO.

Paper: 📖| Code: 🔗

REFINER

Train a Generator model without Critic in the loop (Warm Start).
Train a Critic model with negative instances and feedback.
Train the warm-start generator model with the critic in the loop. For training, we used an oracle critic.
Run inference with the trained critic model in the loop.
Train REFINER with Low-Rank Adaptation of Large Language Models (LoRA) 📖.
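For readers unfamiliar with LoRA, the sketch below shows the idea behind the last step in plain Python: the pretrained weight W stays frozen and only a low-rank update B·A, scaled by alpha / r, is trained. It does not use the repository's code or any adapter library; all names and values are illustrative, and B is given nonzero values here only so the effect is visible (LoRA normally initializes B to zero).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, r):
    """Return the effective weight W + (alpha / r) * (B @ A)."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


d, k, r = 4, 3, 1                       # W is d x k; the update has rank r
w = [[0.0] * k for _ in range(d)]       # frozen pretrained weight (toy values)
b = [[1.0], [0.0], [0.0], [2.0]]        # trainable, d x r
a = [[1.0, 2.0, 3.0]]                   # trainable, r x k

merged = lora_weight(w, a, b, alpha=2, r=r)

full_params = d * k                     # parameters updated by full fine-tuning
lora_params = r * (d + k)               # parameters updated by LoRA
print(merged[0], full_params, lora_params)
```

The saving grows with layer size: for a square 1024 x 1024 weight and r = 8, the trainable count drops from 1,048,576 to 16,384.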

Train Generator

python3 src/scripts/finetune.py --training-file path_train_data --validation-file path_val_data --language-model google/flan-t5-base --model-dir flan_t5_large_model  --epochs 10 --batch-size 8

Train Critic

python3 src/scripts/finetune.py --training-file path_train_data --validation-file path_val_data --language-model google/flan-t5-base --model-dir flan_t5_large_model --epochs 10 --batch-size 8

Train REFINER

python3 src/scripts/train_refiner.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique  --epochs 10 --batch-size 8 --number_turn 4

REFINER Inference

python3 src/scripts/test_predict.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique  --epochs 10 --batch-size 8 --number_turn 4

Train REFINER with LoRA

python3 src/scripts/test_predict.py --training-file data/mwp/critique_train.json --validation-file data/mwp/critique_val.json --language-model google/flan-t5-base --model-dir flan_t5_large_model --critique_model-dir output_critique --lora True --epochs 10 --batch-size 8 --number_turn 4

Citation

@misc{paul2023refiner,
  title={REFINER: Reasoning Feedback on Intermediate Representations},
  author={Paul, Debjit and Ismayilzada, Mete and Peyrard, Maxime and Borges, Beatriz and Bosselut, Antoine and West, Robert and Faltings, Boi},
  eprint={2304.01904},
  journal={arXiv preprint arXiv:2304.01904},
  url={https://arxiv.org/pdf/2304.01904.pdf},
  year={2023}
}