

ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

<img src="./assets/ReCEvalOverview.png" alt="teaser image" width="7500"/>


This code is written using PyTorch and Hugging Face's Transformers library. Running ReCEval requires access to GPUs. The evaluation is lightweight, so one GPU should suffice. Please install the Entailment Bank and GSM-8K datasets separately. To use the human judgment datasets for GSM-8K and to run the baselines, please follow the setup procedure in ROSCOE (preferably in a separate environment).


The simplest way to run our code is to start with a fresh environment.

    conda create -n ReCEval python=3.9
    source activate ReCEval
    pip install -r requirements.txt
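
Since ReCEval needs PyTorch, Transformers, and at least one GPU, it can help to confirm that the new environment sees all three before running anything heavier. The snippet below is a minimal sketch of such a check. The helper name `describe_environment` is hypothetical and not part of this repository; it relies only on the standard library and on `torch.cuda.is_available()` / `torch.cuda.device_count()`.

```python
import importlib.util


def describe_environment():
    """Report whether torch/transformers are installed and how many GPUs are visible.

    Hypothetical helper for illustration; not part of the ReCEval code base.
    """
    report = {"torch": False, "transformers": False, "gpu_count": 0}

    # find_spec returns None for a top-level package that is not installed.
    report["transformers"] = importlib.util.find_spec("transformers") is not None

    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except ImportError:
            # torch is present on disk but failed to import (e.g. broken install).
            return report
        report["torch"] = True
        if torch.cuda.is_available():
            report["gpu_count"] = torch.cuda.device_count()

    return report


if __name__ == "__main__":
    print(describe_environment())
```

If `gpu_count` is reported as 0, PyTorch cannot see a CUDA device from this environment, and the GPU requirement above is likely not met.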

