NumGLUE

NumGLUE is a multi-task benchmark that evaluates the performance of AI systems on eight different tasks that, at their core, require simple arithmetic understanding.

Dataset

NumGLUE has 8 tasks.

Download the data from ./data/. It contains the train, dev, and test splits. Note that the provided task types should be used only for evaluating model performance across the various tasks. They should not be used as additional information during model training, since one of the goals of this benchmark is to identify task types directly from the data.
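As a rough illustration, the sketch below loads a split and drops the task-type label before training. The file name, JSON Lines layout, and field names (e.g. `type`) are assumptions for this example and may differ from the actual files in ./data/.

```python
import json

def load_split(path):
    """Load a NumGLUE split; the file is assumed to be in JSON Lines format."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical file name -- check the actual contents of ./data/.
train = load_split("./data/train.jsonl")

# Keep the task-type field only for per-task evaluation,
# never as an extra input feature during training.
train_for_model = [
    {k: v for k, v in example.items() if k != "type"}
    for example in train
]
```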

Baseline Model

We use numnetplus as the baseline model for NumGLUE. Reading comprehension serves as the common format, and questions from all tasks are converted into this reading-comprehension format.
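For intuition, the minimal sketch below packs an arithmetic word problem into a passage/question/answer record. The exact schema expected by numnetplus is not specified here, so the function and field names are illustrative assumptions only.

```python
def to_reading_comprehension(context_text, question_text, answer):
    """Wrap a task instance as a reading-comprehension style record.

    The field names below are illustrative; the actual format consumed by
    numnetplus may differ.
    """
    return {
        "passage": context_text,    # narrative / context of the word problem
        "question": question_text,  # what the model must answer
        "answer": answer,           # numeric or span answer
    }

example = to_reading_comprehension(
    "A shop sold 3 apples in the morning and 4 apples in the afternoon.",
    "How many apples were sold in total?",
    7,
)
```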

For more details, please refer to our paper, NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks.

Feel free to cite us:

@article{mishra2022numglue,
  title={NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks},
  author={Mishra, Swaroop and Mitra, Arindam and Varshney, Neeraj and Sachdeva, Bhavdeep and Clark, Peter and Baral, Chitta and Kalyan, Ashwin},
  journal={ACL},
  year={2022}
}

If you use the NumGLUE data, please also cite the source dataset papers. The full BibTeX for the source dataset papers is available here.