Home

Awesome

Academia Sinica Diverse MWP Dataset (ASDiv) V1.0

This repository provides ASDiv (a new diverse dataset in terms of both language patterns and problem types) for evaluating and developing MWP Solvers. It contains 2305 english Math Word Problems (MWPs), and is published in this paper "A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers".

Format

This dataset uses XML format for each MWP:

<Problem ID="nluds-0001" Grade="1" Source="http://www.k5learning.com">
	<Body>Seven red apples and two green apples are in the basket.</Body>
	<Question>How many apples are in the basket?</Question>
	<Solution-Type>Addition</Solution-Type>
	<Answer>9 (apples)</Answer>
	<Formula>7+2=9</Formula>
</Problem>

Each Problem cell includes the corresponding 3 attributes and 5 properties (Body, Question, Solution-Type, Answer, and Formula):

Multiple-Step MWPs

<p align="center"> <kbd> <img width="450" src="figs/expressiontree.png" /> </kbd> </p>

For multiple-step MWPs, the formula can be represented as an expression tree. The non-leaf nodes are operations and the leaf-nodes are operands. We denote the operation in the root node to be the Solution-type of this MWP, and the other operations on the non-leaf nodes are sub-types (in a depth-first travel manner) of the MWP and are denoted in the Subtype attribute of the Solution-type.

<Problem ID="nluds-2262" Grade="3" Source="http://www.commoncoresheets.com">
  <Body>Sam invited 9 friends to a birthday party, but 6 couldn't come. If he wanted to buy enough cupcakes so each person could have exactly 2,</Body>
  <Question>how many should he buy?</Question>
  <Solution-Type Subtype-1="Subtraction">Multiplication</Solution-Type>
  <Answer>6 (cupcakes)</Answer>
  <Formula>(9-6)*2=6</Formula>
</Problem>

Evaluation

We provide two Fold-ID sets that used in our N-folds cross-validation performance evaluation. One is for the arithmetic subset and another is used for the whole set, you can access them in the dataset/nfolds/asdiv-a and dataset/nfolds/asdiv-w sub-directories respectively.

Citation

This dataset is released under the CC BY-NC 4.0 license, and constructed by Natural Language Understanding laboratory, Institute of Information Science, Academia Sinica. If you find this work useful for your research, please cite our paper:

@inproceedings{miao-etal-2020-diverse,
  title={A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers},
  author={Miao, Shen-yun and Liang, Chao-Chun and Su, Keh-Yih},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={975--984},
  year={2020}
}

Issues

Please contact ccliang@iis.sinica.edu.tw for any suggestions, comments, and issues.