VisualMRC

VisualMRC is a visual machine reading comprehension dataset that poses the following task: given a question and a document image, a model must produce an abstractive answer in natural language.

Figure 1 from the paper

More details, analyses, and baseline results are available in our paper, which you can cite as follows:

<pre>
@inproceedings{VisualMRC2021,
  author    = {Ryota Tanaka and Kyosuke Nishida and Sen Yoshida},
  title     = {VisualMRC: Machine Reading Comprehension on Document Images},
  booktitle = {AAAI},
  year      = {2021}
}
</pre>

Statistics

Get Started

If you want to use the dataset, including the ground-truth annotations, please contact us at ryouta.tanaka.rg@hco.ntt.co.jp and let us know your institution, name, and purpose.

Dataset Format

<pre> id: "image id", url: "URL", screenshot_filename: "screenshot file name", image_filename: "image file name", bounding_boxes: [ { id: "bounding box id", structure: "semantic class of the bounding box", shape: { x: "INT, Top left x coordinate of the bounding box", y: "INT, Top left y coordinate of the bounding box ", width: "INT, Width of the ROI bounding box", height: "INT, Height of the bounding box", } ocr_info: [ { word: "OCR token", confidence: "Confiden score produced by tesseract", bbox: { x: "INT, Top left x coordinate of the OCR bounding box", y: "INT, Top left y coordinate of the OCR bounding box ", width: "INT, Width of the OCR bounding box", height: "INT, Height of the OCR bounding box", } } ] } ] qa_data:[ { question: { text: "question" } answer: { text: "answer", relevant: ["relevant bounding boxes that need to answer the question"] } } ] </pre>