WorldMedQA-V: A Multilingual, Multimodal Medical Examination Dataset

Overview

WorldMedQA-V is a multilingual and multimodal benchmarking dataset designed to evaluate vision-language models (VLMs) in healthcare contexts. The dataset includes medical examination questions from four countries—Brazil, Israel, Japan, and Spain—in both their original languages and English translations. Each multiple-choice question is paired with a corresponding medical image, enabling the evaluation of VLMs on multimodal data.

Key Features:

- Multilingual: medical examination questions from Brazil, Israel, Japan, and Spain, provided in both the original languages and English translations.
- Multimodal: each multiple-choice question is paired with a corresponding medical image.
- Designed to benchmark vision-language models (VLMs) in healthcare contexts.

Dataset Details

The dataset aims to bridge the gap between real-world healthcare settings and AI evaluations, fostering more equitable, effective, and representative AI applications in healthcare.

Data Structure

The dataset is provided in TSV format, with the following structure:
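Because the exact column headers may vary between releases, the snippet below is only a minimal sketch of reading one of the TSV files with pandas; the file name is a hypothetical placeholder, and the actual headers should be inspected directly:

```python
# Minimal sketch: reading one of the WorldMedQA-V TSV files with pandas.
# "brazil_english.tsv" is a hypothetical file name; check the released
# files for the actual names and column headers.
import pandas as pd

df = pd.read_csv("brazil_english.tsv", sep="\t")  # TSV = tab-separated values
print(df.columns.tolist())                        # list the actual column headers
print(df.head())                                  # preview the first few questions
```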

Example from Brazil:

Evaluate models/results:

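For reference, the snippet below is an illustrative sketch of scoring multiple-choice predictions; it is not the repository's official evaluation script, and the file and column names ("predictions.tsv", "answer", "prediction") are assumptions:

```python
# Illustrative only: compute multiple-choice accuracy from a predictions file.
# The file name and the "answer"/"prediction" columns are assumptions, not
# the repository's actual evaluation interface.
import pandas as pd

preds = pd.read_csv("predictions.tsv", sep="\t")            # hypothetical file
accuracy = (preds["answer"] == preds["prediction"]).mean()  # fraction of correct answers
print(f"Accuracy: {accuracy:.3f}")
```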

Download and Usage

The dataset can be downloaded from the Hugging Face datasets page. All code for handling and evaluating the dataset is available in the following repositories:

Where and how to start? See the Google Colab Demo.
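Beyond the Colab demo, the sketch below shows one way the dataset could be loaded with the Hugging Face `datasets` library; the repository identifier, configuration, and split names are assumptions and should be checked against the Hugging Face dataset page:

```python
# Minimal sketch: loading WorldMedQA-V from the Hugging Face Hub.
# The dataset identifier ("WorldMedQA/V"), configuration ("japan"), and
# split name ("test") are assumptions -- verify them on the dataset page.
from datasets import load_dataset

dataset = load_dataset("WorldMedQA/V", "japan")  # hypothetical repo id and config
print(dataset)                                   # show the available splits
example = dataset["test"][0]                     # hypothetical split name
print(example.keys())                            # e.g. question, options, answer, image
```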

Citation

Please cite this dataset as follows:

@misc{WorldMedQA-V2024,
      title={WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation}, 
      author={João Matos and Shan Chen and Siena Placino and Yingya Li and Juan Carlos Climent Pardo and Daphna Idan and Takeshi Tohyama and David Restrepo and Luis F. Nakayama and Jose M. M. Pascual-Leone and Guergana Savova and Hugo Aerts and Leo A. Celi and A. Ian Wong and Danielle S. Bitterman and Jack Gallifant},
      year={2024},
      eprint={2410.12722},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.12722}, 
}