Huatuo-26M

<p align="center"> 📃 <a href="https://arxiv.org/abs/2305.01526" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/Huatuo26M-Lite" target="_blank">Huatuo-Lite</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/huatuo_encyclopedia_qa" target="_blank">huatuo_encyclopedia_qa</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/huatuo_knowledge_graph_qa" target="_blank">knowledge_graph_qa</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/huatuo_consultation_qa" target="_blank">huatuo_consultation_qa</a> <br> <a href="README_zh.md"> 中文</a> | <a href="README.md"> English </a> </p>

👩🏻‍⚕Introduction

📚Data Content

The Huatuo-26M dataset is collected and integrated from multiple sources, including:

Each question-answer pair in the dataset contains the following fields:

The following is the Huatuo test set used in the paper. It was built by randomly sampling data from multiple sources.

🤖Data Usage

The Huatuo-26M dataset can be used for a variety of AI research and applications in the medical field, such as:

🚀Quick Start

To start using the Huatuo-26M dataset, you can follow the steps below:

import datasets

# Part 1: knowledge graph QA
knowledge_graph_dataset = datasets.load_dataset('FreedomIntelligence/huatuo_knowledge_graph_qa')
# Part 2: encyclopedia QA
encyclopedia_dataset = datasets.load_dataset('FreedomIntelligence/huatuo_encyclopedia_qa')
# Part 3: consultation QA (URLs only)
consultation_dataset = datasets.load_dataset('FreedomIntelligence/huatuo_consultation_qa')

# Test set used in the paper (6k)
huatuo_testdatasets = datasets.load_dataset('FreedomIntelligence/huatuo26M-testdatasets')
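
Once a dataset is loaded, it can help to look at a few records before building anything on top of it. The snippet below is a minimal sketch of one way to do that, not part of the official tooling. The helper `preview_records` and the rows in `sample_rows` (including their `question` and `answer` keys) are hypothetical and exist only for illustration; the real column names may differ, so check them on the loaded dataset.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def preview_records(
    records: Iterable[Dict[str, Any]], n: int = 3, max_chars: int = 60
) -> List[Dict[str, str]]:
    """Return the first n records with every value rendered as a short string."""
    preview = []
    for record in islice(records, n):
        row = {}
        for key, value in record.items():
            text = str(value)
            # Truncate long values so a preview stays readable in a terminal.
            row[key] = text if len(text) <= max_chars else text[:max_chars] + "..."
        preview.append(row)
    return preview


# Hypothetical placeholder rows; the real datasets may use other column names.
sample_rows = [
    {"question": "example question 1", "answer": "example answer 1"},
    {"question": "example question 2", "answer": "x" * 100},
]

for row in preview_records(sample_rows, n=2):
    print(row)

# With a dataset loaded as in the steps above (requires network access),
# the same helper can be applied to whichever split comes first:
# first_split = next(iter(encyclopedia_dataset.values()))
# for row in preview_records(first_split, n=3):
#     print(row)
```

The helper accepts any iterable of dictionaries, so it can be tried on in-memory rows first and then pointed at a loaded split without changes.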

👩🏻‍🔬Experiment Record

Benchmark

Application

🚁License

The Huatuo-26M dataset is licensed under Apache 2.0. Please make sure you have read and agreed to the license terms before using it.

📱Contact Us

If you have any questions or need help, please contact us by email (xidongw@163.com) or open an issue in the Issues section.


😁Citation

@misc{li2023huatuo26m,
      title={Huatuo-26M, a Large-scale Chinese Medical QA Dataset}, 
      author={Jianquan Li and Xidong Wang and Xiangbo Wu and Zhiyi Zhang and Xiaolong Xu and Jie Fu and Prayag Tiwari and Xiang Wan and Benyou Wang},
      year={2023},
      eprint={2305.01526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}