

ChiMST

This GitHub repository provides information about the ChiMST corpus, which builds on our previous work, ChiMed.

🔥 News 🔥

We recently released ChiMed-GPT, a large language model for the Chinese medical domain that is trained on ChiMed data. For more information, please visit our GitHub repository.

The Copyright

The copyright of the ChiMed and ChiMST corpora belongs to 39ask. We release the ChiMST corpus under our contract with 39ask.

Request the ChiMed and ChiMST Datasets

To request the ChiMed and ChiMST corpora, please download the contract from this repository (English, Chinese), fill out the request form, sign it, and send the request file to yhtian@uw.edu. If the request form meets our requirements, we will send the download link to the e-mail address provided in the form within one week.

Please read the following instructions before submitting your request form:

Citation

If you use the ChiMed corpus, please cite the following paper (note: the released ChiMed corpus is larger than the dataset used in that paper).

@inproceedings{tian-etal-2019-chimed,
    title = "ChiMed: A Chinese Medical Corpus for Question Answering",
    author = "Tian, Yuanhe and Ma, Weicheng and Xia, Fei and Song, Yan",
    booktitle = "Proceedings of the 18th BioNLP Workshop and Shared Task",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    pages = "250--260",
}

If you use the ChiMST corpus, please cite the following paper.

@InProceedings{tian-EtAl:2022:LREC1,
  title = "ChiMST: A Chinese Medical Corpus for Word Segmentation and Medical Term Recognition",
  author = "Tian, Yuanhe and Qin, Han and Xia, Fei and Song, Yan",
  booktitle = "Proceedings of the Language Resources and Evaluation Conference",
  month = "June",
  year = "2022",
  address = "Marseille, France",
  pages = "5654--5664",
}