


A curated summary of medical NLP evaluations and competitions, datasets, papers, and pre-trained models.

<p> <a href="https://github.com/FreedomIntelligence/Medical_NLP"><img src=https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg ></a> <a href="https://github.com/FreedomIntelligence/Medical_NLP"><img src=https://img.shields.io/github/forks/FreedomIntelligence/Medical_NLP.svg?style=social ></a> <a href="https://github.com/FreedomIntelligence/Medical_NLP"><img src=https://img.shields.io/github/stars/FreedomIntelligence/Medical_NLP.svg?style=social ></a> <a href="https://github.com/FreedomIntelligence/Medical_NLP"><img src=https://img.shields.io/github/watchers/FreedomIntelligence/Medical_NLP.svg?style=social ></a> </p>



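1. Evaluation

1.1 Chinese medical benchmarks: CMB / CMExam / PromptCBLUE

1.2 English medical benchmarks

2. Competitions

2.1 Ongoing competitions

2.2 Finished competitions

2.2.1 English competitions

2.2.2 Chinese competitions

3. LLM datasets

3.1 Chinese

3.2 English

4. VLM datasets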

<!-- References: https://github.com/lab-rasool/Awesome-Medical-VLMs-and-Datasets https://github.com/openmedlab/Awesome-Medical-Dataset Multimodal pathology datasets: https://github.com/FreedomIntelligence/Medical_NLP/blob/master/images/pathology_datasets.jpg -->
| Dataset | Link 1 | Link 2 | Keywords |
| --- | --- | --- | --- |
| MedTrinity-25M | link | link | 25 million images; 10 modalities; 65 diseases; VQA; EN |
| LLaVA-Med | link | link | 630k images; VQA; EN |
| Chinese-LLaVA-Med | - | link | 60k images; VQA; ZH |
| HuatuoGPT-Vision | link | link | 647k images; VQA; EN |
| MedVidQA | link | link | 7k videos; VQA; EN |
| ChiMed-VL | link | link | 1M images; VQA; EN; ZH |
| RadFM | link | link | 16M images; 5000 diseases; VQA; EN; 2D/3D |
| BiomedParseData | link | link | 6.8 million image-mask-description; 45 biomedical image segmentation datasets; 9 modalities; EN; 2D |
| OmniMedVQA | link | link | 118,010 images; 12 modalities; 2D; 20 human anatomical regions |
| PreCT | link | link | 160K volumes; 42M slices; 3D; CT |
| GMAI-VL-5.5M | link | link | 5.5m image and text; 219 specialized medical imaging datasets; 2D; VQA |
| SA-Med2D-20M | link | link | 4.6 million 2D medical images and 19.7 million corresponding masks; 2D; EN |
| IMIS-Bench | link | link | 6.4 million images, 273.4 million masks (56 masks per image), 14 imaging modalities, and 204 segmentation targets; EN |
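
5. Open-source pre-trained models

5.1 Medical PLMs

5.2 Medical LLMs

5.2.1 Multilingual medical LLMs

5.2.2 Chinese medical LLMs

5.2.3 English medical LLMs

5.3 Medical VLMs


5.4 Medical VLM benchmarks

6. Related papers

6.1 Papers that may be helpful in the post-ChatGPT era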

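  1. Large Language Models Encode Clinical Knowledge. Paper: https://arxiv.org/abs/2212.13138

  2. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. Paper: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000198

  3. Putting ChatGPT's Medical Advice to the (Turing) Test. Paper: https://arxiv.org/abs/2301.10035

  4. Toolformer: Language Models Can Teach Themselves to Use Tools. Paper: https://arxiv.org/abs/2302.04761

  5. Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. Paper: https://arxiv.org/abs/2302.12813

  6. Capabilities of GPT-4 on Medical Challenge Problems. Paper: https://arxiv.org/abs/2303.13375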

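6.2 Survey papers

  1. Pre-trained Language Models in Biomedical Domain: A Systematic Survey (paper link)
  2. A guide to deep learning in healthcare (paper link), a review published in Nature Medicine
  3. A survey of large language models for healthcare (paper link)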

6.3 Task-specific papers


  1. Transfer Learning from Medical Literature for Section Prediction in Electronic Health Records (paper link)
  2. MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records (paper link)


  1. Leveraging Dependency Forest for Neural Medical Relation Extraction (paper link)


  1. Learning a Health Knowledge Graph from Electronic Medical Records (paper link)


  1. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence (paper link)


  1. Medical Entity Linking using Triplet Network (paper link)
  2. A Generate-and-Rank Framework with Semantic Type Regularization for Biomedical Concept Normalization (paper link)
  3. Deep Neural Models for Medical Concept Normalization in User-Generated Texts (paper link)

6.4 Conference index

ACL 2020 medical NLP papers


  1. A Generate-and-Rank Framework with Semantic Type Regularization for Biomedical Concept Normalization (paper link)
  2. Biomedical Entity Representations with Synonym Marginalization (paper link)
  3. Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain (paper link)
  4. MIE: A Medical Information Extractor towards Medical Dialogues (paper link)
  5. Rationalizing Medical Relation Prediction from Corpus-level Statistics (paper link)

AAAI 2020 medical NLP papers

  1. On the Generation of Medical Question-Answer Pairs (paper link)
  2. LATTE: Latent Type Modeling for Biomedical Entity Linking (paper link)
  3. Learning Conceptual-Contextual Embeddings for Medical Text (paper link)
  4. Understanding Medical Conversations with Scattered Keyword Attention and Weak Supervision from Responses (paper link)
  5. Simultaneously Linking Entities and Extracting Relations from Biomedical Text without Mention-level Supervision (paper link)
  6. Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer! (paper link)

EMNLP 2020 medical NLP papers

  1. Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text (paper link)
  2. MedDialog: Large-scale Medical Dialogue Datasets (paper link)
  3. COMETA: A Corpus for Medical Entity Linking in the Social Media (paper link)
  4. Biomedical Event Extraction as Sequence Labeling (paper link)
  5. FedED: Federated Learning via Ensemble Distillation for Medical Relation Extraction (paper link). Paper walkthrough: FedED, federated learning for medical relation extraction (based on ensemble distillation)
  6. Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition (paper link)
  7. A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization (paper link)
  8. BioMegatron: Larger Biomedical Domain Language Model (paper link)
  9. Querying Across Genres for Medical Claims in News (paper link)
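
7. Open-source toolkits

  1. Word segmentation: PKUSEG (project link). Description: a multi-domain Chinese word segmentation toolkit released by Peking University, with support for selecting the medical domain.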
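
As a rough illustration of the medical-domain option, the sketch below wraps PKUSEG behind a small helper. It assumes the `pkuseg` Python package is installed and that its medicine-domain model can be downloaded on first use. The helper names `make_medical_cut` and `segment_lines` are hypothetical and are not part of PKUSEG.

```python
from typing import Callable, Iterable, List, Optional


def make_medical_cut() -> Callable[[str], List[str]]:
    """Return a cut function backed by PKUSEG's medicine-domain model."""
    # Imported lazily so segment_lines stays usable without the package.
    import pkuseg  # third-party; assumed installed via `pip install pkuseg`

    # model_name="medicine" selects the medical domain. The model files are
    # fetched on first use, so this call needs network access once.
    segmenter = pkuseg.pkuseg(model_name="medicine")
    return segmenter.cut


def segment_lines(
    lines: Iterable[str],
    cut: Optional[Callable[[str], List[str]]] = None,
) -> List[List[str]]:
    """Segment each non-empty line into a list of tokens.

    `cut` is any function mapping a string to tokens; when omitted,
    the PKUSEG medicine-domain segmenter is built on demand.
    """
    if cut is None:
        cut = make_medical_cut()
    return [cut(line.strip()) for line in lines if line.strip()]
```

Calling `segment_lines(open("notes.txt", encoding="utf-8"))` (the file name is a placeholder) would segment a text file line by line with the medical model. Passing a different `cut` function makes it possible to compare against the default PKUSEG model or another segmenter.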
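
8. Industrial product solutions

  1. 灵医智慧 (Lingyi Zhihui)

  2. 左手医生 (Zuoshou Yisheng)

  3. Yidu Cloud Research Institute: medical natural language processing

  4. Baidu: medical text structuring

  5. Alibaba Cloud: medical natural language processing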

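9. Blog posts

  1. Alpaca: a strong open-source instruction-following model
  2. Lessons learned from building natural language processing systems in the medical domain
  3. An introduction to public medical databases and data mining techniques in the big-data era
  4. The development of NLP applications in healthcare as seen from ACL 2021, with resource downloads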

10. Related links

  1. awesome_Chinese_medical_NLP
  2. Chinese NLP dataset search
  3. medical-data (a large collection of medical data)
  4. Tianchi datasets (including several medical NLP datasets)
