
Open-source pre-trained models and code for machine reading comprehension (MRC)

*********************** Updates ***********************

Contents

- Re-training on large-scale MRC data
- Repository overview
- How to run
- Script arguments explained
- Acknowledgements

Re-training on large-scale MRC data

The re-trained models released in this repository bring large gains on reading-comprehension and classification tasks (several users have already placed in the top 5 of competitions such as Dureader, the CAIL challenge (法研杯), and medical QA 😁).

| Model / Dataset | Dureader-2021 F1 (dev / test-1 leaderboard A) | tencentmedical Accuracy (test-1) |
|---|---|---|
| macbert-large (HIT pre-trained LM) | 65.49 / 64.27 | 82.5 |
| roberta-wwm-ext-large (HIT pre-trained LM) | 65.49 / 64.27 | 82.5 |
| macbert-large (ours) | 70.45 / 68.13 | 83.4 |
| roberta-wwm-ext-large (ours) | 68.91 / 66.91 | 83.1 |
----- Usage -----
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "chinese_pretrain_mrc_roberta_wwm_ext_large"  # or "chinese_pretrain_mrc_macbert_large"

# Load directly from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(f"luhua/{model_name}")
model = AutoModelForQuestionAnswering.from_pretrained(f"luhua/{model_name}")

# Load from a local directory (download the model and config files from https://huggingface.co/luhua first)
tokenizer = AutoTokenizer.from_pretrained(f"./{model_name}")
model = AutoModelForQuestionAnswering.from_pretrained(f"./{model_name}")
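
To sanity-check a downloaded checkpoint, the loaded model and tokenizer can be wrapped in the standard Transformers question-answering pipeline. The question/context pair below is a made-up example for illustration, not from the repository.

```python
from transformers import pipeline

# Extractive-QA pipeline around the model/tokenizer loaded above
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Hypothetical question/context pair, purely illustrative
result = qa(question="小明在哪里上学?", context="小明是一名学生,他在北京的一所中学上学。")
print(result["answer"], result["score"])  # predicted answer span and its confidence
```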

Repository overview

How to run

Script arguments explained

1. Data & models:
2. One-click run
sh train_bert.sh  # train; run sh test_bert.sh for evaluation
3. Unanswerable questions (an illustrative sketch follows below)
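
The repository does not spell out its no-answer handling here. As an illustration only, below is a minimal sketch of the common SQuAD-2.0-style heuristic: compare the score of the null span (start and end both pointing at [CLS]) against the best real answer span, and abstain when the null score wins by more than a tunable threshold. The function name, the greedy span search, and `null_threshold` are assumptions made for this sketch, not this repository's actual logic.

```python
import torch

def predict_with_no_answer(model, tokenizer, question, context, null_threshold=0.0):
    """SQuAD-2.0-style no-answer heuristic (illustrative, not this repo's exact code).

    Returns "" when the null-span score (logits at [CLS], index 0) beats the
    best real span by more than `null_threshold`.
    """
    inputs = tokenizer(question, context, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    start_logits = outputs.start_logits[0]
    end_logits = outputs.end_logits[0]

    # Score of predicting "no answer": start and end both at [CLS]
    null_score = start_logits[0] + end_logits[0]

    # Best real span (greedy: best start after [CLS], then best end at/after it)
    start_idx = int(torch.argmax(start_logits[1:])) + 1
    end_idx = start_idx + int(torch.argmax(end_logits[start_idx:]))
    span_score = start_logits[start_idx] + end_logits[end_idx]

    if null_score - span_score > null_threshold:
        return ""  # the model prefers "no answer"
    tokens = inputs["input_ids"][0][start_idx : end_idx + 1]
    return tokenizer.decode(tokens, skip_special_tokens=True)
```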

Tips:

Acknowledgements

zhangxiaoyu zhongjialun huanghui nanfulai