CNSD

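Chinese natural language inference dataset (A large-scale Chinese Nature language inference and Semantic similarity calculation Dataset). This dataset was generated from the original English datasets by machine translation plus partial manual correction. It can, to some extent, alleviate the shortage of Chinese datasets for natural language inference and semantic similarity calculation.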

News

Paper (Preprint)

Nobody was available to endorse us on arXiv, so we have released the preprint informally here for reference.

A large-scale Chinese Nature language inference and Semantic similarity calculation Dataset

Download link

Dataset size

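| Dataset | Train | Dev | Test | Sum |
| --- | --- | --- | --- | --- |
| Chinese-SNLI | 550k | 10k | 10k | 570k |
| Chinese-MNLI | 390k | 12k | 13k | 415k |
| Chinese-QQP | 390k | 8k | 800k (without label) | 1.1m |
| Chinese-STS-B | 5.7k | 1.5k | 1.3k | 8.5k |
| Total | 1.3m | 31.5k | 824.3k | 2.1m |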
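As a quick sanity check on the table above, the per-split totals can be recomputed from the individual rows. This is only a minimal sketch: the figures are the rounded values from the table (in thousands), and the names `SPLITS_K` and `column_totals` are hypothetical, not part of the dataset release.

```python
# Split sizes in thousands of sentence pairs, copied from the table above.
# These are the rounded figures reported there, not exact counts.
SPLITS_K = {
    "Chinese-SNLI": {"train": 550, "dev": 10, "test": 10},
    "Chinese-MNLI": {"train": 390, "dev": 12, "test": 13},
    "Chinese-QQP": {"train": 390, "dev": 8, "test": 800},  # test set has no labels
    "Chinese-STS-B": {"train": 5.7, "dev": 1.5, "test": 1.3},
}


def column_totals(splits):
    """Sum each split (train/dev/test) across all datasets, in thousands."""
    totals = {}
    for sizes in splits.values():
        for split, count in sizes.items():
            totals[split] = totals.get(split, 0) + count
    # Round to one decimal place to hide floating-point noise.
    return {split: round(total, 1) for split, total in totals.items()}


if __name__ == "__main__":
    for split, total in column_totals(SPLITS_K).items():
        print(f"{split}: {total}k")
```

The recomputed totals (1335.7k, 31.5k, 824.3k) agree with the Total row once the train figure is rounded to 1.3m.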

Data format

Experimental results

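| Model | Chinese-SNLI Dev | Chinese-SNLI Test | Chinese-MNLI Mismatched | Chinese-MNLI Matched | Chinese-QQP Dev | Chinese-STS-B Dev | Chinese-STS-B Test |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Embed+add-attention | 74.46 | 75.05 | 63.28 | 62.25 | 72.56 | - | - |
| BiLSTM+self-attention | 81.19 | 80.96 | 69.47 | 67.79 | 81.45 | 43.87 | 41.24 |
| DiSAN | 81.32 | 81.45 | 69.54 | 68.13 | 82.32 | 44.21 | 42.09 |
| BERT | 87.39 | 86.95 | 79.76 | 79.39 | 89.08* | 53.84 | 50.26 |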

References

A large annotated corpus for learning natural language inference

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Members

Amazonhhh

Pluto

Acknowledgements

Thanks to Tencent Cloud for providing the translation service.

Statement

This dataset may be used for academic research only. Please do not use it for commercial purposes.