
OCNLI: Original Chinese Natural Language Inference

OCNLI stands for Original Chinese Natural Language Inference. It is a corpus for Chinese natural language inference, collected by closely following the procedures of MNLI, but with enhanced strategies that aim for more challenging inference pairs. We want to emphasize that we did not use human or machine translation in creating the dataset, so our Chinese texts are original rather than translated.

OCNLI has roughly 50k pairs for training, 3k for development and 3k for test. For the test set, we release only the data, not the labels. See our paper for details.

OCNLI is part of the CLUE benchmark.

OCNLI is the first large-scale Chinese natural language inference dataset built from original, non-translated Chinese text. Data and labels are provided for every split except the test set, for which only the data is provided.

Data format

Our dataset is distributed in JSON format. Here is an example from OCNLI.dev:

{
"level":"medium",
"sentence1":"身上裹一件工厂发的棉大衣,手插在袖筒里",
"sentence2":"身上至少一件衣服",
"label":"entailment","label0":"entailment","label1":"entailment","label2":"entailment","label3":"entailment","label4":"entailment",
"genre":"lit","prem_id":"lit_635","id":0
}

Of these fields, only sentence1, sentence2 and label are needed for training and evaluation.
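As a rough illustration of reading these three fields, here is a minimal Python sketch. It assumes each data file stores one JSON object per line in UTF-8, which the text above does not state explicitly; `parse_ocnli_lines` and `load_ocnli` are hypothetical helper names, not part of any released code.

```python
import json


def parse_ocnli_lines(lines):
    """Parse OCNLI records and keep only the fields needed for training."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        examples.append(
            (record["sentence1"], record["sentence2"], record["label"])
        )
    return examples


def load_ocnli(path):
    # Assumption: one JSON object per line, UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return parse_ocnli_lines(f)
```

Calling `load_ocnli` on a local copy of a training or dev file would then return a list of (premise, hypothesis, label) tuples, provided the file matches the assumed layout.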

Data split

To study the effect of training set size on the results, we provide four training sets of different sizes. OCNLI.train.30k, OCNLI.train.10k and OCNLI.train.3k are all subsets of OCNLI.train.50k:

  1. OCNLI.train.50k: 50k data points (OCNLI.train in our paper)
  2. OCNLI.train.30k: filtered subset of OCNLI.train.50k with 30k data points (OCNLI.train.small in our paper)
  3. OCNLI.train.10k: 10k data points sampled from OCNLI.train.30k
  4. OCNLI.train.3k: 3k data points sampled from OCNLI.train.30k

We wanted to see how training size and overlapping premises affect the results. Results for models trained on the first two training sets are reported in our paper, along with details about the splits. The last two sets are intended to mimic situations where annotated data are limited.

Models trained on any of the four training sets should be evaluated on the same dev and test sets, OCNLI.dev and OCNLI.test.

Leaderboard

OCNLI is part of the CLUE benchmark, which will host a leaderboard where you can submit your results on the test set.

Currently, you can submit test results from models trained on OCNLI.train.50k and OCNLI.train.30k.

Note on the submission format: submit a single zip archive that contains the following files: OCNLI_50k.json, OCNLI_30k.json
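The following Python sketch shows one way to build such an archive. Only the two file names come from the note above. The layout inside each file (one JSON object per line with `id` and `label` keys) is an assumption, as is the helper name `write_submission`; check the leaderboard instructions for the format it actually expects.

```python
import json
import zipfile


def write_submission(zip_path, preds_50k, preds_30k):
    """Pack two lists of (id, label) predictions into one zip archive."""
    files = {"OCNLI_50k.json": preds_50k, "OCNLI_30k.json": preds_30k}
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, preds in files.items():
            # Assumption: one {"id": ..., "label": ...} object per line.
            body = "\n".join(
                json.dumps({"id": i, "label": label}) for i, label in preds
            )
            zf.writestr(name, body + "\n")
    return zip_path
```

For example, `write_submission("submission.zip", preds_50k, preds_30k)` would write both prediction files at the top level of the archive, where each argument is a list of (id, label) pairs produced by the corresponding model.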

Baselines

Models

Please refer to https://github.com/CLUEbenchmark/OCNLI/blob/main/rep_baseline.md

Results

Accuracy on dev / test sets: mean accuracy across 5 runs (standard deviation). BERT: BERT_base, RoBERTa: RoBERTa_large_wwm. See the paper for more details.

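| validation data | majority | CBOW | BiLSTM | ESIM | BERT | RoBERTa | human |
|---|---|---|---|---|---|---|---|
| dev | 37.4 | 56.8 (0.4) | 60.5 (0.4) | 61.8 (0.5) | 74.5 (0.3) | 78.8 (1.0) | na |
| test | 38.1 | 55.7 (0.5) | 59.2 (0.5) | 59.8 (0.4) | 72.2 (0.7) | 78.2 (0.7) | 90.3 |

| validation data | BiLSTM | BERT | RoBERTa | human |
|---|---|---|---|---|
| dev | 58.7 (0.3) | 72.6 (0.9) | 77.4 (1.0) | na |
| test | 57.0 (0.9) | 70.3 (0.9) | 76.4 (1.2) | 90.3 |

| validation data | BERT | RoBERTa | human |
|---|---|---|---|
| dev | 69.2 (0.5) | 75.2 (0.3) | na |
| test | 67.0 (0.6) | 73.6 (0.5) | 90.3 |

| validation data | BERT | RoBERTa | human |
|---|---|---|---|
| dev | 64.4 (0.7) | 70.4 (0.6) | na |
| test | 62.8 (0.7) | 69.5 (0.5) | 90.3 |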
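Each cell in the results reports a mean with a standard deviation in parentheses. The sketch below shows how such a cell could be computed from per-run accuracies. It assumes the sample standard deviation (n - 1 denominator), which the text does not specify, and the helper names are hypothetical.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, standard deviation) rounded to one decimal place."""
    mean = statistics.mean(accuracies)
    # Assumption: sample standard deviation (n - 1 denominator).
    std = statistics.stdev(accuracies)
    return round(mean, 1), round(std, 1)


def format_cell(accuracies):
    """Format per-run accuracies as 'mean (std)'."""
    mean, std = summarize_runs(accuracies)
    return f"{mean:.1f} ({std:.1f})"
```

Passing the accuracies of five runs of one model to `format_cell` yields a string in the same "mean (std)" style as the cells above.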

More details about OCNLI

| sentence1 | sentence2 | source | label |
|---|---|---|---|
| 但是不光是中国,日本,整个东亚文化都有这个特点就是被权力影响很深 | 有超过两个东亚国家有这个特点 | OCNLI | E |
| 完善加工贸易政策体 | 贸易政策体系还有不足之处 | OCNLI | E |
| 咖啡馆里面对面坐的年轻男女也是上一代的故事,她已是过来人了 | 男人和女人是背对背坐着的 | OCNLI | C |
| 今天,这一受人关注的会议终于在波恩举行 | 这一会议原定于昨天举行 | OCNLI | N |
| 嗯,今天星期六我们这儿,嗯哼. | 昨天是星期天 | OCNLI | C |

Why not XNLI?

While XNLI has been helpful in multi-lingual NLI research, the quality of XNLI Chinese data is far from satisfactory; here are just a few bad examples we found when annotating 300 randomly sampled examples from XNLI dev:

| sentence1 | sentence2 | source | label |
|---|---|---|---|
| Louisa May Alcott和Nathaniel Hawthorne 住在Pinckney街道,而 那个被Oliver Wendell Holmes称为 “晴天街道 的Beacon Street街道住着有些喜欢自吹自擂的历史学家 William Prescott | Hawthorne住在Main Street上 | XNLI dev | C |
| 看看东方的Passeig de Gracia,特别是Diputacie,Consell de Cent,Mallorca和Valancia,直到Mercat de la Concepcie市场 | 市场出售大量的水果和蔬菜 | XNLI dev | N |
| Leisure Modern medicine and hygiene学说已经解决了过去占据我们免疫系统的大部分问题 | 人类是唯一没有免疫系统的生物 | XNLI dev | C |
| 政府,法律的batta, begar, chaprasi, dakoit, dakoity, dhan, dharna, kotwal, kotwali, panchayat, pottah, sabha | 所有的单词都很容易理解 | XNLI dev | C |
| 下一阶段,中情局基地组织的负责人当时回忆说,他不认为他的职责是指导应该做什么或不应该做什么 | 导演认为这完全取决于他 | XNLI dev | C |

Related resources

TODO

Contributors

Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler and Larry Moss

Acknowledgements

We greatly appreciate the hard work of our annotators, who are from the following universities: Xiamen University, Beijing Foreign Studies University, University of Electronic Science and Technology of China, and Beijing Normal University. We also want to thank Ruoze Huang, Zhaohong Wu, Jueyan Wu and Xiaojie Gong for helping us to find the annotators. This project is funded by Grant-in-Aid of Doctoral Research from Indiana University Graduate School and the CLUE benchmark.

License

Citation

Please cite the following paper if you use OCNLI in your research:

@inproceedings{ocnli,
	title={OCNLI: Original Chinese Natural Language Inference},
	author={Hai Hu and Kyle Richardson and Liang Xu and Lu Li and Sandra Kuebler and Larry Moss},
	booktitle={Findings of EMNLP},
	year={2020},
	url={https://arxiv.org/abs/2010.05444}
}