KorNLU Datasets

This is the dataset repository for our paper "KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding".

We introduce KorNLI and KorSTS, which are natural language inference (NLI) and semantic textual similarity (STS) datasets in Korean.

KorNLI

Dataset Overview

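| KorNLI | Total | Train | Dev. | Test |
|---|---|---|---|---|
| Source | - | SNLI, MNLI | XNLI | XNLI |
| Translated by | - | Machine | Human | Human |
| # Examples | 950,354 | 942,854 | 2,490 | 5,010 |
| Avg. # words (premise) | 13.6 | 13.6 | 13.0 | 13.1 |
| Avg. # words (hypothesis) | 7.1 | 7.2 | 6.8 | 6.8 |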
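The "Avg. # words" rows are per-sentence averages. The sketch below shows one way such a statistic could be computed. It assumes that a "word" is a whitespace-delimited token (an eojeol in Korean); the exact counting rule used for the table is not stated here, so the numbers it produces may not match the table. The sample sentences are the KorNLI examples from this README.

```python
def avg_words(sentences):
    """Average number of whitespace-separated tokens per sentence.

    Assumption: words are counted by splitting on whitespace, so
    punctuation stays attached to the neighbouring token.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    return sum(len(s.split()) for s in sentences) / len(sentences)


premises = ["저는, 그냥 알아내려고 거기 있었어요."]
hypotheses = [
    "이해하려고 노력하고 있었어요.",
    "나는 처음부터 그것을 잘 이해했다.",
    "나는 돈이 어디로 갔는지 이해하려고 했어요.",
]
print(round(avg_words(premises), 1))    # 5.0
print(round(avg_words(hypotheses), 1))  # 4.7
```

A morpheme- or subword-based tokenizer would give different (usually larger) counts, which is why the counting rule matters when comparing against the table.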

Examples

| Example | English Translation | Label |
|---|---|---|
| P: 저는, 그냥 알아내려고 거기 있었어요.<br />H: 이해하려고 노력하고 있었어요. | I was just there just trying to figure it out.<br />I was trying to understand. | Entailment |
| P: 저는, 그냥 알아내려고 거기 있었어요.<br />H: 나는 처음부터 그것을 잘 이해했다. | I was just there just trying to figure it out.<br />I understood it well from the beginning. | Contradiction |
| P: 저는, 그냥 알아내려고 거기 있었어요.<br />H: 나는 돈이 어디로 갔는지 이해하려고 했어요. | I was just there just trying to figure it out.<br />I was trying to understand where the money went. | Neutral |
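Each KorNLI example is a premise/hypothesis pair with one of three labels: entailment, neutral, or contradiction. Below is a minimal reader sketch for a tab-separated file. The column names `sentence1`, `sentence2` and `gold_label`, and the label-to-id mapping, are assumptions for illustration; check the header of the actual files before relying on them. Rows with any other label (for example a placeholder for examples without annotator consensus, if present) are skipped.

```python
import csv
import io

# Hypothetical label-to-id mapping; any fixed order works as long as
# training and evaluation use the same one.
LABEL2ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def read_kornli(f):
    """Yield (premise, hypothesis, label_id) from a KorNLI-style TSV stream.

    Assumption: a header row with columns sentence1, sentence2, gold_label.
    """
    # QUOTE_NONE keeps literal quote characters inside sentences intact.
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        label = (row.get("gold_label") or "").strip().lower()
        if label not in LABEL2ID:
            continue  # skip unlabeled or malformed rows
        yield row["sentence1"], row["sentence2"], LABEL2ID[label]


sample = (
    "sentence1\tsentence2\tgold_label\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t이해하려고 노력하고 있었어요.\tentailment\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 처음부터 그것을 잘 이해했다.\tcontradiction\n"
    "저는, 그냥 알아내려고 거기 있었어요.\t나는 돈이 어디로 갔는지 이해하려고 했어요.\t-\n"
)
for premise, hypothesis, label_id in read_kornli(io.StringIO(sample)):
    print(label_id, hypothesis)
```

For a real file, pass `open(path, encoding="utf-8", newline="")` instead of the in-memory `io.StringIO` sample.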

KorSTS

Dataset Overview

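| KorSTS | Total | Train | Dev. | Test |
|---|---|---|---|---|
| Source | - | STS-B | STS-B | STS-B |
| Translated by | - | Machine | Human | Human |
| # Examples | 8,628 | 5,749 | 1,500 | 1,379 |
| Avg. # words | 7.7 | 7.5 | 8.7 | 7.6 |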
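KorSTS inherits the STS-B setup: each sentence pair carries a real-valued similarity score on a 0 to 5 scale. The following is a minimal reader sketch. It assumes a TSV header that includes `score`, `sentence1` and `sentence2` columns (any extra columns are ignored); these names are assumptions to verify against the actual files. The optional division by 5 is one common way to get a [0, 1] regression target, not something this repository prescribes.

```python
import csv
import io


def read_korsts(f, normalize=False):
    """Yield (sentence1, sentence2, score) from a KorSTS-style TSV stream.

    Assumption: a header row containing score, sentence1 and sentence2;
    other columns are simply not looked up.
    """
    # QUOTE_NONE keeps literal quote characters inside sentences intact.
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        score = float(row["score"])
        # STS-B style gold scores lie on a 0-5 similarity scale.
        if not 0.0 <= score <= 5.0:
            raise ValueError(f"score out of range [0, 5]: {score}")
        if normalize:
            # Optional rescaling to [0, 1], e.g. for a cosine-similarity target.
            score /= 5.0
        yield row["sentence1"], row["sentence2"], score


sample = (
    "score\tsentence1\tsentence2\n"
    "4.2\t한 남자가 음식을 먹고 있다.\t한 남자가 뭔가를 먹고 있다.\n"
    "0.0\t한 여성이 고기를 요리하고 있다.\t한 남자가 말하고 있다.\n"
)
for s1, s2, score in read_korsts(io.StringIO(sample), normalize=True):
    print(f"{score:.2f}\t{s1}\t{s2}")
```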

Examples

| Example | English Translation | Label |
|---|---|---|
| 한 남자가 음식을 먹고 있다.<br />한 남자가 뭔가를 먹고 있다. | A man is eating food.<br />A man is eating something. | 4.2 |
| 한 비행기가 착륙하고 있다.<br />애니메이션화된 비행기 하나가 착륙하고 있다. | A plane is landing.<br />An animated airplane is landing. | 2.8 |
| 한 여성이 고기를 요리하고 있다.<br />한 남자가 말하고 있다. | A woman is cooking meat.<br />A man is speaking. | 0.0 |
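Because KorSTS labels are graded scores rather than classes, STS systems are commonly scored by how well their predicted similarities correlate with the gold scores. The following is a minimal pure-Python sketch of Spearman's rank correlation (Pearson correlation of the ranks, with tied values given their average rank). The prediction values are hypothetical; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


gold = [4.2, 2.8, 0.0]  # gold scores of the KorSTS examples in this README
pred = [3.9, 3.1, 0.5]  # hypothetical model predictions
print(spearman(gold, pred))  # 1.0
```

Spearman's rho only compares orderings, so a model whose scores are on a different scale (for example cosine similarities in [-1, 1]) can be evaluated without rescaling.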

License

Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0)

References

If you use KorNLI or KorSTS for research, please cite our paper:

@article{ham2020kornli,
  title={KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding},
  author={Ham, Jiyeon and Choe, Yo Joong and Park, Kyubyong and Choi, Ilji and Soh, Hyungjoon},
  journal={arXiv preprint arXiv:2004.03289},
  year={2020}
}