NLPCC2023 Shared Task 1: Chinese Grammatical Error Correction

Task Introduction

Chinese Grammatical Error Correction (CGEC) aims to automatically correct grammatical errors that violate language rules and to convert noisy input texts into clean output texts. The widely used benchmarks are derived from grammatical errors made by foreign learners of Chinese (i.e., L2 learners). The gap between the language usage of L2 learners and that of native Chinese speakers makes the performance of CGEC models in real-world scenarios unpredictable. This task therefore focuses on correcting grammatical errors made by native Chinese speakers, providing a challenging benchmark and a meaningful resource to facilitate the further development of CGEC.

Updates

All updates about this shared task will be posted on this page.

Important Dates

Leaderboard

| Rank | System Name | MaxMatch Precision | MaxMatch Recall | MaxMatch F0.5 | ChERRANT Precision | ChERRANT Recall | ChERRANT F0.5 | Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | HW_TSC_nlpcc2023_cgec | 56.23 | 33.3 | 49.42 | 50.95 | 31.29 | 45.26 | 47.34 |
| 2 | 鱼饼啾啾Plus | 57.08 | 12.94 | 33.94 | 54.5 | 13.06 | 33.34 | 33.64 |
| 3 | CUHK_SU | 38.82 | 15.58 | 29.9 | 45.4 | 15.15 | 32.45 | 31.175 |
| 4 | CGEC++ | 24.14 | 7.35 | 16.57 | 21.11 | 7.32 | 15.33 | 15.95 |
| 5 | zhao_jia | 17.19 | 14.78 | 16.65 | 13.51 | 13.43 | 13.49 | 15.07 |
| 6 | ZZUNLP | 4.31 | 0.99 | 2.58 | 2.21 | 0.59 | 1.43 | 2.005 |
| 7 | YNU-Janko | 0.51 | 6.36 | 0.62 | 0.93 | 2.34 | 1.06 | 0.84 |

Data Description & Rules

We provide a CGEC benchmark named NaCGEC, which focuses on grammatical errors made by native Chinese speakers. In this shared task, the benchmark data is split into two parts: a validation set and a test set.

For model training, only the data provided at this link may be used as supervised (i.e., parallel) data in this shared task; it includes Lang8, HSK, CGED, MuCGEC, YACLC, and CTC2021. When using these data, please follow the rules set by the original data publishers. As for unsupervised data, any corpus publicly available on the web is allowed. Based on unsupervised data, participants can apply any data augmentation method, such as our work or other methods, to construct pseudo-parallel data for model training.
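
To make the idea of pseudo-parallel construction concrete, here is a minimal Python sketch that injects simple synthetic errors (adjacent-character swaps, redundant function words, character deletions) into clean sentences. The error types, the word list, and the corruption probability are hypothetical illustrations only; they are not the linguistic rule-based method described in our paper.

```python
# Illustrative only: one simple way to turn clean monolingual sentences into
# pseudo-parallel (noisy source -> clean target) pairs. The error types, the
# function-word list, and the probability below are assumptions for the sake
# of the example, not the linguistic rules used to build NaCGEC.
import random

REDUNDANT_WORDS = ["的", "了", "被", "把"]  # hypothetical insertable function words

def corrupt(sentence: str, p: float = 0.3) -> str:
    """Inject at most one synthetic error into a clean sentence with probability p."""
    chars = list(sentence)
    if len(chars) < 2 or random.random() > p:
        return sentence  # leave a portion of sentences untouched
    op = random.choice(["swap", "insert", "delete"])
    i = random.randrange(len(chars) - 1)
    if op == "swap":        # word-order-style error: swap two adjacent characters
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    elif op == "insert":    # redundancy-style error: add an extra function word
        chars.insert(i, random.choice(REDUNDANT_WORDS))
    else:                   # omission-style error: drop a character
        del chars[i]
    return "".join(chars)

def build_pseudo_parallel(clean_sentences):
    """Pair each corrupted sentence (source) with its clean original (target)."""
    return [(corrupt(s), s) for s in clean_sentences]

if __name__ == "__main__":
    corpus = ["他昨天去图书馆借了一本书。", "我们应该努力提高自己的写作水平。"]
    for src, tgt in build_pseudo_parallel(corpus):
        print(src, "->", tgt)
```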

For more information related to this dataset, please refer to our paper: Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction. If there are any differences between the paper and this page, the content of this page should prevail.

Submission & Evaluation

For submission, the following materials should be packaged as one zip file and sent to masr21@mails.tsinghua.edu.cn:

For evaluation, we employ both word-based metrics and char-based span-level metrics. For the word-based metrics, each output sentence is first segmented into words with the THULAC toolkit, and then the MaxMatch ($\text{M}^2$) scorer is used to compute Precision, Recall, and $\text{F}_{0.5}$ between the output sentence and the gold edits. For the char-based span-level metrics, we use ChERRANT to obtain the evaluation results. The final score is the average of the $\text{F}_{0.5}$ scores obtained by the word-based and char-based metrics, i.e., $\text{score} = 0.5 \cdot \text{F}_{0.5}\text{(word)} + 0.5 \cdot \text{F}_{0.5}\text{(char)}$.
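
As a quick sanity check, the Python sketch below shows how the $\text{F}_{0.5}$ values and the final score are combined, assuming the Precision/Recall numbers have already been produced by the $\text{M}^2$ scorer and by ChERRANT; the helper function and the placeholder values (taken from the leaderboard's first row) are illustrative only.

```python
# A minimal sketch of the score combination described above, assuming the
# Precision/Recall values have already been produced by the M^2 scorer
# (word level, on THULAC-segmented output) and by ChERRANT (char level).

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F_beta score; beta = 0.5 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Placeholder metric values (percentages), taken from the leaderboard's top row:
word_p, word_r = 56.23, 33.30   # MaxMatch (M^2), word-based
char_p, char_r = 50.95, 31.29   # ChERRANT, char-based span-level

word_f05 = f_beta(word_p, word_r)
char_f05 = f_beta(char_p, char_r)
score = 0.5 * word_f05 + 0.5 * char_f05
print(f"F0.5(word)={word_f05:.2f}  F0.5(char)={char_f05:.2f}  score={score:.2f}")
# -> F0.5(word)=49.42  F0.5(char)=45.26  score=47.34
```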

The top 3 participating teams will be certified by NLPCC and CCF-NLP.

Participants

| Team ID | Organization | System Name |
| --- | --- | --- |
| 1 | Natural Language Processing Laboratory of Zhengzhou University | ZZUNLP |
| 2 | MOE Key Laboratory of Computational Linguistics, School of Computer Science, Peking University | CGEC先遣队 |
| 3 | Nankai University, College of Computer Science, DBIS | NLP Beginner |
| 4 | Harbin Institute of Technology, Shenzhen; School of Computer Science and Technology; HappyTrans@HITsz | HappyTrans |
| 5 | School of Computer Science & Technology, Beijing Institute of Technology | zhao jia |
| 6 | Wangxuan Institute of Computer Technology, Peking University | PKU-WICT |
| 7 | 杭州十域科技有限公司 | jojolee |
| 8 | 北京大学 | 鱼饼啾啾Plus |
| 9 | Beihang University | BUAA NLP |
| 10 | 上海哔哩哔哩科技有限公司 | chole |
| 11 | Beijing Normal University, School of Artificial Intelligence, The Language and Character Resources Research Center of Beijing Normal University | Lrt123 |
| 12 | Yunnan University | YNU-HPCC |
| 13 | Huawei Translation Services Center | HW_TSC_nlpcc2023_cgec |
| 14 | Zhongyuan University of Technology | zutnlp-wuyanbo |
| 15 | Yunnan University | YNU-Janko |
| 16 | NLP, School of Computer Science and Technology, Soochow University & NLP, School of Data Science, The Chinese University of Hong Kong, Shenzhen | CUHK_SU |
| 17 | Social Computing Lab, Southeast University | CGEC++ |
| 18 | State Key Lab of Communication Content Cognition, People’s Daily Online | cc414 |
| 19 | Fudan University | cisl-nlp |
| 20 | Ant Group | Lastonestands |
| 21 | School of Information Science and Technology, Guangdong University of Foreign Studies; School of Computer Science and Technology, Guangdong University of Technology | BERT 4EVER |
| 22 | School of Data Science and Engineering, East China Normal University | GGBond |
| 23 | Text Machine Translation Lab, Huawei Technologies Co., Ltd. | HW-TSC |

Contact & Citation

If your publication employs our dataset, please cite the following article:

@inproceedings{ma2022linguistic,
  title={Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction},
  author={Ma, Shirong and Li, Yinghui and Sun, Rongyi and Zhou, Qingyu and Huang, Shulin and Zhang, Ding and Li, Yangning and Liu, Ruiyang and Li, Zhongli and Cao, Yunbo and others},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2022},
  year={2022}
}

If you have any questions about this task, please email masr21@mails.tsinghua.edu.cn (cc: liyinghu20@mails.tsinghua.edu.cn, zheng.haitao@sz.tsinghua.edu.cn).