ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models

If anything below is unclear, or if you have any suggestions, please contact us at argugpt@163.com.

In this repo, you will see:

We also make the following resources public:

Introduction to ArguGPT corpus

We compiled a corpus of about 8k argumentative essays for human-machine comparison (about 4k human-written vs. 4k machine-generated). We also collected a 1k dataset for out-of-distribution (OOD) testing, to check how well the classifiers generalize; its essays are either generated by OOD models or written in response to OOD writing tasks, that is, models or essay prompts that differ from those used in ArguGPT. The sub-corpora are listed in the following table.

| sub-corpus | # essays | # tokens | mean len | source | access |
| --- | --- | --- | --- | --- | --- |
| WECCL-human | 1,845 | 450,657 | 244 | SWECCL 2.0 | SWECCL (Wen & Wang, 2008) |
| WECCL-machine | 1,813 | 442,531 | 244 | GPT models | Released. See the data/argugpt folder |
| TOEFL-human | 1,680 | 503,504 | 299 | TOEFL11 | Purchased from LDC |
| TOEFL-machine | 1,635 | 442,963 | 270 | GPT models | Released. See the data/argugpt folder |
| GRE-human | 590 | 341,495 | 578 | GRE-prep materials | No copyright to release |
| GRE-machine | 590 | 268,640 | 455 | GPT models | Released. See the data/argugpt folder |
| OOD-human | 500 | 132,902 | 265 | CLEC | CLEC (Gui & Yang, 2003) |
| OOD-machine | 500 | 180,120 | 360 | ChatGPT & four OOD models | Released. See the data/argugpt folder |

Note that the four OOD models are gpt-4, claude-instant, bloomz-7b, and flan-t5-11b. More detailed information about the ArguGPT corpus can be found in our paper.
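As a quick sanity check on the table, the short sketch below sums the essay counts and compares each reported mean length with tokens divided by essays. The numbers are copied from the table; the dictionary layout and the helper name `essay_count` are illustrative assumptions, not part of the released code.

```python
# Figures copied from the sub-corpus table: (essays, tokens, reported mean length).
SUBCORPORA = {
    "WECCL-human": (1845, 450657, 244),
    "WECCL-machine": (1813, 442531, 244),
    "TOEFL-human": (1680, 503504, 299),
    "TOEFL-machine": (1635, 442963, 270),
    "GRE-human": (590, 341495, 578),
    "GRE-machine": (590, 268640, 455),
    "OOD-human": (500, 132902, 265),
    "OOD-machine": (500, 180120, 360),
}


def essay_count(author, ood=False):
    """Sum essays for 'human' or 'machine', over the main corpus or the OOD set.

    Hypothetical helper for illustration only.
    """
    return sum(
        essays
        for name, (essays, _tokens, _mean) in SUBCORPORA.items()
        if name.endswith("-" + author) and name.startswith("OOD-") == ood
    )


human = essay_count("human")        # main corpus, human-written
machine = essay_count("machine")    # main corpus, machine-generated
ood_total = essay_count("human", ood=True) + essay_count("machine", ood=True)

# True when every reported mean length equals tokens // essays (rounded down).
means_consistent = all(
    tokens // essays == mean for essays, tokens, mean in SUBCORPORA.values()
)

print(human, machine, human + machine, ood_total, means_consistent)
```

With these figures the main corpus comes to 4,115 human and 4,038 machine essays (8,153 in total), plus 1,000 OOD essays, which is what "8k" and "1k" refer to above.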

Data split and baselines

We first split the data into train/dev/test sets. The TOEFL essays in the test split were also evaluated by human participants in a Turing test. We then established baselines by training detectors based on SVMs and RoBERTa. We also ran an ablation study to measure the effect of reducing the number of training data points. Finally, we evaluated two existing detectors on our own test set: GPTZero and the RoBERTa detector trained by Guo et al. (2023).

| split | WECCL | TOEFL | GRE | total |
| --- | --- | --- | --- | --- |
| train | 3,058 | 2,715 | 980 | 6,753 |
| dev | 300 | 300 | 100 | 700 |
| test | 300 | 300 | 100 | 700 |
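The reduced-training-data ablation mentioned above (training on 50%, 25%, and 10% of the training set) can be sketched as follows. This is a minimal sketch that assumes simple random subsampling with a fixed seed; the actual sampling procedure is not described here, and `subsample` is a hypothetical helper name.

```python
import random


def subsample(train_items, fraction, seed=0):
    """Return a random subset holding `fraction` of the training items.

    Assumption: plain random sampling without replacement; the original
    experiments may have sampled differently (for example, per sub-corpus).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(train_items) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(train_items, k)


# Stand-in for the 6,753 training essays: integer ids instead of real texts.
train_ids = list(range(6753))
subsets = {frac: subsample(train_ids, frac) for frac in (0.5, 0.25, 0.1)}

for frac, items in subsets.items():
    print(f"{frac:.0%} of train -> {len(items)} essays")
```

A fixed seed keeps the smaller training sets reproducible across runs, so differences in accuracy can be attributed to training-set size rather than to a different random draw.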

The accuracy of human evaluators on the TOEFL portion of the test set is only 64.65%, far below that of the ML detectors/classifiers.

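| train data | test data | majority baseline | RoBERTa | Best SVM | GPTZero | Guo et al. (2023) |
| --- | --- | --- | --- | --- | --- | --- |
| doc-all | doc test | 50 | 99.38 | 95.14 | 98.86 | 89.86 |
| doc-50% | doc test | 50 | 99.76 | 94.14 | - | - |
| doc-25% | doc test | 50 | 99.14 | 93.86 | - | - |
| doc-10% | doc test | 50 | 97.67 | 92.29 | - | - |
| doc-all | para test | 52.62 | 74.58 | 83.61 | - | - |
| para-all | para test | 52.62 | 97.88 | 90.55 | 92.11 | 79.95 |
| doc-all | sent test | 54.18 | 49.73 | 72.15 | - | - |
| sent-all | sent test | 54.18 | 93.84 | 81 | 90.10 | 71.44 |
| doc-all | ood-ma test | 100 | 97.00 | 72.20 | 53.40 | 59.20 |
| doc-all | ood-hu test | 100 | 98.47 | 94.80 | 100.00 | 99.00 |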

Notes: We broke the essays down into paragraphs and sentences, trained models at the document, paragraph, and sentence levels, and evaluated these classifiers.
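The breakdown into paragraphs and sentences can be sketched roughly as below. This is only an illustration under stated assumptions: paragraphs are assumed to be separated by blank lines, sentences are split with a naive punctuation rule, and the function names are hypothetical; the actual segmentation used for the corpus may differ. The sketch also shows one plausible reason the majority-class baseline shifts away from 50 at the paragraph and sentence levels: each unit inherits its essay's label, so essays with more units weigh more.

```python
import re
from collections import Counter


def split_paragraphs(essay):
    """Split on blank lines (assumed paragraph separator)."""
    return [p.strip() for p in re.split(r"\n\s*\n", essay) if p.strip()]


def split_sentences(paragraph):
    """Naive rule: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def expand(essays, level):
    """Turn (text, label) essays into (unit, label) pairs at the given level."""
    units = []
    for text, label in essays:
        if level == "doc":
            pieces = [text]
        elif level == "para":
            pieces = split_paragraphs(text)
        elif level == "sent":
            pieces = [s for p in split_paragraphs(text) for s in split_sentences(p)]
        else:
            raise ValueError("level must be 'doc', 'para' or 'sent'")
        units.extend((piece, label) for piece in pieces)
    return units


def majority_baseline(units):
    """Accuracy (%) of always predicting the most frequent label."""
    counts = Counter(label for _unit, label in units)
    return 100.0 * max(counts.values()) / len(units)


# Toy data, not taken from the corpus.
toy = [
    ("Cars are useful. They save time.\n\nStill, they pollute.", "human"),
    ("Cars will remain popular.", "machine"),
]
for level in ("doc", "para", "sent"):
    print(level, round(majority_baseline(expand(toy, level)), 2))
```

On the toy data the baseline is 50 at the document level but higher at the paragraph and sentence levels, because the longer essay contributes more units with the same label.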

Team members

We are a group of undergraduate and graduate students who are interested in language, linguistics, and NLP. Our team is led by Hai Hu, an assistant professor in the English Department of Shanghai Jiao Tong University (SJTU), whose research interest is computational linguistics. We hope to contribute something interesting to the CL and NLP communities from the perspective of language learners.

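| Name | Chinese name | Affiliation | Status |
| --- | --- | --- | --- |
| Hai Hu | 胡海 | SFL, SJTU (School of Foreign Languages, Shanghai Jiao Tong University) | Assistant Professor |
| Yiwen Zhang | 张伊文 | Amazon | Language Engineer |
| Shisen Yue | 岳士森 | SFL, SJTU (School of Foreign Languages, Shanghai Jiao Tong University) | Undergraduate |
| Wanyang Zhang | 章万扬 | SS, PKU (School of Software and Microelectronics, Peking University) | Graduate |
| Xiaojing Zhao | 赵晓靖 | SFL, SJTU (School of Foreign Languages, Shanghai Jiao Tong University) | Graduate |
| Xinyuan Cheng | 程心远 | SFL, SJTU (School of Foreign Languages, Shanghai Jiao Tong University) | Undergraduate |
| Yikang Liu | 刘逸康 | SFL, SJTU (School of Foreign Languages, Shanghai Jiao Tong University) | Graduate |
| Ziyin Zhang | 张子殷 | SEIEE, SJTU (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University) | Graduate |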

Citation

Please cite our work as

@misc{liu2023argugpt,
      title={ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models}, 
      author={Yikang Liu and Ziyin Zhang and Wanyang Zhang and Shisen Yue and Xiaojing Zhao and Xinyuan Cheng and Yiwen Zhang and Hai Hu},
      year={2023},
      eprint={2304.07666},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}