ChatGPT-Comparison-Detection Project 🔬
Official repository of the paper "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Please star, watch, and fork our repo to follow the latest updates!
See also: 📢 Feedback Space for Detectors (please feel free to leave your feedback here!)
<img width="600" alt="image" src="https://user-images.githubusercontent.com/37113676/212355768-5ef7a26a-7cc5-4c38-91dc-2ee249ec49d5.png">
Human ChatGPT Comparison Corpus (HC3)
We propose the first Human vs. ChatGPT comparison corpus, named HC3.
<img width="520" alt="image" src="https://user-images.githubusercontent.com/37113676/213218672-e92b7036-a602-48c8-b70d-50ee1673bac8.png">
The first version of the HC3 dataset is now available on 🤗 Hugging Face Datasets.
For the Chinese community, HC3 is also available on ModelScope.
For the train/test splits and the filtered versions used in the paper, see the Google Drive links in HC3/README.md.
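To train or evaluate a detector on HC3, each question's answers are usually turned into labeled examples. The sketch below assumes each record holds a `question` string plus `human_answers` and `chatgpt_answers` lists (our reading of the dataset card; verify before use). The helper name `flatten_record` and the label ids are hypothetical choices for this illustration.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical label ids chosen for this sketch.
HUMAN, CHATGPT = 0, 1


def flatten_record(record: Dict[str, Any], with_question: bool = False) -> Iterator[Tuple]:
    """Yield one labeled example per answer in an HC3-style record.

    Assumed record layout: a "question" string plus "human_answers" and
    "chatgpt_answers" lists of strings.
    with_question=False -> (answer, label): input for a single-text detector
    with_question=True  -> (question, answer, label): input for a QA detector
    """
    for key, label in (("human_answers", HUMAN), ("chatgpt_answers", CHATGPT)):
        # Missing answer lists are treated as empty.
        for answer in record.get(key, []):
            if with_question:
                yield (record["question"], answer, label)
            else:
                yield (answer, label)


if __name__ == "__main__":
    example = {
        "question": "Why is the sky blue?",
        "human_answers": ["Because of Rayleigh scattering."],
        "chatgpt_answers": ["Shorter wavelengths of sunlight are scattered more strongly by air molecules."],
    }
    for row in flatten_record(example, with_question=True):
        print(row)
```

If the 🤗 `datasets` package is installed, records could be fed in from `load_dataset("Hello-SimpleAI/HC3", "all")["train"]`; the Hub id, the `all` configuration, and the single `train` split are assumptions to check against the dataset page.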
Dataset Copyright
If a source dataset used in this corpus has a license stricter than CC-BY-SA, the corresponding part of our corpus follows that license; otherwise, it follows the CC-BY-SA license.
English Split | Source | Source License | Note |
---|---|---|---|
reddit_eli5 | ELI5 | BSD License | |
open_qa | WikiQA | PWC Custom | |
wiki_csai | Wikipedia | CC-BY-SA | |
medicine | Medical Dialog | Unknown | Asking |
finance | FiQA | Unknown | Asking by 📧 |
Chinese Split | Source | Source License | Note |
---|---|---|---|
open_qa | WebTextQA & BaikeQA | MIT license | |
baike | Baidu Baike | None | |
nlpcc_dbqa | NLPCC-DBQA | Unknown | Asking |
medicine | Chinese Medical Dialogue | CC-BY-NC 4.0 | |
finance | FinanceZhidao | CC-BY 4.0 | |
psychology | On Baidu AI Studio | CC0 | |
law | LegalQA | Unknown | Asking |
ChatGPT Detectors
(Hosted on 🤗 Hugging Face Spaces)
We provide three kinds of detectors, all bilingual (English and Chinese):
- QA version: detects whether an answer to a given question was generated by ChatGPT, using PLM-based classifiers;
- Single-text version: detects whether a single piece of text was generated by ChatGPT, using PLM-based classifiers;
- Linguistic version: detects whether a single piece of text was generated by ChatGPT, using linguistic features.
All three detectors are also available on the ModelScope Chinese community platform.
All model weights are available on 🤗 Hugging Face Models:
Model Checkpoints | Comment |
---|---|
chatgpt-detector-roberta | To detect a single piece of text (English) |
chatgpt-qa-detector-roberta | To detect a question-answer pair (English) |
chatgpt-detector-roberta-chinese | To detect a single piece of text (Chinese) |
chatgpt-qa-detector-roberta-chinese | To detect a question-answer pair (Chinese) |
The English models are based on roberta-base. The Chinese models are based on hfl/chinese-roberta-wwm-ext.
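The checkpoints above can be matched to the two PLM-based detector variants roughly as follows. This is a sketch, not the project's official inference code: the Hub organization `Hello-SimpleAI`, the helper names, and the assumption that the QA detectors read the question and answer as a sentence pair are ours. Only `detect` needs the `transformers` package, and it downloads the weights on first call.

```python
from typing import Dict, Optional, Union

# Assumption: the checkpoints live under this Hugging Face Hub organization;
# verify the exact repository ids on the model cards before use.
HUB_ORG = "Hello-SimpleAI"

# Checkpoint names from the table above, keyed by (language, detector variant).
CHECKPOINTS = {
    ("en", "single"): "chatgpt-detector-roberta",
    ("en", "qa"): "chatgpt-qa-detector-roberta",
    ("zh", "single"): "chatgpt-detector-roberta-chinese",
    ("zh", "qa"): "chatgpt-qa-detector-roberta-chinese",
}


def checkpoint_id(lang: str, qa: bool) -> str:
    """Return the assumed Hub id for language 'en' or 'zh' and the QA/single-text variant."""
    return f"{HUB_ORG}/{CHECKPOINTS[(lang, 'qa' if qa else 'single')]}"


def classifier_input(answer: str, question: Optional[str] = None) -> Union[str, Dict[str, str]]:
    """Single-text detectors see only the text; the QA detector is assumed
    to take the question and answer as a sentence pair."""
    if question is None:
        return answer
    # The transformers text-classification pipeline accepts text pairs as a dict.
    return {"text": question, "text_pair": answer}


def detect(answer: str, question: Optional[str] = None, lang: str = "en"):
    """Run the matching detector checkpoint on one input."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=checkpoint_id(lang, question is not None))
    return classifier(classifier_input(answer, question))
```

For example, `detect("Shorter wavelengths scatter more strongly.", question="Why is the sky blue?")` would run the English QA checkpoint and return the pipeline's label/score prediction; the label names come from each model's config. The linguistic-feature detector is not covered by this sketch.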
Important Dates:
Events | Dates |
---|---|
Project Launch | 2022-12-09 ✅ |
Comparison Data Collection | 2022-12-11 to Now 🏎️ |
Release of ChatGPT Detector (Demo) | 2023-01-11 ✅ |
Model Release | 2023-01-18 ✅ |
Comparison Corpus Release | 2023-01-18 ✅ |
Research Paper Release | 2023-01-19 ✅ |
... | ... |
Citation
Check out the paper on arXiv: 2301.07597
@article{guo-etal-2023-hc3,
title = "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection",
author = "Guo, Biyang and
Zhang, Xin and
Wang, Ziyuan and
Jiang, Minqi and
Nie, Jinran and
Ding, Yuxuan and
Yue, Jianwei and
Wu, Yupeng",
journal = "arXiv preprint arXiv:2301.07597",
year = "2023",
}
Our Story
On December 9, 2022, the 10th day since ChatGPT was launched, we started this project with two goals:
- To create open-source models for efficiently detecting ChatGPT-generated content;
- To collect a valuable bilingual (English and Chinese) human-ChatGPT comparison Q&A corpus to facilitate related research.
You are welcome to follow our project! We have released a preview of our ChatGPT detectors, and the models and dataset will be open-sourced in about a week. We look forward to receiving feedback from the community to help improve the models and to contributing to open academic research together :)
About Us
We are a group of insignificant researchers (in the shadow of ChatGPT) hoping to do some significant work for the community. The team for this project consists of PhD students and engineers from 6 universities/companies.
Biyang Guo | Minqi Jiang | Ziyuan Wang | Xin Zhang |
---|---|---|---|
<img src="https://avatars.githubusercontent.com/u/37113676?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/39890732?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/44188955?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/26690193?s=64&v=4" alt="" width="40"/> |
Jinran Nie | Yuxuan Ding | Jianwei Yue | Yupeng Wu |
<img src="https://avatars.githubusercontent.com/u/27188419?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/16249556?s=70&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/23006855?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/44936809?s=64&v=4" alt="" width="40"/> |