ChatGPT-Comparison-Detection Project 🔬

Official repository of the paper "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection". Please star, watch, and fork our repo to follow active updates!

See also: 📢 Feedback Space for Detectors. Please feel free to leave your feedback there! / 请留下您宝贵的意见!

<img width="600" alt="image" src="https://user-images.githubusercontent.com/37113676/212355768-5ef7a26a-7cc5-4c38-91dc-2ee249ec49d5.png">

Human ChatGPT Comparison Corpus (HC3) / 人类-ChatGPT 问答对比语料集

Yes, we propose the first Human vs. ChatGPT comparison corpus, named HC3.

我们提出了第一个 Human vs. ChatGPT 对比语料, 叫做 HC3.

<img width="520" alt="image" src="https://user-images.githubusercontent.com/37113676/213218672-e92b7036-a602-48c8-b70d-50ee1673bac8.png">

The first version of the HC3 dataset is now available on 🤗 Hugging Face Datasets:

For the Chinese community, the HC3 dataset is also available on ModelScope:

For the train/test splits and the filtered versions used in the paper, refer to the Google Drive links in HC3/README.md.
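A comparison corpus like HC3 pairs each question with human and ChatGPT answers, so training or evaluating a detector usually starts by flattening each record into labeled examples. The sketch below illustrates that step. The field names (`question`, `human_answers`, `chatgpt_answers`) and the label ids are assumptions made for illustration, not something this README specifies; check the released files for the actual schema.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical label ids; the released splits may encode labels differently.
HUMAN, CHATGPT = 0, 1


def flatten_record(record: Dict[str, Any]) -> Iterator[Tuple[str, str, int]]:
    """Yield (question, answer, label) triples from one comparison record.

    Assumes the record holds one question plus a list of human answers
    and a list of ChatGPT answers (assumed field names, see the lead-in).
    """
    question = str(record["question"])
    for answer in record.get("human_answers", []):
        yield question, str(answer), HUMAN
    for answer in record.get("chatgpt_answers", []):
        yield question, str(answer), CHATGPT


# A hand-written record, used only to show the shape of the output.
example = {
    "question": "Why is the sky blue?",
    "human_answers": ["Air scatters blue light more than red light."],
    "chatgpt_answers": ["The sky appears blue because of Rayleigh scattering."],
}
rows = list(flatten_record(example))
```

Each triple can feed a question-answer detector directly, or the question can be dropped to get single-text examples.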

Dataset Copyright

If a source dataset used in this corpus has a specific license that is stricter than CC-BY-SA, our products follow that license. Otherwise, they follow the CC-BY-SA license.

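| English Split | Source | Source License | Note |
| --- | --- | --- | --- |
| reddit_eli5 | ELI5 | BSD License | |
| open_qa | WikiQA | PWC Custom | |
| wiki_csai | Wikipedia | CC-BY-SA | |
| medicine | Medical Dialog | Unknown | Asking |
| finance | FiQA | Unknown | Asking by 📧 |

| Chinese Split | Source | Source License | Note |
| --- | --- | --- | --- |
| open_qa | WebTextQA & BaikeQA | MIT license | |
| baike | Baidu Baike | None | |
| nlpcc_dbqa | NLPCC-DBQA | Unknown | Asking |
| medicine | Chinese Medical Dialogue | CC-BY-NC 4.0 | |
| finance | FinanceZhidao | CC-BY 4.0 | |
| psychology | On Baidu AI Studio | CC0 | |
| law | LegalQA | Unknown | Asking |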

ChatGPT detectors / 内容检测器

(Hosted on 🤗 Hugging Face Spaces)

We provide three kinds of detectors, all bilingual (English and Chinese) / 我们提供了三个版本的检测器,且都支持中英文:

All three detectors are also available on ModelScope, the Chinese community platform:

The model weights are all available at 🤗 Hugging Face Models:

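| Model Checkpoints | Comment |
| --- | --- |
| chatgpt-detector-roberta | To detect a single piece of text |
| chatgpt-qa-detector-roberta | To detect a question-answer pair |
| chatgpt-detector-roberta-chinese | To detect a single piece of text (Chinese version) |
| chatgpt-qa-detector-roberta-chinese | To detect a question-answer pair (Chinese version) |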

The English models are based on roberta-base. The Chinese models are based on hfl/chinese-roberta-wwm-ext.
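Since the checkpoints are RoBERTa classifiers, one plausible way to try them is the `transformers` text-classification pipeline. The sketch below maps a language and input type to a checkpoint name from the table and builds a pipeline for it. The Hub namespace `Hello-SimpleAI`, the helper names, and the choice to pass the question as `text` and the answer as `text_pair` are assumptions, not something this README states; verify them against the model cards before relying on this.

```python
# Checkpoint names come from the table above; everything else is an assumption.
CHECKPOINTS = {
    ("en", "single"): "chatgpt-detector-roberta",
    ("en", "qa"): "chatgpt-qa-detector-roberta",
    ("zh", "single"): "chatgpt-detector-roberta-chinese",
    ("zh", "qa"): "chatgpt-qa-detector-roberta-chinese",
}

# Assumption: the Hugging Face Hub namespace hosting the checkpoints.
HUB_NAMESPACE = "Hello-SimpleAI"


def select_checkpoint(language: str, mode: str) -> str:
    """Return the Hub model id for a language ('en'/'zh') and mode ('single'/'qa')."""
    try:
        name = CHECKPOINTS[(language, mode)]
    except KeyError:
        raise ValueError(f"unsupported combination: {language!r}, {mode!r}") from None
    return f"{HUB_NAMESPACE}/{name}"


def load_detector(language: str = "en", mode: str = "single"):
    """Build a text-classification pipeline; downloads weights on first use."""
    # Imported lazily so the mapping above works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=select_checkpoint(language, mode))
```

With `transformers` and a backend such as PyTorch installed, `load_detector("en", "single")("some text")` should return a label and a score for the passage. For the question-answer checkpoints, the pipeline accepts a dict such as `{"text": question, "text_pair": answer}`.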


Important Dates / 重要节点:

| Events | Dates |
| --- | --- |
| Project Launch / 项目启动 | 2022-12-09 ✅ |
| Comparison Data Collection / 对比数据收集 | 2022-12-11 to Now 🏎️ |
| Release ChatGPT Detector (Demo) / 检测器 Demo 发布 | 2023-01-11 ✅ |
| Models Release / 模型开源 | 2023-01-18 ✅ |
| Comparison Corpus Release / 语料集开源 | 2023-01-18 ✅ |
| Research Paper / 研究论文发布 | 2023-01-19 ✅ |
| ... | ... |

Citation

Check out the paper on arXiv: 2301.07597

@article{guo-etal-2023-hc3,
    title = "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection",
    author = "Guo, Biyang  and
      Zhang, Xin  and
      Wang, Ziyuan  and
      Jiang, Minqi  and
      Nie, Jinran  and
      Ding, Yuxuan  and
      Yue, Jianwei  and
      Wu, Yupeng",
    journal = "arXiv preprint arXiv:2301.07597",
    year = "2023",
}

Our Story... / 背景故事

On December 9, 2022, the tenth day since the launch of ChatGPT, we started this project with two goals:

  1. To create some open-source models for efficiently detecting ChatGPT-generated content;
  2. To collect a valuable human-ChatGPT comparison Q&A corpus, to facilitate related research.

2022 年 12 月 9 日,也就是 ChatGPT 推出的第 10 天,我们开始了这个项目,为了两个目的:

  1. 做出一些开源模型工具来高效检测 ChatGPT 生成的内容;
  2. 收集一批有价值的人类和 ChatGPT 对比的中英双语问答语料,来助力相关学术研究。

You are welcome to follow our project! We have released a preview of our ChatGPT detectors, and the models and dataset will be open-sourced in about a week. We look forward to receiving feedback from the community to help improve the models and to contribute to open academic research together :)<br> 欢迎关注我们项目,我们目前已经发布ChatGPT检测器预览版,并将于约一周内发布开源模型、数据集。期待得到广大群众的反馈,来帮助我们改进模型,为开放的学术研究一起做贡献!

About Us / 关于我们

We are a group of insignificant researchers (in the shadow of ChatGPT) hoping to do some significant work for the community. The team for this project consists of PhD students and engineers from 6 universities/companies.<br> 我们是一群(在 ChatGPT 的阴影下)渺小的研究人员,但希望为社区做一些有意义的事。这个项目的团队由来自6所大学/公司的博士生和工程师组成。

| Biyang Guo | Minqi Jiang | Ziyuan Wang | Xin Zhang |
| --- | --- | --- | --- |
| <img src="https://avatars.githubusercontent.com/u/37113676?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/39890732?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/44188955?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/26690193?s=64&v=4" alt="" width="40"/> |

| Jinran Nie | Yuxuan Ding | Jianwei Yue | Yupeng Wu |
| --- | --- | --- | --- |
| <img src="https://avatars.githubusercontent.com/u/27188419?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/16249556?s=70&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/23006855?s=64&v=4" alt="" width="40"/> | <img src="https://avatars.githubusercontent.com/u/44936809?s=64&v=4" alt="" width="40"/> |