
GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

This repository is no longer actively maintained. Please refer to our latest follow-up work:

Paper: Token Prediction as Implicit Classification to Identify LLM-Generated Text
Codebase: T5-Sentinel-public

Overview

:page_facing_up: Link to Paper (arXiv) | :floppy_disk: Link to Dataset | :package: Link to Checkpoint

This repository is the codebase for the paper GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content.

We collect and publish OpenGPTText - a high-quality dataset with approximately 30,000 text samples rephrased by gpt-3.5-turbo (ChatGPT).
We construct two detectors with different architectures - the RoBERTa-Sentinel and T5-Sentinel.
T5-Sentinel achieves state-of-the-art performance (98% accuracy) on the OpenGPTText dataset.

<p align="center"> <img width="718" alt="image" src="https://github.com/MarkChenYutian/GPT-Sentinel-public/assets/47029019/8a2125b6-3ba2-40ec-8310-52469f91aebd"> </p>