GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
This repository is no longer actively maintained. Please refer to our latest follow-up work:
- Paper: Token Prediction as Implicit Classification to Identify LLM-Generated Text
- Codebase: T5-Sentinel-public
Overview
:page_facing_up: Link to Paper (arXiv) | :floppy_disk: Link to Dataset | :package: Link to Checkpoint
This repository is the codebase for the paper GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content.
- We collect and publish OpenGPTText, a high-quality dataset of approximately 30,000 text samples rephrased by `gpt-3.5-turbo` (ChatGPT).
- We construct two detectors with different architectures: RoBERTa-Sentinel and T5-Sentinel.
- T5-Sentinel achieves state-of-the-art performance (98% accuracy) on the OpenGPTText dataset.