


License: MIT


The official repository of "On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing". You can find another version of our artifact at Zenodo. An early version of this project can be found here.


Recommended Hardware

Package Installation


pip install -r requirements.txt

We recommend using a virtual environment, Docker, or a VM to avoid version conflicts. For example, to set up a virtual environment with virtualenv, first install it using pip:

pip install virtualenv

Navigate to a desired folder, create a virtual environment, activate it, and install our list of packages as provided:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
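Before installing, it can help to confirm that the environment is actually active. The following is a minimal sketch, not part of this repository; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases point sys.prefix at the environment,
    # while sys.base_prefix keeps pointing at the base installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base


if in_virtualenv():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment detected; pip would install into the base interpreter.")
```

Run it with the same python that will run pip, after source venv/bin/activate.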


There are two versions of datasets:

  1. GPABenchmark.
  2. GPABench2.

We mainly use GPABench2 in our CCS 2024 submission.

About the Artifact

The files are separated into several parts for upload convenience. Please download and extract all parts into the same host folder (e.g., ./artifact_checkgpt/).

Description of the Datasets



For GPABench2, download CS, PHX, and HSS, and place them in a new folder "./GPABench2". For HUM Task 2 GPT-CPL, use the second half of each text.
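The layout and the "second half" rule can be sketched as below. This is an illustration under assumptions, not the repository's code: the file names follow the ground.json and gpt_task3_prompt2.json pattern used in the commands of this README, the task and prompt ranges are left as parameters because they are not listed here, and splitting on whitespace is only one possible way to halve a text. The helpers expected_files, missing_files, and second_half are hypothetical.

```python
from pathlib import Path

DOMAINS = ("CS", "PHX", "HSS")  # the three domains named above


def expected_files(root, tasks, prompts):
    """List the JSON files this sketch expects under the GPABench2 folder."""
    paths = []
    for domain in DOMAINS:
        folder = Path(root) / domain
        paths.append(folder / "ground.json")  # human-written (HUM) text
        for task in tasks:
            for prompt in prompts:
                paths.append(folder / f"gpt_task{task}_prompt{prompt}.json")
    return paths


def missing_files(root, tasks, prompts):
    """Return the expected files that are not present on disk."""
    return [p for p in expected_files(root, tasks, prompts) if not p.exists()]


def second_half(text):
    """Assumption: split on whitespace and keep the latter half of the words."""
    words = text.split()
    return " ".join(words[len(words) // 2:])
```

For example, missing_files("./GPABench2", tasks=[3], prompts=[2]) returns an empty list once the three domain folders contain ground.json and gpt_task3_prompt2.json.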

Other Datasets used in this Paper:

Download these files and put them under CheckGPT_presaved_files:

Pre-trained Models:

Download the pre-trained models and place them under CheckGPT_presaved_files.

Environment Setup


pip install -r requirements.txt


To train on or reuse the text, extract features from it beforehand (for development only; not needed for testing).

Feature Extraction

To turn text into features, use features.py.

python features.py {DOMAIN} {TASK} {PROMPT}

Features are saved in the folder named embeddings. ATTENTION: each file of saved features for 50,000 samples is approximately 52 GB.
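To plan disk space for smaller runs, the stated figure can be scaled down. The sketch below assumes the file size grows linearly with the number of samples, which this README does not confirm; the helper names are hypothetical.

```python
import shutil

GB_PER_FILE = 52           # approximate size stated above
SAMPLES_PER_FILE = 50_000  # sample count that size refers to


def estimated_size_gb(num_samples: int) -> float:
    """Scale 52 GB per 50,000 samples linearly (an assumption)."""
    return GB_PER_FILE * num_samples / SAMPLES_PER_FILE


def fits_on_disk(num_samples: int, path: str = ".") -> bool:
    """Compare the estimate with the free space on the drive holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal gigabytes
    return estimated_size_gb(num_samples) <= free_gb


for n in (1_000, 10_000, 50_000):
    print(f"{n:>6} samples -> about {estimated_size_gb(n):.2f} GB")
```

Under this assumption, each sample takes roughly 1 MB, so a 1,000-sample file would need about 1 GB.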

For example, to fetch the features of GPT data in CS on Task 1 Prompt 3:

python features.py CS 1 3 --gpt 1

The saved features are named in this format: ./embeddings/CS/gpt_CS_task1_prompt3.h5

Likewise, to fetch the features of HUM data in CS on Task 1 Prompt 3:

python features.py CS 1 3 --gpt 0

The saved features are named in this format: ./embeddings/CS/ground_CS.h5 (Same for Task 1 and 3)

For Task 2 GPT-CPL, the ground data will be cut into halves. Only the second halves will be processed. An example of saved names is ground_CS_task2.h5.

You can also specify the desired sample size. For example, to get the first 1000 samples:

python features.py CS 1 3 --gpt 0 --number 1000

The saved features use the same naming format under a folder suffixed with the sample size, e.g., ./embeddings/CS_1000/gpt_CS_task1_prompt3.h5 for GPT data.
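The naming rules described in this section can be summarized in one small function. This is a sketch that mirrors the examples above, not the logic of features.py itself; feature_path and its parameters are hypothetical names.

```python
def feature_path(domain, task, prompt, gpt, number=None, root="./embeddings"):
    """Build the feature file path following the naming examples above."""
    # A sample-size limit adds a suffix to the folder, e.g. CS_1000.
    folder = domain if number is None else f"{domain}_{number}"
    if gpt:
        # GPT features are stored per task and prompt.
        name = f"gpt_{domain}_task{task}_prompt{prompt}.h5"
    elif task == 2:
        # Task 2 (GPT-CPL) uses only the second halves of the ground data.
        name = f"ground_{domain}_task2.h5"
    else:
        # Tasks 1 and 3 share the same ground features.
        name = f"ground_{domain}.h5"
    return f"{root}/{folder}/{name}"


print(feature_path("CS", 1, 3, gpt=True))
print(feature_path("CS", 1, 3, gpt=False))
print(feature_path("CS", 2, 1, gpt=False))
```

A helper like this can be used to check that a feature file exists before starting a long training or testing run.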



To evaluate any single piece of input text, run the following and follow the instructions:

python run_input.py

Testing on text files

To directly evaluate any JSON data file, run:

python validate_text.py {FILE_PATH} {MODEL_PATH} {IS_GPT_OR_NOT}

For example, to test the pre-trained model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth on ../GPABench2/CS/gpt_task3_prompt2.json or ../GPABench2/CS/ground.json:

python validate_text.py ../GPABench2/CS/gpt_task3_prompt2.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 1


python validate_text.py ../GPABench2/CS/ground.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 0

To run it on a special dataset such as GPT4, run:

python validate_text.py ../CheckGPT_presaved_files/Additional_data/GPT4/chatgpt_cs_task3.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 1
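When several files need to be checked against the same model, the calls above can be scripted. The sketch below only assembles the three positional arguments shown in this section; validate_cmd and run_all are hypothetical helpers, and the script defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import sys

MODEL = "../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth"

# (file path, is the file GPT-generated?) pairs taken from the examples above
JOBS = [
    ("../GPABench2/CS/gpt_task3_prompt2.json", True),
    ("../GPABench2/CS/ground.json", False),
]


def validate_cmd(file_path, model_path, is_gpt):
    """Build: python validate_text.py {FILE_PATH} {MODEL_PATH} {IS_GPT_OR_NOT}."""
    return [sys.executable, "validate_text.py", file_path, model_path, str(int(is_gpt))]


def run_all(jobs, model_path, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for file_path, is_gpt in jobs:
        cmd = validate_cmd(file_path, model_path, is_gpt)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run


run_all(JOBS, MODEL)  # dry run: only prints the commands
```

Call run_all(JOBS, MODEL, dry_run=False) from the folder containing validate_text.py to execute the runs.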

Testing on pre-saved features

python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID} --pretrain 1 --test 1 --saved-model {MODEL_PATH}

To test the pre-trained model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth on the pre-saved features ./embeddings/CS/gpt_task3_prompt2.h5 and ./embeddings/CS/ground.h5, run:

python dnn.py CS 3 2 12345 --pretrain 1 --test 1 --saved-model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth

For features of small test data with 1000 samples:

python dnn.py CS_1000 3 2 12346 --pretrain 1 --test 1 --saved-model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth
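The test commands above can be assembled programmatically. In this sketch the model path pattern is inferred from the single example path in this README and may not hold for every pre-trained model; pretrained_model_path and dnn_test_cmd are hypothetical helpers.

```python
def pretrained_model_path(domain, task, prompt, root="../CheckGPT_presaved_files"):
    """Assumed pattern, inferred from the one example path shown above."""
    return (f"{root}/saved_experiments/basic/"
            f"{domain}_Task{task}_Prompt{prompt}/Best_{domain}_Task{task}.pth")


def dnn_test_cmd(domain, task, prompt, exp_id, model_path):
    """Build the dnn.py call that tests a saved model on pre-saved features."""
    return ["python", "dnn.py", domain, str(task), str(prompt), str(exp_id),
            "--pretrain", "1", "--test", "1", "--saved-model", model_path]


# The feature folder may carry a sample-size suffix (CS_1000),
# while the model path still refers to the plain domain (CS).
cmd = dnn_test_cmd("CS_1000", 3, 2, 12346, pretrained_model_path("CS", 3, 2))
print(" ".join(cmd))
```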

Training on pre-saved features

python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID}

To train a model from scratch on CS Task 3 Prompt 2:

python dnn.py CS 3 2 12347

Ablation Study: use --modelid to select a different model (0 for CheckGPT, 1 for RCH, 2 for MLP-Pool, 3 for CNN):

python dnn.py CS 3 2 12347 --modelid 1
python dnn.py CS 3 2 12347 --modelid 2
python dnn.py CS 3 2 12347 --modelid 3
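The ablation runs can be generated from the ID-to-model mapping given above. This is a sketch that only builds the command lines; ablation_cmds is a hypothetical helper, and it reuses one experiment ID for all runs as the examples above do.

```python
# Model IDs as listed above
MODELS = {0: "CheckGPT", 1: "RCH", 2: "MLP-Pool", 3: "CNN"}


def ablation_cmds(domain, task, prompt, exp_id):
    """Return one dnn.py training command per model, keyed by model name."""
    cmds = {}
    for model_id, name in MODELS.items():
        cmds[name] = ["python", "dnn.py", domain, str(task), str(prompt),
                      str(exp_id), "--modelid", str(model_id)]
    return cmds


for name, cmd in ablation_cmds("CS", 3, 2, 12347).items():
    print(f"{name:>8}: {' '.join(cmd)}")
```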

Transfer Learning

python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID} --trans 1 --mdomain {M_DOMAIN} --mtask {M_TASK} --mprompt {M_PROMPT} --mid {M_EXP_ID}

At the beginning, the script also reports the cross-validation (testing) result.

For example, to transfer from CS_Task3_Prompt1 to HSS_Task1_Prompt2, run:

python dnn.py HSS 1 2 12347 --trans 1 --mdomain CS --mtask 3 --mprompt 1 --mid 12346
python dnn.py HSS_500 1 2 12347 --trans 1 --mdomain CS_500 --mtask 3 --mprompt 1 --mid 12346

--mid is the experiment ID of the model pre-trained in a previous experiment (e.g., 12346, as used above).
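The two transfer examples differ only in the sample-size suffix on both domains, which the sketch below makes explicit. It is an illustration of how the arguments fit together; transfer_cmd is a hypothetical helper, and applying the same suffix to the source and target domains is an assumption drawn from the HSS_500 / CS_500 example.

```python
def transfer_cmd(domain, task, prompt, exp_id,
                 m_domain, m_task, m_prompt, m_id, number=None):
    """Build a dnn.py transfer-learning call from a source model to a target."""

    def tag(name):
        # Append the sample-size suffix (e.g. HSS_500) when one is given.
        return name if number is None else f"{name}_{number}"

    return ["python", "dnn.py", tag(domain), str(task), str(prompt), str(exp_id),
            "--trans", "1",
            "--mdomain", tag(m_domain), "--mtask", str(m_task),
            "--mprompt", str(m_prompt), "--mid", str(m_id)]


# Target: HSS Task 1 Prompt 2; source model: CS Task 3 Prompt 1, experiment 12346
print(" ".join(transfer_cmd("HSS", 1, 2, 12347, "CS", 3, 1, 12346)))
print(" ".join(transfer_cmd("HSS", 1, 2, 12347, "CS", 3, 1, 12346, number=500)))
```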