CheckGPT-v2
Description
The official repository of "On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing". You can find another version of our artifact at Zenodo. An early version of this project can be found here.
Recommended Hardware
- Disk: At least 10GB to store the models and datasets. An extra 52GB for each 50,000 samples of features (~2.2 TB in total for ./GPABench2).
- GPU: For CheckGPT: 6 GB of memory for training or 2 GB for inference (adjust the batch size accordingly). For the other benchmarked models in Sec 2.2: 11 GB of memory.
Package Installation
Run
pip install -r requirements.txt
We recommend using a virtual environment, Docker, or a VM to avoid version conflicts. For example, to set up a virtual environment using virtualenv, install it with pip:
pip install virtualenv
Navigate to a desired folder, create a virtual environment, activate it, and install our list of packages as provided:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Data
There are two versions of datasets:
- GPABenchmark.
- GPABench2.
We mainly use GPABench2 in our CCS 2024 submission.
About the Artifact
The files are separated into several parts for upload convenience. Please download and extract all parts into the same host folder (e.g., ./artifact_checkgpt/).
- CheckGPT.zip: the main folder of the CheckGPT code (./artifact_checkgpt/CheckGPT). Inside it, embeddings is the folder for saved features and exp is the folder for results saved under different experiment IDs.
- CheckGPT_presaved_files.zip: pre-trained models and saved experiments (./artifact_checkgpt/CheckGPT_presaved_files).
- CS.zip, PHX.zip, HSS.zip: GPABench2 datasets. Please download and extract them into a newly created folder, "GPABench2" (./artifact_checkgpt/GPABench2).
- GPABenchmark.zip: GPABenchmark datasets (./artifact_checkgpt/GPABenchmark).
- scripts.zip: scripts for reproducing the results in the paper. Extract them into the main folder (./artifact_checkgpt/CheckGPT).
- README.md: this file.
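As an optional sanity check, the minimal sketch below verifies that the parts are extracted into the layout described above. The root folder name is the example used in this README; adjust it if you extracted elsewhere.

```python
# Sanity check: confirm the artifact parts are extracted into the expected layout.
from pathlib import Path

root = Path("./artifact_checkgpt")  # example host folder from this README
expected = ["CheckGPT", "CheckGPT_presaved_files", "GPABench2", "GPABenchmark"]
for name in expected:
    status = "OK" if (root / name).is_dir() else "MISSING"
    print(f"{name}: {status}")
```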
Description of the Datasets
GPABenchmark:
- GPT example: ./GPABenchmark/CS_Task1/gpt.json (Computer Science, Task 1 GPT-WRI)
- HUM example: ./GPABenchmark/CS_Task1/hum.json
- Data structure: {PaperID}: {Abstract}
GPABench2:
- GPT example: ./GPABench2/PHX/gpt_task3_prompt4.json (Physics, Task 3 GPT-POL, Prompt 4)
- HUM example: ./GPABench2/PHX/ground.json
- Data structure: {Index}: { {"id"}: {PaperID}, {"title"}: {PaperTitle}, {"abstract"}: {Abstract} }
For GPABench2, download CS, PHX, and HSS and put them under a newly created folder "./GPABench2". For HUM Task 2 GPT-CPL, use the second half of each text.
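To illustrate the GPABench2 structure above, here is a minimal sketch that uses only the field names listed ("id", "title", "abstract") and the file layout described in this README; it loads a HUM file and keeps the second half of each abstract for Task 2 GPT-CPL:

```python
# Load a GPABench2 HUM file and peek at one record.
import json

with open("./GPABench2/CS/ground.json", "r", encoding="utf-8") as f:
    records = json.load(f)  # {Index: {"id": ..., "title": ..., "abstract": ...}}

index, rec = next(iter(records.items()))
print(index, rec["id"], rec["title"])

# HUM reference text for Task 2 GPT-CPL: the second half of each abstract.
second_halves = {
    idx: r["abstract"][len(r["abstract"]) // 2:]
    for idx, r in records.items()
}
```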
Other Datasets used in this Paper:
Download these files and put them under CheckGPT_presaved_files:
- Other Academic Writing Purposes (Section 5.4) (Available under CheckGPT_presaved_files/Additional_data/Other_purpose)
- Classic NLP Datasets (Section 5.4) (Available under CheckGPT_presaved_files/Additional_data/Classic_NLP)
- Advanced Prompt Engineering (Section 5.7) (Available under CheckGPT_presaved_files/Additional_data/Prompt_engineering)
- Sanitized GPT Output (Section 5.10) (Available under CheckGPT_presaved_files/Additional_data/Sanitized)
- GPT4 (Section 5.6) (Available under CheckGPT_presaved_files/Additional_data/GPT4)
Pre-trained Models:
Download the files and place them under CheckGPT_presaved_files.
- Models trained on GPABenchmark (v1) can be accessed at Pretrained_models.
- Experiments in Section 5.2 and 5.3, including pre-trained models and training logs, can be found at saved_experiments/basic.
Environment Setup
Run
pip install -r requirements.txt
Features
To train on the text or reuse it later, extract features from the text beforehand (development only; not needed for testing).
Feature Extraction
To turn text into features, use features.py.
python features.py {DOMAIN} {TASK} {PROMPT}
Features will be saved in the folder named embeddings. ATTENTION: each saved feature file for 50,000 samples is approximately 52 GB.
For example, to fetch the features of GPT data in CS on Task 1 Prompt 3:
python features.py CS 1 3 --gpt 1
The saved features are named in this format: ./embeddings/CS/gpt_CS_task1_prompt3.h5
Likewise, to fetch the features of HUM data in CS on Task 1 Prompt 3:
python features.py CS 1 3 --gpt 0
The saved features are named in this format: ./embeddings/CS/ground_CS.h5 (the same file is shared by Tasks 1 and 3).
For Task 2 GPT-CPL, the ground data is cut in half and only the second halves are processed. An example saved name is ground_CS_task2.h5.
You can also specify the desired sample size. For example, to get the first 1,000 samples:
python features.py CS 1 3 --gpt 0 --number 1000
The saved features are then placed under a size-suffixed folder, e.g., ./embeddings/CS_1000/, following the same naming scheme as above.
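Because each full feature file is large, it can be useful to inspect one before launching experiments. The internal dataset names are not documented here, so the sketch below simply lists whatever the HDF5 file contains:

```python
# Inspect a saved feature file (path from the example above).
import h5py

with h5py.File("./embeddings/CS/gpt_CS_task1_prompt3.h5", "r") as f:
    # Print every group/dataset name and, for datasets, their shapes.
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```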
Usage
On-the-fly
To evaluate a single piece of input text on the fly, run the script and follow the prompts:
python run_input.py
Testing on text files
To directly evaluate a JSON data file, run:
python validate_text.py {FILE_PATH} {MODEL_PATH} {IS_GPT_OR_NOT}
For example, to test the pre-trained model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth on ../GPABench2/CS/gpt_task3_prompt2.json or ../GPABench2/CS/ground.json:
python validate_text.py ../GPABench2/CS/gpt_task3_prompt2.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 1
or
python validate_text.py ../GPABench2/CS/ground.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 0
To run it on a special dataset such as GPT4, run:
python validate_text.py ../CheckGPT_presaved_files/Additional_data/GPT4/chatgpt_cs_task3.json ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth 1
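To evaluate several files in one go, you can loop over validate_text.py. This is only a convenience sketch, assuming Task 3 has prompts 1-4 as in the GPABench2 examples above; adjust the paths and range to the files you actually have:

```python
# Batch evaluation: run validate_text.py on every CS Task 3 GPT file.
import subprocess

model = "../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth"
for prompt in range(1, 5):  # prompts 1-4 (assumed; adjust as needed)
    data = f"../GPABench2/CS/gpt_task3_prompt{prompt}.json"
    subprocess.run(["python", "validate_text.py", data, model, "1"], check=True)
```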
Testing on pre-saved features
python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID} --pretrain 1 --test 1 --saved-model {MODEL_PATH}
To test the pre-trained model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth on the pre-saved features ./embeddings/CS/gpt_task3_prompt2.h5 and ./embeddings/CS/ground.h5, run:
python dnn.py CS 3 2 12345 --pretrain 1 --test 1 --saved-model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth
For the features of a small test set with 1,000 samples:
python dnn.py CS_1000 3 2 12346 --pretrain 1 --test 1 --saved-model ../CheckGPT_presaved_files/saved_experiments/basic/CS_Task3_Prompt2/Best_CS_Task3.pth
Training on pre-saved features
python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID}
To train a model from scratch on CS Task 3 Prompt 2:
python dnn.py CS 3 2 12347
Ablation study: use --modelid to select a different model architecture (0 for CheckGPT, 1 for RCH, 2 for MLP-Pool, 3 for CNN):
python dnn.py CS 3 2 12347 --modelid 1
python dnn.py CS 3 2 12347 --modelid 2
python dnn.py CS 3 2 12347 --modelid 3
Transfer Learning
python dnn.py {DOMAIN} {TASK} {PROMPT} {EXP_ID} --trans 1 --mdomain {MODEL_DOMAIN} --mtask {MODEL_TASK} --mprompt {MODEL_PROMPT} --mid {MODEL_EXP_ID}
At the beginning, it also reports a cross-validation (testing) result.
For example, to transfer from CS_Task3_Prompt1 to HSS_Task1_Prompt2, run:
python dnn.py HSS 1 2 12347 --trans 1 --mdomain CS --mtask 3 --mprompt 1 --mid 12346
To do the same on the 500-sample feature subsets:
python dnn.py HSS_500 1 2 12347 --trans 1 --mdomain CS_500 --mtask 3 --mprompt 1 --mid 12346
--mid gives the experiment ID under which the pre-trained model was saved in a previous run (e.g., 12346 above).