MGTBench
MGTBench provides reference implementations of machine-generated text (MGT) detection methods. It is under active development, and we will add more detection methods and analysis tools in the future.
Supported Methods
Currently, we support the following methods (continuously updated):
- Metric-based methods:
- Model-based methods:
Supported Datasets
- Essay;
- WP;
- Reuters;
Note that our datasets are constructed based on Verma et al.; you can download them from Google Drive.
Installation
git clone https://github.com/xinleihe/MGTBench.git;
cd MGTBench;
conda env create -f environment.yml;
conda activate MGTBench;
Usage
To run the benchmark on the Essay dataset:
# Distinguish Human vs. Claude:
python benchmark.py --dataset Essay --detectLLM Claude --method Log-Likelihood
# Text attribution:
python attribution_benchmark.py --dataset Essay
Note that you can also specify your own datasets in dataset_loader.py.
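To repeat the Human vs. Claude run across all three supported datasets, the command lines can be generated with a small script. This is a minimal sketch, not part of MGTBench: the helper name `build_commands` is hypothetical, and it assumes that WP and Reuters accept the same `--dataset`, `--detectLLM`, and `--method` flags shown above for Essay.

```python
import shlex

# Dataset names as listed under "Supported Datasets".
DATASETS = ["Essay", "WP", "Reuters"]


def build_commands(detect_llm="Claude", method="Log-Likelihood"):
    """Return one benchmark.py command (as an argv list) per dataset.

    Hypothetical helper: assumes every dataset takes the same flags
    as the Essay example in this README.
    """
    return [
        ["python", "benchmark.py",
         "--dataset", dataset,
         "--detectLLM", detect_llm,
         "--method", method]
        for dataset in DATASETS
    ]


if __name__ == "__main__":
    for argv in build_commands():
        # Dry run: print each command instead of executing it.
        print(shlex.join(argv))
```

The script only prints the commands. To execute them, each argv list could be passed to `subprocess.run(argv, check=True)` from the repository root with the MGTBench conda environment active.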
Authors
The tool is designed and developed by Xinlei He (CISPA), Xinyue Shen (CISPA), Zeyuan Chen (Individual Researcher), Michael Backes (CISPA), and Yang Zhang (CISPA).
Cite
If you use MGTBench for your research, please cite "MGTBench: Benchmarking Machine-Generated Text Detection":
@article{HSCBZ23,
author = {Xinlei He and Xinyue Shen and Zeyuan Chen and Michael Backes and Yang Zhang},
title = {{MGTBench: Benchmarking Machine-Generated Text Detection}},
journal = {{CoRR abs/2303.14822}},
year = {2023}
}