ConDA-gen-text-detection
Code for the paper "ConDA: Contrastive Domain Adaptation for AI-generated Text Detection", accepted at IJCNLP-AACL 2023 (paper link).
:star2: Great News! [Nov 4, 2023] :star2: Our paper won the Outstanding Paper Award at IJCNLP-AACL 2023 held in Bali, Indonesia.
Setup
Set up a separate environment and install the requirements via `pip install -r requirements.txt`.
Make directories for the models, output logs and huggingface model files.
mkdir models huggingface_repos output_logs
Download `roberta-base` from here and/or `roberta-large` from here, and place these repositories in `huggingface_repos`.
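As a rough sketch of the download step, the snippet below fetches the model files with the `huggingface_hub` package instead of cloning the repositories by hand. Several things here are assumptions rather than part of this repository: that `huggingface_hub` is installed, that the Hub repo ids are `FacebookAI/roberta-base` and `FacebookAI/roberta-large`, and that the training scripts look for folders named `roberta-base` and `roberta-large` inside `huggingface_repos`. The helper names `local_repo_dir` and `download_model` are hypothetical.

```python
from pathlib import Path

# Directory created in the setup step above.
HF_DIR = Path("huggingface_repos")

# Assumed mapping from local folder name to Hugging Face Hub repo id.
HUB_REPO_IDS = {
    "roberta-base": "FacebookAI/roberta-base",
    "roberta-large": "FacebookAI/roberta-large",
}


def local_repo_dir(model_name, base_dir=HF_DIR):
    """Return the folder where the files for `model_name` are placed."""
    return Path(base_dir) / model_name


def download_model(model_name, base_dir=HF_DIR):
    """Download all files of `model_name` into its local folder."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = local_repo_dir(model_name, base_dir)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=HUB_REPO_IDS[model_name], local_dir=str(target))
    return target
```

Calling `download_model("roberta-base")` needs network access and should leave the model files under `huggingface_repos/roberta-base`. If the scripts expect a different folder layout, adjust `local_repo_dir` accordingly.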
`contrast_training_with_da.py` is the ConDA training script, and `multi_domain_runner.py` is the runner script for training ConDA models. Update the arguments in `multi_domain_runner.py` to train models as needed.
Use the `evaluation.py` script to evaluate models. Change the arguments within `evaluation.py` as needed.
TuringBench
Link to the dataset website: link
Link to the TuringBench paper: link
Files should be split into 3 jsonl splits: train, valid, test. Each line in the jsonl is a data instance with `text` and `label` fields.
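The expected file layout can be illustrated with a small sketch that writes and checks a JSONL split. This is not code from the repository: the helper names `write_jsonl` and `read_jsonl` are hypothetical, and the label values `"human"` and `"machine"` are placeholders, since the exact label vocabulary the training script expects is not stated here.

```python
import json
import tempfile
from pathlib import Path

# Fields every data instance must carry, per the description above.
REQUIRED_FIELDS = ("text", "label")


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL split and check that each instance has the required fields."""
    instances = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"{path}:{line_no} is missing fields {missing}")
            instances.append(record)
    return instances


if __name__ == "__main__":
    # Placeholder instances; the label values are assumptions for illustration.
    sample = [
        {"text": "An example human-written article.", "label": "human"},
        {"text": "An example generated article.", "label": "machine"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for split in ("train", "valid", "test"):
            split_path = Path(tmp) / f"{split}.jsonl"
            write_jsonl(split_path, sample)
            print(split, len(read_jsonl(split_path)))
```

Running `read_jsonl` over each split before training is a cheap way to catch instances with a missing `text` or `label` field.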
Links to best performing models for each target generator
Here we provide links to pre-trained ConDA models, one per target generator, each trained with the source generator that performed best for that target:
Target | Best performing source | Dropbox Link |
---|---|---|
CTRL | GROVER_mega | link |
FAIR_wmt19 | GPT2_xl | link |
GPT2_xl | FAIR_wmt19 | link |
GPT3 | GROVER_mega | link |
GROVER_mega | CTRL | link |
XLM | GROVER_mega | link |
ChatGPT | FAIR_wmt19 | link |
Citation
If you use (part of) this code, please cite our paper as:
@InProceedings{bhattacharjee-EtAl:2023:ijcnlp,
author = {Bhattacharjee, Amrita and Kumarage, Tharindu and Moraffah, Raha and Liu, Huan},
title = {ConDA: Contrastive Domain Adaptation for AI-generated Text Detection},
booktitle = {Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
month = {November},
year = {2023},
address = {Nusa Dua, Bali},
publisher = {Association for Computational Linguistics},
pages = {598--610},
url = {https://aclanthology.org/2023.ijcnlp-long.40}
}
Contact
For any questions, comments, and feedback, contact Amrita Bhattacharjee at abhatt43@asu.edu