FireBERT

Hardening BERT classifiers against adversarial attack

Gunnar Mein, UC Berkeley MIDS Program (gunnarmein@berkeley.edu)
Kevin Hartman, UC Berkeley MIDS Program (kevin.hartman@berkeley.edu)
Andrew Morris, UC Berkeley MIDS Program (andrew.morris@berkeley.edu)

With many thanks to our advisors, Mike Tamir, Daniel Cer, and Mark Butler, for their guidance on this research, and to our significant others, who supported us as the three of us hunkered down over the three-month project.

Note: This repository was kept anonymous while the paper was in blind review.

Paper

Please read our paper: FireBERT 1.0. When citing our work, please include a link to this repository.

Instructions

The best way to run our project is to download the .zip files from release v1.0. Expand "data.zip" into a "data" folder, and expand "resources-1.zip" and "resources-2.zip" together into a single "resources" folder, as sketched below.
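For convenience, here is a minimal Python sketch of that download-and-unpack step. The release-asset base URL (`RELEASE_BASE`) is a placeholder, and the assumption that each archive's contents should land directly inside its target folder may not match the actual archive layout; check both against the v1.0 release page before running.

```python
# Minimal sketch: fetch the v1.0 release archives and unpack them into
# the "data" and "resources" folders. URLs and archive layout are
# assumptions -- substitute the asset URLs from the actual release page.
import urllib.request
import zipfile
from pathlib import Path

RELEASE_BASE = "https://github.com/<owner>/FireBERT/releases/download/v1.0"  # placeholder

ARCHIVES = {
    "data.zip": "data",              # expands into ./data
    "resources-1.zip": "resources",  # expands into ./resources
    "resources-2.zip": "resources",
}

for name, target in ARCHIVES.items():
    archive = Path(name)
    if not archive.exists():
        urllib.request.urlretrieve(f"{RELEASE_BASE}/{name}", str(archive))
    # If an archive already contains its target folder at the top level,
    # extract into "." instead of `target`.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
```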

Major Pre-requisites

Hardware and run-time expectations

The authors used 9th-generation Intel i7 personal computers with 64 GB of main memory and NVIDIA RTX 2080 (Max-Q and Ti) graphics cards, as well as various GCP instances. Full evaluation runs on pre-made adversarial samples complete in a few hours. Active attack benchmarks with TextFooler finish in hours for MNLI but can take days for IMDB. Co-tuning for FACT is expected to run for multiple hours.