

This repo contains instructions on how to reproduce the results in the paper.

Updates on March 10th, 2022.

People have been requesting the data pre-processing code for the MATRES and TBD pickle files. The original pickle files were produced by internal NLP annotation tools used by the Information Sciences Institute at USC. Due to contract restrictions, we are not able to make those tools public. However, we have made an effort to replicate those files using public tools: https://github.com/rujunhan/TEDataProcessing. There are some unavoidable but minor differences between the two versions.

0. Environment

Some key packages.
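
A minimal setup sketch is shown below. The Python and package versions here are assumptions for illustration, not the versions pinned by the repo; check the repository's requirements for the exact ones.

```bash
# Illustrative environment setup -- the versions here are assumptions, not the
# repo's pinned requirements.
conda create -n econet python=3.8 -y
conda activate econet
pip install torch transformers numpy tqdm
```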

1. Continual Pretraining

For replication purposes, we provide several checkpoints for our pretrained models here: https://drive.google.com/drive/folders/1otj3NjBfra9bzPNWGXOvMsB7SQL4WSDr?usp=sharing
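
One possible way to fetch the checkpoints from the command line (the gdown utility is an assumption on our part; downloading the folder through the browser works just as well):

```bash
# Download the shared Google Drive folder (assumes the `gdown` package).
pip install gdown
gdown --folder "https://drive.google.com/drive/folders/1otj3NjBfra9bzPNWGXOvMsB7SQL4WSDr"
```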

Under the pretrained_models/ folder:

If you are interested in re-training ECONET (and its variants), here are the instructions.

1.1 Pretraining Data

We also released our pre-processed pretraining data using the same Google Drive link above.

Under the data/ folder:

1.2 Generator + Discriminator (ECONET)

Save all of the above data objects in the local ./data/ folder.
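
For example (the source path below is a placeholder for wherever the Google Drive objects were downloaded):

```bash
# Place the downloaded pretraining data objects under ./data/, where the
# pretraining scripts look for them. The source path is a placeholder.
mkdir -p ./data
cp -r /path/to/downloaded/data/* ./data/
```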

1.3 Generator Only

1.4 Random Masks

2. Replicating Fine-tuning Results

2.1 TORQUE

2.2 TBD / MATRES / RED

2.3 MCTACO

3. Re-run Fine-tuning

If you are interested in re-running our fine-tuning, we also provide instructions here.

3.1 TORQUE

Fine-tuning with ECONET: run bash ./code/finetune_torque.sh. A couple of parameters will have to be set (see the sketch below).

Fine-tuning with RoBERTa-large or BERT-large: run bash ./code/finetune_torque.sh with the corresponding parameters.
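
A hedged sketch of both invocations is below. The variable names and the checkpoint path are hypothetical placeholders for the parameters mentioned above; the actual parameter names and values are set in the script itself.

```bash
# Hypothetical placeholders -- the real parameter names are defined inside
# ./code/finetune_torque.sh; adjust them there as described above.

# Fine-tuning with an ECONET checkpoint (path is illustrative).
MODEL=./pretrained_models/econet SEED=42 bash ./code/finetune_torque.sh

# Baseline fine-tuning with an off-the-shelf RoBERTa-large (or BERT-large).
MODEL=roberta-large SEED=42 bash ./code/finetune_torque.sh
```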

3.2 TBD / MATRES / RED

Fine-tuning with ECONET: run bash ./code/finetune_ere_data.sh. A couple of parameters will have to be set (see the sketch below).

Fine-tuning with RoBERTa-large or BERT-large: run bash ./code/finetune_ere_data.sh with the corresponding parameters.
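
As above, a hedged sketch; the dataset switch and model path are placeholders for the parameters the script actually expects.

```bash
# Hypothetical placeholders -- the real parameter names are defined inside
# ./code/finetune_ere_data.sh; adjust them there as described above.

# Fine-tuning with an ECONET checkpoint on one of TBD / MATRES / RED.
DATA=matres MODEL=./pretrained_models/econet bash ./code/finetune_ere_data.sh

# Baseline fine-tuning with RoBERTa-large or BERT-large.
DATA=tbd MODEL=roberta-large bash ./code/finetune_ere_data.sh
```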

3.3 MCTACO

4. TacoLM Baseline