Code for the paper Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks [ACL 2023 Findings]

Prepare Datasets

Instructions on downloading the preprocessed datasets and preparing custom datasets can be found here

Download Checkpoints

Download the checkpoints from: https://uofi.box.com/s/wnt6cv7icuir4q3wb2a6viuyklme5dga. Place the downloaded checkpoint directories in checkpoints under zemi/output/p3_finetuning
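
For example, assuming the checkpoints were extracted to ~/Downloads/zemi_checkpoints (a hypothetical location; adjust to wherever the Box archive was saved), the expected layout could be created as follows:

    # Create the target directory and move the downloaded checkpoint folders into it.
    mkdir -p zemi/output/p3_finetuning/checkpoints
    mv ~/Downloads/zemi_checkpoints/* zemi/output/p3_finetuning/checkpoints/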

Setup Environment

Set up the conda environment with conda env create -f environment.yml. Run accelerate config to configure the devices.
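
A minimal sketch of the full setup, assuming the environment name declared in environment.yml is zemi (check the name: field in that file):

    # Create and activate the conda environment from the provided spec.
    conda env create -f environment.yml
    conda activate zemi        # hypothetical environment name; see environment.yml
    # Interactively configure accelerate (number of GPUs, mixed precision, etc.).
    accelerate config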

Quick Start

Scripts for reproducing the main results in Table 1: performing (semi-)parametric multitask prompted training and zero-shot evaluation. Detailed instructions on the configurations can be found here. All scripts should be run under zemi/. SETUP_ENV.sh is called by the following scripts to set up environment variables; modify those variables if you are not using the exact folder structure described above. The general launch pattern is sketched below.
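
As a rough sketch (run_zemi.sh is a hypothetical script name; substitute the actual script for the variant you want to reproduce):

    cd zemi/                 # all training/evaluation scripts are launched from this directory
    bash run_zemi.sh         # hypothetical script name; the script calls SETUP_ENV.sh itself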

No Aug baseline

Concat baseline

FiD baseline

Zemi

Brief Description of the Source Code

Visualization of the Retrieved Documents

visualization/ contains examples of the retrieved documents for each task. We include the top 50 examples with the highest and lowest BM25 scores in visualization/top50_highest_score_retrieval_instances and visualization/top50_lowest_score_retrieval_instances. We also include the first 50 instances for each dataset without reordering in visualization/first50_retrieval_instances.
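
The file format of these dumps is not described here; as a small sketch (assuming nothing beyond plain files on disk), the three directories can be previewed from the shell:

    # Preview the first few entries in each retrieval dump directory.
    ls visualization/top50_highest_score_retrieval_instances | head
    ls visualization/top50_lowest_score_retrieval_instances | head
    ls visualization/first50_retrieval_instances | head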

Citation

@article{wang2022zemi,
  title={Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks},
  author={Wang, Zhenhailong and Pan, Xiaoman and Yu, Dian and Yu, Dong and Chen, Jianshu and Ji, Heng},
  journal={arXiv preprint arXiv:2210.00185},
  year={2022}
}