Home

Awesome

Code for GenRead: Genrate rather than Retrieve!

Introduction & Setup

Download the Datasets

Zero-shot Setting

Step1: generate background document.

python mainfunc.py 
  --dataset {dataset} 
  --task step1 
  --split test

Step2: infer answer from document.

python mainfunc.py 
  --dataset {dataset} 
  --task step2 
  --split test

Supervised Setting

Method1: use sampling to generate multiple documents.

python mainfunc.py 
  --dataset {dataset} 
  --task step1 
  --split test 
  --num_sequence 10 
  --temperature 0.95

Method2: use clustering to generate diverse documents.

python clusterfunc.py 
  --dataset {dataset} 
  --task step1 
  --split {split} 
  --num_sequence 1 
  --temperature 0.95 
  --clustering

Fusion-in-decoder: train a reader model to infer answer from documents

Citation

@inproceedings{yu2023generate,
  title={Generate rather than retrieve: Large language models are strong context generators},
  author={Yu, Wenhao and Iter, Dan and Wang, Shuohang and Xu, Yichong and Ju, Mingxuan and Sanyal, Soumya and Zhu, Chenguang and Zeng, Michael and Jiang, Meng},
  booktitle={International Conference for Learning Representation (ICLR)},
  year={2023}
}

Please kindly cite our paper if you find this paper and the codes helpful.