LLatrieval: LLM-Verified Retrieval for Verifiable Generation

This repository contains the code and data for the paper LLatrieval: LLM-Verified Retrieval for Verifiable Generation, including the scripts needed to reproduce the method proposed in the paper.

🆕 News

Requirements

  1. We recommend creating a dedicated Python virtual environment with conda.
    conda create -n lvr python=3.9.7
    
  2. Activate the environment you just created.
    conda activate lvr
    
  3. Install the required packages before running any code.
    pip install -r requirements.txt
    

Data

We uploaded the data to Hugging Face🤗.

Start by installing 🤗 Datasets:

pip install datasets

Load a dataset

The following command downloads the raw data to the data/ folder.

python download_data.py

Download corpus

Use the following commands to download and extract the BM25_SPHERE_CORPUS:

wget -P faiss_index https://dl.fbaipublicfiles.com/sphere/sphere_sparse_index.tar.gz
tar -xzvf faiss_index/sphere_sparse_index.tar.gz -C faiss_index

Use the following commands to download and decompress the WIKI_TSV_CORPUS:

wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
gzip -dv psgs_w100.tsv.gz
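
To sanity-check the Wikipedia corpus after download, the passages can be streamed one at a time rather than loaded into memory. The sketch below assumes the file has the usual DPR layout: a header row followed by tab-separated `id`, `text`, and `title` columns. The helper name `iter_passages` is hypothetical and is not part of this repository.

```python
import csv
import gzip
from pathlib import Path
from typing import Dict, Iterator


def iter_passages(tsv_path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per passage from a DPR-style TSV (plain or gzipped).

    Assumes a header row naming the columns (id, text, title).
    """
    path = Path(tsv_path)
    # Pick the opener from the file extension so both forms work.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        # DictReader takes the column names from the header row.
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield row
```

For example, `next(iter_passages("psgs_w100.tsv"))` returns the first passage as a dictionary, which is a quick way to confirm that the file decompressed correctly.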

For more info about the Sphere and Wikipedia snapshot corpora, please refer to ALCE.

Code Structure

Reproduce Our Method

NOTE: The raw data and a retrieval corpus must be downloaded before running the following commands. You also need to update the parameters in the corresponding scripts under the commands directory to match your setup.
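
A quick pre-flight check can catch a missing download before a long run starts. The sketch below is only an illustration: it assumes the commands are run from the repository root and that the data and corpora sit in the locations used by the download steps in this README (`data/`, `faiss_index/`, `psgs_w100.tsv`). Which corpus a given script actually needs is not stated here, so adjust the list to your setup. The names `DEFAULT_PREREQUISITES` and `missing_prerequisites` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed default locations, relative to the repository root.
DEFAULT_PREREQUISITES = ("data", "faiss_index", "psgs_w100.tsv")


def missing_prerequisites(
    root: str = ".", names: Iterable[str] = DEFAULT_PREREQUISITES
) -> List[str]:
    """Return the entries under `root` that do not exist yet."""
    base = Path(root)
    return [name for name in names if not (base / name).exists()]
```

Calling `missing_prerequisites()` from the repository root returns an empty list when everything in the assumed list is in place, and the names of the missing entries otherwise.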

For ASQA, use the following command:

bash commands/asqa_iterative_retrieval.sh

For QAMPARI, use the following command:

bash commands/qampari_iterative_retrieval.sh

For ELI5, use the following command:

bash commands/eli5_iterative_retrieval.sh

The results are saved in iter_retrieval_50/.
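
To run all three benchmarks back to back, the per-dataset commands can be wrapped in a small driver. This is a convenience sketch, not part of the repository: the function names `build_commands` and `run_all` are hypothetical, and it assumes the scripts are launched from the repository root with their parameters already configured.

```python
import subprocess
from typing import List, Sequence

# Dataset names taken from the script names in the commands directory.
DATASETS = ("asqa", "qampari", "eli5")


def build_commands(datasets: Sequence[str] = DATASETS) -> List[List[str]]:
    """Return one `bash commands/<name>_iterative_retrieval.sh` call per dataset."""
    return [["bash", f"commands/{name}_iterative_retrieval.sh"] for name in datasets]


def run_all(datasets: Sequence[str] = DATASETS) -> None:
    """Run each script in order and stop at the first failure."""
    for command in build_commands(datasets):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, check=True)
```

Calling `run_all()` runs ASQA, QAMPARI, and ELI5 in that order. Passing a shorter sequence, such as `run_all(["asqa"])`, restricts the run to a single dataset.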

Citation

@inproceedings{li-etal-2024-llatrieval,
    title = "{LL}atrieval: {LLM}-Verified Retrieval for Verifiable Generation",
    author = "Li, Xiaonan  and
      Zhu, Changtai  and
      Li, Linyang  and
      Yin, Zhangyue  and
      Sun, Tianxiang  and
      Qiu, Xipeng",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.305",
    pages = "5453--5471",
}