<div align="center">⚡FlashRAG: A Python Toolkit for Efficient RAG Research</div>

<div align="center"> <a href="https://arxiv.org/abs/2405.13576" target="_blank"><img src=https://img.shields.io/badge/arXiv-b5212f.svg?logo=arxiv></a> <a href="https://huggingface.co/datasets/RUC-NLPIR/FlashRAG_datasets/" target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace%20Datasets-27b3b4.svg></a> <a href="https://github.com/RUC-NLPIR/FlashRAG/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/LICENSE-MIT-green"></a> <a><img alt="Static Badge" src="https://img.shields.io/badge/made_with-Python-blue"></a> </div> <h4 align="center"> <p> <a href="#wrench-installation">Installation</a> | <a href="#sparkles-features">Features</a> | <a href="#running-quick-start">Quick-Start</a> | <a href="#gear-components"> Components</a> | <a href="#robot-supporting-methods"> Supporting Methods</a> | <a href="#notebook-supporting-datasets"> Supporting Datasets</a> | <a href="#raised_hands-additional-faqs"> FAQs</a> </p> </h4> FlashRAG is a Python toolkit for the reproduction and development of Retrieval Augmented Generation (RAG) research. Our toolkit includes 32 pre-processed benchmark RAG datasets and 15 state-of-the-art RAG algorithms. <p align="center"> <img src="asset/framework.jpg"> </p>

With FlashRAG and provided resources, you can effortlessly reproduce existing SOTA works in the RAG domain or implement your custom RAG processes and components.

<p> <a href="https://trendshift.io/repositories/10454" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10454" alt="RUC-NLPIR%2FFlashRAG | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> </p>

:sparkles: Features

:mag_right: Roadmap

FlashRAG is still under development, with many open issues and room for improvement. We will continue to update it, and we sincerely welcome contributions to this open-source toolkit.

:page_with_curl: Changelog

[24/09/18] Because Pyserini is complex to install and does not work in some environments, we have introduced the lightweight BM25s package as an alternative (faster and easier to use). The Pyserini-based retriever will be deprecated in future versions. To use the bm25s-based retriever, set bm25_backend to bm25s in the config.

[24/09/09] We added support for a new method, <u>Adaptive-RAG</u>, which automatically selects the RAG process to execute based on the type of query. See its results in the <u>result table</u>.

[24/08/02] We added support for a new method, <u>Spring</u>, which significantly improves LLM performance by adding only a few token embeddings. See its results in the <u>result table</u>.

[24/07/17] Due to some unknown issues with HuggingFace, our original dataset link became invalid. We have updated it. Please check the new link if you encounter any problems.

[24/07/06] We added support for a new method, <u>Trace</u>, which refines text by constructing a knowledge graph. See its <u>results</u> and <u>details</u>.

<details> <summary>Show more</summary>

[24/06/19] We add support for a new method: <u>IRCoT</u>, and update the <u>result table</u>.

[24/06/15] We provide a <u>demo</u> to perform the RAG process using our toolkit.

[24/06/11] We have integrated sentence transformers in the retriever module. Now it's easier to use the retriever without setting pooling methods.

[24/06/05] We have provided detailed documentation for reproducing existing methods (see how to reproduce, baseline details), and for <u>configuration settings</u>.

[24/06/02] We have provided an introduction to FlashRAG for beginners. See <u>an introduction to flashrag</u> (<u>中文版</u> <u>한국어</u>).

[24/05/31] We added support for OpenAI-series models as generators.

</details>

:wrench: Installation

To get started with FlashRAG, you can simply install it with pip:

pip install flashrag[core]

Or you can clone it from GitHub and install it (requires Python 3.9+):

git clone https://github.com/RUC-NLPIR/FlashRAG.git
cd FlashRAG
pip install -e .[core] 

If you want to use sentence-transformers or pyserini, you can install the optional dependencies:

# Install all extra dependencies
pip install flashrag[full]

# Install sentence-transformers
pip install sentence-transformers

# Install pyserini for bm25
pip install pyserini

Because installing faiss with pip can cause incompatibilities, install it with one of the following conda commands instead.

# CPU-only version
conda install -c pytorch faiss-cpu=1.8.0

# GPU(+CPU) version
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

Note: The latest version of faiss cannot be installed on certain systems.

From the official Faiss repository (source):

:rocket: Quick Start

Toy Example

For beginners, we provide <u>an introduction to flashrag</u> (<u>中文版</u> <u>한국어</u>) to help you familiarize yourself with our toolkit. Alternatively, you can directly refer to the code below.

Demo

We provide a toy demo that implements a simple RAG process. You can freely change the corpus and the models. The English demo uses general knowledge as the corpus, e5-base-v2 as the retriever, and Llama3-8B-instruct as the generator. The Chinese demo uses data crawled from the official website of Renmin University of China as the corpus, bge-large-zh-v1.5 as the retriever, and qwen1.5-14B as the generator. Please fill in the corresponding paths in the file.

<div style="display: flex; justify-content: space-around;"> <div style="text-align: center;"> <img src="./asset/demo_en.gif" style="width: 100%;"> </div> </div>

To run the demo:

cd examples/quick_start

# copy the config file here so that streamlit can find it
cp ../methods/my_config.yaml .

# run english demo
streamlit run demo_en.py

# run chinese demo
streamlit run demo_zh.py

Pipeline

We also provide an example of using our framework for pipeline execution. Run the following code to implement a naive RAG pipeline on the provided toy datasets. The default retriever is e5-base-v2 and the default generator is Llama3-8B-instruct. You need to fill in the corresponding model paths in the following command. If you wish to use other models, please refer to the detailed instructions below.

cd examples/quick_start
python simple_pipeline.py \
    --model_path <Llama-3-8B-instruct-PATH> \
    --retriever_path <E5-PATH>

After the run completes, you can view the intermediate results and the final evaluation score in the output folder under the corresponding path.

Using the ready-made pipeline

You can use a pipeline class we have already built (as shown in <u>pipelines</u>) to run the RAG process. In this case, you only need to set up the config and load the corresponding pipeline.

First, load the config for the entire process, which records the hyperparameters required by the RAG process. You can pass settings through a YAML file, directly as variables, or both. Variables take priority over the file.

from flashrag.config import Config

config_dict = {'data_dir': 'dataset/'}
my_config = Config(config_file_path = 'my_config.yaml',
                config_dict = config_dict)
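The priority rule can be pictured as a dictionary merge. The sketch below is plain Python and does not call FlashRAG. `merge_config` and the sample file values are hypothetical and serve only to illustrate that a value passed as a variable overrides the value for the same key loaded from a file; the toolkit's actual merging logic may differ.

```python
def merge_config(file_values: dict, variable_values: dict) -> dict:
    """Return a merged config in which variable_values override file_values."""
    merged = dict(file_values)      # start from the values read from the YAML file
    merged.update(variable_values)  # values passed as variables take priority
    return merged


# Hypothetical values standing in for the contents of my_config.yaml.
file_values = {"data_dir": "data/", "save_dir": "output/"}
# The same dictionary as in the snippet above.
variable_values = {"data_dir": "dataset/"}

final_config = merge_config(file_values, variable_values)
print(final_config)
```

Keys that appear only in the file are kept, so the dictionary only needs to list the settings you want to change.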

For comprehensive guidance on how to set configurations, see our <u>configuration guidance</u>. You can also refer to the <u>basic yaml file</u> we provide when setting your own parameters.

Next, load the corresponding dataset and initialize the pipeline. The components in the pipeline will be automatically loaded.

from flashrag.utils import get_dataset
from flashrag.pipeline import SequentialPipeline
from flashrag.prompt import PromptTemplate
from flashrag.config import Config

config_dict = {'data_dir': 'dataset/'}
my_config = Config(config_file_path = 'my_config.yaml',
                config_dict = config_dict)
all_split = get_dataset(my_config)
test_data = all_split['test']

pipeline = SequentialPipeline(my_config)

You can specify your own input prompt using PromptTemplate:

prompt_template = PromptTemplate(
    my_config,
    system_prompt = "Answer the question based on the given document. Only give me the answer and do not output any other words.\nThe following are given documents.\n\n{reference}",
    user_prompt = "Question: {question}\nAnswer:"
)
pipeline = SequentialPipeline(my_config, prompt_template=prompt_template)
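The `{reference}` and `{question}` placeholders in the prompts above are filled in at run time. As a rough illustration only, the sketch below performs the substitution with plain `str.format`. The passages and the question are made up, and the way FlashRAG's PromptTemplate formats retrieved documents internally is not shown here and may differ.

```python
system_prompt = (
    "Answer the question based on the given document. "
    "Only give me the answer and do not output any other words.\n"
    "The following are given documents.\n\n{reference}"
)
user_prompt = "Question: {question}\nAnswer:"

# Hypothetical retrieved passages and query, used only for illustration.
passages = [
    "Doc 1: Paris is the capital of France.",
    "Doc 2: France is in Europe.",
]
question = "What is the capital of France?"

# Fill each placeholder with the corresponding run-time value.
filled_system = system_prompt.format(reference="\n".join(passages))
filled_user = user_prompt.format(question=question)

print(filled_system)
print(filled_user)
```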

Finally, execute pipeline.run to obtain the final result.

output_dataset = pipeline.run(test_data, do_eval=True)

The output_dataset contains the intermediate results and metric scores for each item in the input dataset. Meanwhile, the dataset with intermediate results and the overall evaluation score will also be saved as a file (if save_intermediate_data and save_metric_score are specified).

Build your own pipeline

Sometimes you may need to implement a more complex RAG process. In that case, you can build your own pipeline: inherit from BasicPipeline, initialize the components you need, and complete the run function.

from flashrag.pipeline import BasicPipeline
from flashrag.utils import get_retriever, get_generator

class ToyPipeline(BasicPipeline):
    def __init__(self, config, prompt_template=None):
        # Initialize the base pipeline, which `self.evaluate` below relies on
        super().__init__(config, prompt_template)
        # Load your own components, e.g. a retriever and a generator
        self.retriever = get_retriever(config)
        self.generator = get_generator(config)

    def run(self, dataset, do_eval=True):
        # Complete your own process logic

        # get attribute in dataset using `.`
        input_query = dataset.question
        ...
        # use `update_output` to save intermediate data
        dataset.update_output("pred", pred_answer_list)
        dataset = self.evaluate(dataset, do_eval=do_eval)
        return dataset

Please first understand the input and output forms of the components you need to use from our <u>documentation</u>.

Just use components

If you already have your own code and only want to embed our components into it, you can refer to the <u>basic introduction of the components</u> for the input and output formats of each component.

:gear: Components

In FlashRAG, we have built a series of common RAG components, including retrievers, generators, refiners, and more. Based on these components, we have assembled several pipelines to implement the RAG workflow, while also providing the flexibility to combine these components in custom arrangements to create your own pipeline.

RAG-Components

<table>
<thead>
<tr> <th>Type</th> <th>Module</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td rowspan="1">Judger</td> <td>SKR Judger</td> <td>Judging whether to retrieve using <a href="https://aclanthology.org/2023.findings-emnlp.691.pdf">SKR</a> method</td> </tr>
<tr> <td rowspan="4">Retriever</td> <td>Dense Retriever</td> <td>Bi-encoder models such as dpr, bge, e5, using faiss for search</td> </tr>
<tr> <td>BM25 Retriever</td> <td>Sparse retrieval method based on Lucene</td> </tr>
<tr> <td>Bi-Encoder Reranker</td> <td>Calculate matching score using bi-encoder</td> </tr>
<tr> <td>Cross-Encoder Reranker</td> <td>Calculate matching score using cross-encoder</td> </tr>
<tr> <td rowspan="5">Refiner</td> <td>Extractive Refiner</td> <td>Refine input by extracting important context</td> </tr>
<tr> <td>Abstractive Refiner</td> <td>Refine input through seq2seq model</td> </tr>
<tr> <td>LLMLingua Refiner</td> <td><a href="https://aclanthology.org/2023.emnlp-main.825/">LLMLingua-series</a> prompt compressor</td> </tr>
<tr> <td>SelectiveContext Refiner</td> <td><a href="https://arxiv.org/abs/2310.06201">Selective-Context</a> prompt compressor</td> </tr>
<tr> <td>KG Refiner</td> <td>Use the <a href="https://arxiv.org/abs/2406.11460">Trace</a> method to construct a knowledge graph</td> </tr>
<tr> <td rowspan="4">Generator</td> <td>Encoder-Decoder Generator</td> <td>Encoder-Decoder model, supporting <a href="https://arxiv.org/abs/2007.01282">Fusion-in-Decoder (FiD)</a></td> </tr>
<tr> <td>Decoder-only Generator</td> <td>Native transformers implementation</td> </tr>
<tr> <td>FastChat Generator</td> <td>Accelerate with <a href="https://github.com/lm-sys/FastChat">FastChat</a></td> </tr>
<tr> <td>vllm Generator</td> <td>Accelerate with <a href="https://github.com/vllm-project/vllm">vllm</a></td> </tr>
</tbody>
</table>

Pipelines

Referring to a <u>survey on retrieval-augmented generation</u>, we categorized RAG methods into four types based on their inference paths: Sequential, Conditional, Branching, and Loop.

In each category, we have implemented the corresponding common pipelines. Some pipelines correspond to published papers.

<table>
<thead>
<tr> <th>Type</th> <th>Module</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td rowspan="1">Sequential</td> <td>Sequential Pipeline</td> <td>Linear execution of query, supporting refiner, reranker</td> </tr>
<tr> <td rowspan="1">Conditional</td> <td>Conditional Pipeline</td> <td>With a judger module, distinct execution paths for various query types</td> </tr>
<tr> <td rowspan="2">Branching</td> <td>REPLUG Pipeline</td> <td>Generate answer by integrating probabilities in multiple generation paths</td> </tr>
<tr> <td>SuRe Pipeline</td> <td>Ranking and merging generated results based on each document</td> </tr>
<tr> <td rowspan="5">Loop</td> <td>Iterative Pipeline</td> <td>Alternating retrieval and generation</td> </tr>
<tr> <td>Self-Ask Pipeline</td> <td>Decompose complex problems into subproblems using <a href="https://arxiv.org/abs/2210.03350">self-ask</a> </td> </tr>
<tr> <td>Self-RAG Pipeline</td> <td>Adaptive retrieval, critique, and generation</td> </tr>
<tr> <td>FLARE Pipeline</td> <td>Dynamic retrieval during the generation process</td> </tr>
<tr> <td>IRCoT Pipeline</td> <td>Integrate retrieval process with CoT</td> </tr>
</tbody>
</table>
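To make the difference between the four pipeline types concrete, here is a minimal control-flow sketch. It does not use FlashRAG: `retrieve` and `generate` are hypothetical stand-ins that only return labelled strings, and each function shows the shape of one inference path rather than the behaviour of any pipeline class in the table.

```python
from typing import Callable, List


# Hypothetical stand-ins for real components; they only return labelled strings.
def retrieve(query: str, k: int = 2) -> List[str]:
    return [f"doc{i} for '{query}'" for i in range(k)]


def generate(query: str, docs: List[str]) -> str:
    return f"answer({query} | {len(docs)} docs)"


def sequential(query: str) -> str:
    # Retrieve once, then generate once.
    return generate(query, retrieve(query))


def conditional(query: str, needs_retrieval: Callable[[str], bool]) -> str:
    # A judger picks the path: retrieve first, or answer directly.
    docs = retrieve(query) if needs_retrieval(query) else []
    return generate(query, docs)


def branching(query: str) -> List[str]:
    # One generation path per document; a real pipeline would then merge or rank them.
    return [generate(query, [doc]) for doc in retrieve(query)]


def loop(query: str, iterations: int = 2) -> str:
    # Alternate retrieval and generation, feeding each answer into the next retrieval.
    answer = ""
    for _ in range(iterations):
        docs = retrieve(f"{query} {answer}".strip())
        answer = generate(query, docs)
    return answer


print(sequential("q"))
print(conditional("q", needs_retrieval=lambda q: False))
print(branching("q"))
print(loop("q"))
```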

:robot: Supporting Methods

We have implemented 15 works with a consistent setting of:

For open-source methods, we implemented their processes using our framework. For methods whose authors did not provide source code, we did our best to follow the method described in the original paper.

For necessary settings and hyperparameters specific to some methods, we have documented them in the specific settings column. For more details, please consult our <u>reproduce guidance</u> and <u>method details</u>.

It’s important to note that, to ensure consistency, we have utilized a uniform setting. However, this setting may differ from the original setting of the method, leading to variations in results compared to the original outcomes.

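| Method | Type | NQ (EM) | TriviaQA (EM) | HotpotQA (F1) | 2Wiki (F1) | PopQA (F1) | WebQA (EM) | Specific setting |
|---|---|---|---|---|---|---|---|---|
| Naive Generation | Sequential | 22.6 | 55.7 | 28.4 | 33.9 | 21.7 | 18.8 | |
| Standard RAG | Sequential | 35.1 | 58.9 | 35.3 | 21.0 | 36.7 | 15.7 | |
| AAR-contriever-kilt | Sequential | 30.1 | 56.8 | 33.4 | 19.8 | 36.1 | 16.1 | |
| LongLLMLingua | Sequential | 32.2 | 59.2 | 37.5 | 25.0 | 38.7 | 17.5 | Compress Ratio=0.5 |
| RECOMP-abstractive | Sequential | 33.1 | 56.4 | 37.5 | 32.4 | 39.9 | 20.2 | |
| Selective-Context | Sequential | 30.5 | 55.6 | 34.4 | 18.5 | 33.5 | 17.3 | Compress Ratio=0.5 |
| Trace | Sequential | 30.7 | 50.2 | 34.0 | 15.5 | 37.4 | 19.9 | |
| Spring | Sequential | 37.9 | 64.6 | 42.6 | 37.3 | 54.8 | 27.7 | Use Llama2-7B-chat with trained embedding table |
| SuRe | Branching | 37.1 | 53.2 | 33.4 | 20.6 | 48.1 | 24.2 | Use provided prompt |
| REPLUG | Branching | 28.9 | 57.7 | 31.2 | 21.1 | 27.8 | 20.2 | |
| SKR | Conditional | 33.2 | 56.0 | 32.4 | 23.4 | 31.7 | 17.0 | Use inference-time training data |
| Adaptive-RAG | Conditional | 35.1 | 56.6 | 39.1 | 28.4 | 40.4 | 16.0 | |
| Ret-Robust | Loop | 42.9 | 68.2 | 35.8 | 43.4 | 57.2 | 33.7 | Use LLAMA2-13B with trained lora |
| Self-RAG | Loop | 36.4 | 38.2 | 29.6 | 25.1 | 32.7 | 21.9 | Use trained selfrag-llama2-7B |
| FLARE | Loop | 22.5 | 55.8 | 28.0 | 33.9 | 20.7 | 20.2 | |
| Iter-Retgen, ITRG | Loop | 36.8 | 60.1 | 38.3 | 21.6 | 37.9 | 18.2 | |
| IRCoT | Loop | 33.3 | 56.9 | 41.5 | 32.4 | 45.6 | 20.7 | |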
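The table reports exact match (EM) and F1. For readers unfamiliar with these metrics, the following is a minimal sketch that assumes the common SQuAD-style answer normalization (lowercasing, removing punctuation and English articles). It is not FlashRAG's evaluator, whose normalization and aggregation may differ, and the function names are illustrative.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, golden_answers: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return float(any(pred == normalize(gold) for gold in golden_answers))


def token_f1(prediction: str, golden_answers: List[str]) -> float:
    """Best token-level F1 between the prediction and any gold answer."""
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in golden_answers:
        gold_tokens = normalize(gold).split()
        # Multiset intersection counts the tokens shared by both answers.
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))
print(token_f1("Eiffel Tower in Paris", ["Eiffel Tower"]))
```

Per-item scores like these lie between 0 and 1. The table values are presumably dataset averages expressed as percentages.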

:notebook: Supporting Datasets & Document Corpus

Datasets

We have collected and processed 32 datasets widely used in RAG research and pre-processed them into a consistent format for ease of use. For certain datasets (such as Wiki-asp), we have adapted them to fit the requirements of RAG tasks according to the methods commonly used within the community. All datasets are available at <u>Huggingface datasets</u>.

For each dataset, we save each split as a jsonl file, and each line is a dict as follows:

{
  'id': str,
  'question': str,
  'golden_answers': List[str],
  'metadata': dict
}
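As a small illustration of this format, the sketch below writes two made-up items to a temporary jsonl file and reads them back with the standard library. The items and the file name are hypothetical. Note that the schema above uses Python dict notation, while each line in an actual jsonl file is a JSON object with double-quoted keys.

```python
import json
import os
import tempfile

# Hypothetical sample items that follow the schema above.
items = [
    {"id": "0", "question": "Who wrote Hamlet?",
     "golden_answers": ["William Shakespeare", "Shakespeare"], "metadata": {}},
    {"id": "1", "question": "What is the capital of France?",
     "golden_answers": ["Paris"], "metadata": {"source": "toy"}},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.jsonl")

    # Write one JSON object per line.
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")

    # Read the split back, skipping blank lines.
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

# Each item carries a list of acceptable answers.
for item in loaded:
    assert isinstance(item["golden_answers"], list)

print(len(loaded), loaded[0]["question"])
```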

Below is the list of datasets along with the corresponding sample sizes:

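| Task | Dataset Name | Knowledge Source | # Train | # Dev | # Test |
|---|---|---|---|---|---|
| QA | NQ | wiki | 79,168 | 8,757 | 3,610 |
| QA | TriviaQA | wiki & web | 78,785 | 8,837 | 11,313 |
| QA | PopQA | wiki | / | / | 14,267 |
| QA | SQuAD | wiki | 87,599 | 10,570 | / |
| QA | MSMARCO-QA | web | 808,731 | 101,093 | / |
| QA | NarrativeQA | books and story | 32,747 | 3,461 | 10,557 |
| QA | WikiQA | wiki | 20,360 | 2,733 | 6,165 |
| QA | WebQuestions | Google Freebase | 3,778 | / | 2,032 |
| QA | AmbigQA | wiki | 10,036 | 2,002 | / |
| QA | SIQA | - | 33,410 | 1,954 | / |
| QA | CommonsenseQA | - | 9,741 | 1,221 | / |
| QA | BoolQ | wiki | 9,427 | 3,270 | / |
| QA | PIQA | - | 16,113 | 1,838 | / |
| QA | Fermi | wiki | 8,000 | 1,000 | 1,000 |
| multi-hop QA | HotpotQA | wiki | 90,447 | 7,405 | / |
| multi-hop QA | 2WikiMultiHopQA | wiki | 15,000 | 12,576 | / |
| multi-hop QA | Musique | wiki | 19,938 | 2,417 | / |
| multi-hop QA | Bamboogle | wiki | / | / | 125 |
| Long-form QA | ASQA | wiki | 4,353 | 948 | / |
| Long-form QA | ELI5 | Reddit | 272,634 | 1,507 | / |
| Open-Domain Summarization | WikiASP | wiki | 300,636 | 37,046 | 37,368 |
| multiple-choice | MMLU | - | 99,842 | 1,531 | 14,042 |
| multiple-choice | TruthfulQA | wiki | / | 817 | / |
| multiple-choice | HellaSWAG | ActivityNet | 39,905 | 10,042 | / |
| multiple-choice | ARC | - | 3,370 | 869 | 3,548 |
| multiple-choice | OpenBookQA | - | 4,957 | 500 | 500 |
| Fact Verification | FEVER | wiki | 104,966 | 10,444 | / |
| Dialog Generation | WOW | wiki | 63,734 | 3,054 | / |
| Entity Linking | AIDA CoNll-yago | Freebase & wiki | 18,395 | 4,784 | / |
| Entity Linking | WNED | Wiki | / | 8,995 | / |
| Slot Filling | T-REx | DBPedia | 2,284,168 | 5,000 | / |
| Slot Filling | Zero-shot RE | wiki | 147,909 | 3,724 | / |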

Document Corpus

Our toolkit supports jsonl format for retrieval document collections, with the following structure:

{"id":"0", "contents": "...."}
{"id":"1", "contents": "..."}

The contents key is essential for building the index. For documents that include both text and title, we recommend setting the value of contents to {title}\n{text}. The corpus file can also contain other keys to record additional characteristics of the documents.
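The recommended `{title}\n{text}` layout can be produced with a few lines of standard-library Python. The documents below are made up and `to_corpus_line` is an illustrative helper, not part of the toolkit.

```python
import json

# Hypothetical documents with separate title and text fields.
documents = [
    {"title": "Paris", "text": "Paris is the capital of France."},
    {"title": "Berlin", "text": "Berlin is the capital of Germany."},
]


def to_corpus_line(doc_id: int, doc: dict) -> str:
    """Serialize one document as a corpus line with `id` and `contents` keys."""
    record = {"id": str(doc_id), "contents": f"{doc['title']}\n{doc['text']}"}
    return json.dumps(record, ensure_ascii=False)


# One JSON object per line, ready to be written to a .jsonl file.
corpus_lines = [to_corpus_line(i, doc) for i, doc in enumerate(documents)]
print("\n".join(corpus_lines))
```

Because `json.dumps` escapes the newline inside `contents`, each document still occupies exactly one line of the file.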

In academic research, Wikipedia and MS MARCO are the most commonly used retrieval document collections. For Wikipedia, we provide a <u>comprehensive script</u> to process any Wikipedia dump into a clean corpus. Additionally, various processed versions of the Wikipedia corpus are available in many works, and we have listed some reference links.

For MS MARCO, it is already processed upon release and can be directly downloaded from its <u>hosting link</u> on Hugging Face.

:raised_hands: Additional FAQs

:bookmark: License

FlashRAG is licensed under the <u>MIT License</u>.

:star2: Citation

Please kindly cite our paper if it helps your research:

@article{FlashRAG,
    author={Jiajie Jin and
            Yutao Zhu and
            Xinyu Yang and
            Chenghao Zhang and
            Zhicheng Dou},
    title={FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research},
    journal={CoRR},
    volume={abs/2405.13576},
    year={2024},
    url={https://arxiv.org/abs/2405.13576},
    eprinttype={arXiv},
    eprint={2405.13576}
}