<div align="center"> <figure class="center-figure"> <img src="media/logo.png" width="85%"></figure> </div> <h1 align="left"> STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases </h1> <div align="left">

License: MIT

</div>

## What is STaRK?

STaRK is a large-scale Semi-structured Retrieval Benchmark on Textual and Relational Knowledge bases, covering applications in product search, academic paper search, and biomedical inquiries.

Featuring diverse, natural-sounding, and practical queries that require context-specific reasoning, STaRK sets a new standard for assessing real-world retrieval systems driven by LLMs and presents significant challenges for future research.

🔥 Check out our website for an overview!


## Access benchmark data

### 1) Env Setup

#### From pip (recommended)

With Python >=3.8 and <3.12:

```bash
pip install stark-qa
```

#### From source

Create a conda environment with Python >=3.8 and <3.12 and install the required packages from `requirements.txt`:

```bash
conda create -n stark python=3.11
conda activate stark
pip install -r requirements.txt
```

### 2) Data loading

```python
from stark_qa import load_qa, load_skb

dataset_name = 'amazon'

# Load the retrieval dataset
qa_dataset = load_qa(dataset_name)
idx_split = qa_dataset.get_idx_split()

# Load the semi-structured knowledge base
skb = load_skb(dataset_name, download_processed=True, root=None)
```

The `root` argument of `load_skb` specifies where the SKB data is stored. With the default value `None`, the data is stored in the Hugging Face cache.
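
For example, to keep the SKB files under a project-local directory and pull one query from the official test split, one could do the following. This is a minimal sketch: the `./skb_data` path is illustrative, and the exact tuple returned by indexing `qa_dataset` is an assumption that may vary across versions.

```python
from stark_qa import load_qa, load_skb

# Store the SKB under a local directory instead of the Hugging Face cache.
skb = load_skb('amazon', download_processed=True, root='./skb_data')

qa_dataset = load_qa('amazon')
idx_split = qa_dataset.get_idx_split()  # expected keys: 'train', 'val', 'test'

# Inspect one test query (assumed indexing interface).
first_test_idx = int(idx_split['test'][0])
query, query_id, answer_ids, meta_info = qa_dataset[first_test_idx]
print(query_id, query)
print('ground-truth answer node ids:', answer_ids)
```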

#### Data of the Retrieval Task

Question-answer pairs for the retrieval task are automatically downloaded to `data/{dataset}/stark_qa` by default. We provide the official split in `data/{dataset}/split`.

#### Data of the Knowledge Bases

There are two ways to load the knowledge base data, controlled by the `download_processed` flag of `load_skb` (see the sketch after this list):

1. Download the pre-processed SKB by setting `download_processed=True` (recommended).
2. Build the SKB locally from the raw source data by setting `download_processed=False`, which may take longer.
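
A minimal sketch of the two options, based on the `load_skb` signature shown above:

```python
from stark_qa import load_skb

# Option 1 (recommended): download the pre-processed SKB.
skb = load_skb('amazon', download_processed=True)

# Option 2: build the SKB locally from the raw source data
# (slower, but reproducible end to end).
skb = load_skb('amazon', download_processed=False)
```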

### 3) Evaluation on benchmark

If you plan to run the evaluation, install the following additional packages:

```bash
pip install llm2vec gritlm bm25
```
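
For orientation, the benchmark reports standard retrieval metrics such as Hit@k, Recall@k, and MRR. The sketch below computes them for a single query given model scores over all candidate entities; it is an illustration of the metrics, not the repo's own evaluation API.

```python
import torch

def retrieval_metrics(scores, answer_ids, k=20):
    """Hit@1, Recall@k, and MRR for one query.

    scores: 1-D torch.Tensor of model scores over all candidates.
    answer_ids: iterable of ground-truth candidate indices.
    """
    ranked = torch.argsort(scores, descending=True).tolist()
    answers = set(answer_ids)
    hit1 = float(ranked[0] in answers)
    recall_k = len(answers.intersection(ranked[:k])) / len(answers)
    # Reciprocal rank of the first relevant candidate (0 if none found).
    mrr = next((1.0 / r for r, c in enumerate(ranked, start=1) if c in answers), 0.0)
    return {'hit@1': hit1, f'recall@{k}': recall_k, 'mrr': mrr}

# Example: 5 candidates, ground truth is candidate 2.
print(retrieval_metrics(torch.tensor([0.1, 0.7, 0.9, 0.2, 0.4]), [2], k=3))
```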

## Reference

Please consider citing our paper if you use our benchmark or code in your work:

```bibtex
@inproceedings{wu24stark,
    title        = {STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases},
    author       = {
        Shirley Wu and Shiyu Zhao and
        Michihiro Yasunaga and Kexin Huang and
        Kaidi Cao and Qian Huang and
        Vassilis N. Ioannidis and Karthik Subbian and
        James Zou and Jure Leskovec
    },
    booktitle    = {NeurIPS Datasets and Benchmarks Track},
    year         = {2024}
}
```