PyTerrier_t5

This is the PyTerrier plugin for the Mono and Duo T5 ranking approaches [Nogueira21].

Note that this package only supports scoring with pretrained models (like this one).

Installation

This repository can be installed using pip.

pip install --upgrade git+https://github.com/terrierteam/pyterrier_t5.git

Building T5 pipelines

You can use MonoT5 just like any other text-based re-ranker. By default, it uses a MonoT5 model previously trained on MS MARCO passage ranking training queries.

import pyterrier as pt
from pyterrier_t5 import MonoT5ReRanker, DuoT5ReRanker
monoT5 = MonoT5ReRanker() # loads castorini/monot5-base-msmarco by default
duoT5 = DuoT5ReRanker() # loads castorini/duot5-base-msmarco by default

dataset = pt.get_dataset("irds:vaswani")
bm25 = pt.BatchRetrieve(pt.get_dataset("vaswani").get_index(), wmodel="BM25")
mono_pipeline = bm25 >> pt.text.get_text(dataset, "text") >> monoT5
# Apply a rank cutoff of 5 to the monoT5 results, since duoT5 is too costly to run over the full result list.
duo_pipeline = mono_pipeline % 5 >> duoT5

Note that both approaches require the document text to be included in the dataframe (see pt.text.get_text).
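To see why the rank cutoff before duoT5 matters, here is a rough sketch of how the number of model inferences grows with the cutoff. It assumes duoT5 scores every ordered pair of candidate documents per query, as in the pairwise formulation of the Duo approach; the plugin's actual batching may differ, and the helper names below are hypothetical.

```python
from itertools import permutations


def count_pairwise_inferences(k: int) -> int:
    """Number of ordered document pairs (d_i, d_j), i != j, among k candidates."""
    return k * (k - 1)


def ordered_pairs(docnos):
    """Enumerate the ordered pairs a pairwise re-ranker would compare (assumption)."""
    return list(permutations(docnos, 2))


if __name__ == "__main__":
    # A pointwise re-ranker needs k inferences per query; a pairwise one needs k * (k - 1).
    for k in (5, 50, 1000):
        print(f"cutoff={k}: pointwise={k}, pairwise={count_pairwise_inferences(k)}")
```

With a cutoff of 5 the pairwise stage costs 20 comparisons per query, whereas 1000 candidates would need close to a million.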

MonoT5ReRanker and DuoT5ReRanker have the following options:

Examples

Check out the notebooks, which can also be run on Colab:

Implementation Details

We use a PyTerrier transformer to score documents using a T5 model.
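As an illustration of what scoring with a T5 model involves, the sketch below builds the kind of text-to-text input described for monoT5 in the literature, where the model is asked to generate "true" or "false" for a query-document pair. The exact template string is an assumption here and may not match what this plugin sends to the model; `mono_prompt` is a hypothetical helper, not part of the package.

```python
def mono_prompt(query: str, text: str) -> str:
    """Format one query-document pair as a single T5 input string (assumed template)."""
    return f"Query: {query} Document: {text} Relevant:"


if __name__ == "__main__":
    # One prompt is built per (query, document) row in the results dataframe.
    print(mono_prompt("chemical reactions", "A passage about reaction kinetics."))
```

The relevance score is then derived from how strongly the model prefers the "true" token over "false" for that input.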

Sequences longer than the model's maximum of 512 tokens are silently truncated. Consider splitting long texts into passages and aggregating the results (examples).
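The idea of splitting long texts into passages and aggregating can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it splits on whitespace rather than model tokens, takes the maximum passage score as the document score, and uses a hypothetical `score_fn` in place of the real re-ranker. It is not the plugin's implementation.

```python
from typing import Callable, List


def sliding_passages(text: str, length: int = 150, stride: int = 75) -> List[str]:
    """Split text into overlapping windows of `length` whitespace tokens."""
    tokens = text.split()
    passages = []
    for start in range(0, len(tokens), stride):
        passages.append(" ".join(tokens[start:start + length]))
        if start + length >= len(tokens):
            break  # the last window already reaches the end of the text
    return passages


def max_passage_score(text: str, score_fn: Callable[[str], float],
                      length: int = 150, stride: int = 75) -> float:
    """Score each passage and keep the best one as the document score."""
    scores = (score_fn(p) for p in sliding_passages(text, length, stride))
    return max(scores, default=float("-inf"))


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(10))
    # Toy scorer (hypothetical): counts occurrences of a target token.
    print(max_passage_score(doc, lambda p: p.split().count("w9"), length=4, stride=2))
```

Because each window stays well under the 512-token limit, no part of the document is silently dropped before scoring.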

References

Credits