pyterrier-deepct

Advanced PyTerrier bindings for DeepCT.

Installation

pip install --upgrade git+https://github.com/terrierteam/pyterrier_deepct.git

Usage

import pyterrier as pt
from pyterrier_deepct import DeepCT, Toks2Text
deepct = DeepCT() # by default, loads macavaney/deepct, a version of the model weights converted to Hugging Face format
indexer = deepct >> Toks2Text() >> pt.IterDictIndexer("./deepct_index_path")
# `dataset` is assumed to be a PyTerrier dataset loaded beforehand
indexer.index(dataset.get_corpus_iter())
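
The pipeline above passes DeepCT's per-term weights through Toks2Text before indexing. As a rough illustration of that idea, and not the library's actual implementation, the sketch below assumes the weights arrive as a dict mapping each token to an integer count in a hypothetical `toks` field. It repeats each token that many times so that a standard term-frequency indexer sees the weight as a term frequency. The names `toks_to_text` and `expand_docs` are made up for this example.

```python
from typing import Dict, Iterable, Iterator


def toks_to_text(toks: Dict[str, int]) -> str:
    """Repeat each token `count` times; tokens with a count of 0 or less are dropped."""
    parts = []
    for token, count in toks.items():
        if count > 0:
            parts.extend([token] * count)
    return " ".join(parts)


def expand_docs(docs: Iterable[dict]) -> Iterator[dict]:
    """Yield copies of each document with a `text` field built from its `toks` field."""
    for doc in docs:
        out = dict(doc)  # copy so the input document is not mutated
        out["text"] = toks_to_text(doc["toks"])
        yield out


# Hypothetical document with already-scaled integer term weights.
docs = [{"docno": "d1", "toks": {"neural": 3, "ranking": 1, "the": 0}}]
expanded = list(expand_docs(docs))
print(expanded[0]["text"])  # neural neural neural ranking
```

Because the weights end up as plain repeated text, the resulting documents can be fed to any ordinary text indexer without changes to the index format.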

Options:

Usage (legacy API)

The old API uses the deepct repository, which requires TensorFlow 1.x. That version is not available in every environment (e.g., Colab).

Given an existing DeepCT checkpoint and the original Google BERT files, a DeepCT transformer can be created as follows:

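import pyterrier as pt
from pyterrier_deepct import DeepCTTransformer
# arguments: the BERT config file, then the DeepCT checkpoint
deepct = DeepCTTransformer("bert-base-uncased/bert_config.json", "marco/model.ckpt-65816")
indexer = deepct >> pt.IterDictIndexer("./deepct_index_path")
# `dataset` is assumed to be a PyTerrier dataset loaded beforehand
indexer.index(dataset.get_corpus_iter())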

Demos

References

Credits