OSIRRC Docker Image for OldDog

Chris Kamphuis and Arjen de Vries

This is the Docker image for the OldDog project (based on work by Mühleisen et al.), conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. This image is available on Docker Hub and has been tested with the jig at commit e51fe7a (06/25/2019).

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare \
  --repo osirrc2019/olddog \
  --tag v1.0.0 \
  --collections robust04=/path/to/disk45=trectext

The following jig command can be used to perform a retrieval run on the indexed collection, using the robust04 topics and qrels:

python run.py search \
  --repo osirrc2019/olddog \
  --tag v1.0.0 \
  --output $(pwd)/out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04 \
  --opts out_file_name="run.bm25.robust04"

The --opts argument can be extended by adding mode='disjunctive' for disjunctive query processing.
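The difference between the two query processing modes can be illustrated with a minimal sketch. This is not OldDog's implementation (which runs inside the database); the toy documents, the function names, and the in-memory postings are all assumptions made for illustration. In conjunctive mode only documents containing every query term are candidates for scoring, whereas in disjunctive mode a document containing any query term is a candidate.

```python
from collections import defaultdict

# Toy collection; the document ids and texts are made up for illustration.
DOCS = {
    "d1": "old dog new tricks",
    "d2": "old database systems",
    "d3": "new tricks for retrieval",
}


def build_postings(docs):
    """Map each term to the set of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)
    return postings


def candidates(query, postings, mode="conjunctive"):
    """Return the ids of the documents that are eligible for scoring."""
    term_sets = [postings.get(term, set()) for term in query.split()]
    if not term_sets:
        return set()
    if mode == "conjunctive":
        # A document must contain all query terms.
        return set.intersection(*term_sets)
    if mode == "disjunctive":
        # A document must contain at least one query term.
        return set.union(*term_sets)
    raise ValueError(f"unknown mode: {mode}")


POSTINGS = build_postings(DOCS)
print(sorted(candidates("old tricks", POSTINGS, "conjunctive")))  # ['d1']
print(sorted(candidates("old tricks", POSTINGS, "disjunctive")))  # ['d1', 'd2', 'd3']
```

Disjunctive processing scores a superset of the conjunctive candidates, so it typically retrieves more documents per topic.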

The following jig command can be used to start an interactive session:

python run.py interact \
  --repo osirrc2019/olddog \
  --tag v1.0.0

Retrieval Methods

The OldDog image supports the BM25 retrieval model, with either conjunctive or disjunctive query processing.

Expected Results

The following results should be reproducible using the jig search command.

robust04

| MAP | conjunctive BM25 | disjunctive BM25 |
| --- | --- | --- |
| TREC 2004 Robust Track Topics | 0.1736 | 0.2434 |

| P@30 | conjunctive BM25 | disjunctive BM25 |
| --- | --- | --- |
| TREC 2004 Robust Track Topics | 0.2526 | 0.2985 |

core18

| MAP | conjunctive BM25 | disjunctive BM25 |
| --- | --- | --- |
| TREC 2018 Common Core Track Topics | 0.1802 | 0.2381 |

| P@30 | conjunctive BM25 | disjunctive BM25 |
| --- | --- | --- |
| TREC 2018 Common Core Track Topics | 0.3167 | 0.3313 |
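For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how P@30 and average precision are defined for a single topic. MAP is the mean of the average precision over all topics. The function names and the toy ranking are illustrative assumptions; the reported numbers come from trec_eval, not from this code.

```python
def precision_at_k(ranked, relevant, k=30):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy example: made-up document ids, three relevant documents, one never retrieved.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_at_k(ranked, relevant, k=2))
print(average_precision(ranked, relevant))
```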

Implementation

The following is a quick breakdown of what happens in each of the scripts in the repo.

Dockerfile

The Dockerfile installs dependencies (python3, MonetDB, etc.), copies scripts to the root dir, and sets the working dir to /work.

init

The init script is a bash script (via the #!/bin/bash shebang) that invokes wget to download an Anserini JAR from Maven Central. It then downloads the OldDog project from GitHub and builds it using Maven.

index

The index script is a Python script (via the #!/usr/bin/python3 shebang) that reads a JSON string (see here) containing at least one collection to index (including its name, path, and format). Each collection is indexed using Anserini (Yang et al., 2017), and the index is placed in a directory with the same name as the collection in the working dir (e.g., /work/robust04). After the Lucene index has been created, the OldDog software uses it to create CSV files that can be loaded into the MonetDB (Boncz, 2002) column store. A MonetDB database is created and the CSV files are loaded into it. The Lucene index is then removed so that committing the image takes less time. Multiple collections can be indexed using one container. At this point, jig takes a snapshot and the indexed collections are persisted for the search hook.
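The first step of the index hook, reading the JSON string and deciding where each index goes, can be sketched as follows. The exact JSON layout is defined by the jig and is not reproduced here; the sketch assumes a top-level `collections` list whose entries carry `name`, `path`, and `format` keys. The function name and the input path in the example are hypothetical.

```python
import json
import posixpath

WORK_DIR = "/work"  # working dir set by the Dockerfile


def parse_collections(json_string):
    """Return (name, path, format, index_dir) for each collection to index.

    Assumes a {"collections": [{"name": ..., "path": ..., "format": ...}]}
    layout; this is an assumption, not the documented jig schema.
    """
    args = json.loads(json_string)
    collections = args.get("collections", [])
    if not collections:
        raise ValueError("expected at least one collection to index")
    plan = []
    for collection in collections:
        # The index directory has the same name as the collection.
        index_dir = posixpath.join(WORK_DIR, collection["name"])
        plan.append(
            (collection["name"], collection["path"], collection["format"], index_dir)
        )
    return plan


# Hypothetical input; the path inside the container is made up.
example = json.dumps(
    {
        "collections": [
            {"name": "robust04", "path": "/input/robust04", "format": "trectext"}
        ]
    }
)
for name, path, fmt, index_dir in parse_collections(example):
    print(f"index {name} ({fmt}) from {path} into {index_dir}")
```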

search

The search script reads a JSON string (see here) containing the collection name (to map back to the index directory from the index hook) and topic path, among other options. The retrieval run is performed and output is placed in /output for the jig to evaluate using trec_eval.
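Because trec_eval evaluates the output, the run file has to follow the standard TREC run format: topic id, the literal `Q0`, document id, rank, score, and a run tag, separated by whitespace. The sketch below formats one topic's ranking in that layout. The function name, the run tag, and the document ids are illustrative assumptions, not taken from the OldDog scripts.

```python
def format_run(topic_id, ranked, tag="olddog-bm25"):
    """Format (doc_id, score) pairs, best first, as TREC run lines."""
    lines = []
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {tag}")
    return lines


# Made-up ranking for one topic, already sorted by descending score.
ranking = [("DOC-A", 7.5), ("DOC-B", 3.25)]
for line in format_run("301", ranking):
    print(line)
```

In the container, lines like these would be written to a file under /output, named according to the out_file_name option.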

interact

The interact script starts the MonetDB daemon. Once the daemon is running, mclient can be opened to issue SQL queries. The following command can be used to open mclient:

docker exec -it $(docker ps -aql) mclient -d robust04

Notes

References

Reviews