OSIRRC Docker Image for Elastirini

Ze Zhong Wu, Ryan Clancy, and Jimmy Lin

This is the Docker image for the Anserini toolkit (v0.5.1) with Elasticsearch indexing. It conforms to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019.

The search results are the same as those of anserini-docker, so we report those results here.

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare --repo osirrc2019/elastirini --tag <tag> --collections robust04=/path/to/disk45=trectext

The following jig command starts the image in interactive mode, so that the indexed robust04 collection can be queried through Elasticsearch:

python run.py interact --repo osirrc2019/elastirini --tag <tag>

Here, <tag> is a valid tag.

After entering the above command, run docker port [container id] to see the port mappings. These show which host ports to use to access Elasticsearch and Kibana.
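As a minimal sketch of reading those port mappings programmatically, the helper below parses the text printed by docker port and returns the host port for each container port. The function names parse_docker_port and host_ports are hypothetical and not part of the jig or this image. The parser assumes the usual "9200/tcp -> 0.0.0.0:32768" line format.

```python
import subprocess


def parse_docker_port(output):
    """Map container ports (e.g. 9200) to host ports from `docker port` output."""
    mappings = {}
    for line in output.splitlines():
        line = line.strip()
        if "->" not in line:
            continue  # skip blank or unexpected lines
        container_side, host_side = (part.strip() for part in line.split("->", 1))
        container_port = int(container_side.split("/")[0])  # "9200/tcp" -> 9200
        host_port = int(host_side.rsplit(":", 1)[1])        # "0.0.0.0:32768" -> 32768
        # Keep the first mapping seen; some Docker versions also print an IPv6 line.
        mappings.setdefault(container_port, host_port)
    return mappings


def host_ports(container_id):
    """Run `docker port <container id>` and parse its output (requires Docker)."""
    result = subprocess.run(
        ["docker", "port", container_id],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_port(result.stdout)
```

With a running container, host_ports("[container id]") would return a dictionary keyed by 9200 (Elasticsearch) and 5601 (Kibana), the two ports the Dockerfile exposes.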

Retrieval Methods

The Anserini image supports the following retrieval methods:

Expected Results

The following numbers should be reproducible using the scripts provided in the bin directory.

robust04

| MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
| --- | --- | --- | --- | --- | --- | --- |
| TREC 2004 Robust Track Topics | 0.2531 | 0.2903 | 0.2895 | 0.2467 | 0.2747 | 0.2774 |

core17

| MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
| --- | --- | --- | --- | --- | --- | --- |
| TREC 2017 Common Core Track Topics | 0.2087 | 0.2823 | 0.2787 | 0.2032 | 0.2606 | 0.2613 |

core18

| MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
| --- | --- | --- | --- | --- | --- | --- |
| TREC 2018 Common Core Track Topics | 0.2495 | 0.3136 | 0.2920 | 0.2526 | 0.3073 | 0.2966 |

gov2

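| MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
| --- | --- | --- | --- | --- | --- | --- |
| TREC 2004 Terabyte Track: Topics 701-750 | 0.2689 | 0.2844 | 0.2665 | 0.2681 | 0.2708 | 0.2666 |
| TREC 2005 Terabyte Track: Topics 751-800 | 0.3390 | 0.3820 | 0.3664 | 0.3303 | 0.3559 | 0.3646 |
| TREC 2006 Terabyte Track: Topics 801-850 | 0.3080 | 0.3377 | 0.3069 | 0.2996 | 0.3154 | 0.3084 |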

cw09b

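| MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
| --- | --- | --- | --- | --- | --- | --- |
| TREC 2010 Web Track: Topics 51-100 | 0.1126 | 0.0933 | 0.0928 | 0.1060 | 0.1019 | 0.1086 |
| TREC 2011 Web Track: Topics 101-150 | 0.1094 | 0.1081 | 0.0974 | 0.0958 | 0.0837 | 0.0879 |
| TREC 2012 Web Track: Topics 151-200 | 0.1106 | 0.1107 | 0.1315 | 0.1069 | 0.1059 | 0.1212 |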

cw12b

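| MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
| --- | --- | --- | --- | --- | --- | --- |
| TREC 2013 Web Track: Topics 201-250 | 0.0468 | 0.0412 | 0.0435 | 0.0397 | 0.0322 | 0.0359 |
| TREC 2014 Web Track: Topics 251-300 | 0.0224 | 0.0210 | 0.0180 | 0.0235 | 0.0203 | 0.0186 |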
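One way to check a reproduced score against these tables is to compare it at the four decimal places reported here. The sketch below does this for the robust04 row only. The dictionary keys read the "+RM3" and "+Ax" columns as variants of the base model to their left, which is an assumption about the table layout, and the name matches_expected is hypothetical.

```python
# Expected MAP for robust04, copied from the robust04 table in this section.
# Keys such as "BM25+RM3" assume each "+RM3"/"+Ax" column extends the base
# model (BM25 or QL) that precedes it.
EXPECTED_ROBUST04_MAP = {
    "BM25": 0.2531,
    "BM25+RM3": 0.2903,
    "BM25+Ax": 0.2895,
    "QL": 0.2467,
    "QL+RM3": 0.2747,
    "QL+Ax": 0.2774,
}


def matches_expected(method, observed_map):
    """Return True if the observed MAP equals the expected value at 4 decimals."""
    return round(observed_map, 4) == EXPECTED_ROBUST04_MAP[method]
```

For example, matches_expected("BM25", 0.2531) holds for a faithful reproduction, while any score that differs in the fourth decimal place is flagged.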

Implementation

The following is a quick breakdown of what happens in each of the scripts in this repo.

Dockerfile

The Dockerfile installs dependencies, sets the Java home path, copies scripts to the root dir, exposes ports 9200 and 5601, and sets the working dir to /work.

init

The init script clones Anserini, installs the ELK stack, and configures it.

index

The index script indexes the collection with Elasticsearch.
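After indexing, a quick sanity check is to ask Elasticsearch how many documents an index holds through its _count endpoint. The following is a sketch under stated assumptions: the index name passed in (for example "robust04") is hypothetical, since the name created by the index script is not stated here, and the host port is whatever docker port reports for container port 9200.

```python
import json
import urllib.request


def count_url(host_port, index, host="localhost"):
    """Build the URL of the Elasticsearch _count endpoint for an index."""
    return f"http://{host}:{host_port}/{index}/_count"


def parse_count(response_body):
    """Extract the document count from a _count JSON response."""
    return json.loads(response_body)["count"]


def fetch_count(host_port, index, host="localhost"):
    """Query a running Elasticsearch instance (needs the container to be up)."""
    with urllib.request.urlopen(count_url(host_port, index, host)) as response:
        return parse_count(response.read().decode("utf-8"))
```

A count of zero, or a connection error, suggests that indexing did not complete or that the wrong host port is being used.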

Reviews

Documentation reviewed at commit b44ccd9 (2019-06-24) by Ryan Clancy.