GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification

A Snakemake workflow to (re)produce figures and data in the initial GAMBIT publication:

Lumpe J, Gumbleton L, Gorzalski A, Libuit K, Varghese V, et al. (2023) GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification. PLOS ONE 18(2): e0277575. https://doi.org/10.1371/journal.pone.0277575

Source code for GAMBIT itself is located here.

Please feel free to contact me at jared@jaredlumpe.com with any questions you have.

Instructions

After installing and activating the conda environment (see the Setup section below), simply run:

snakemake [TARGETS...]

from the project's root directory. TARGETS are one or more rule names or output files. By default the main rule is run, which creates figures 1-6. See the Targets section for a list of options.
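As a rough illustration of the TARGETS argument, the sketch below mixes one rule name and one output file, both taken from the Targets section. It uses Snakemake's -n (dry run) flag so nothing is actually built. The `targets` variable and the guard around the call are illustrative additions, not part of the workflow. Depending on the Snakemake version pinned in env.yaml, a real run may also need --cores N.

```shell
# One rule name plus one output file (both listed in the Targets section).
targets="fig1 results/figures/figure-3.png"

if command -v snakemake >/dev/null 2>&1; then
    # -n prints the jobs that would run without executing them.
    # $targets is intentionally unquoted so it splits into two arguments.
    snakemake -n $targets
else
    echo "snakemake not found; activate the conda environment first" >&2
fi
```

Dropping -n from the command performs the actual build.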

Directory structure

Setup

This workflow has been built and tested on Linux only. It may work on macOS (untested), but I believe there are issues that prevent it from running on Windows.

Required software

All software dependencies are installed using the conda package manager. If you do not already have it installed, I recommend using the Miniconda installer available here. Make sure the conda command is available in your shell.

Conda environment

Install the conda environment into the env/ subdirectory with:

conda env create -f env.yaml -p env

Before running the workflow you must activate the environment by running conda activate ./env from the project's root directory. This must be done with each new shell session.
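Since activation has to be repeated in every new shell, a quick check can save a confusing failure later. The following is a minimal sketch, assuming it is run from the project's root directory and that conda sets the CONDA_PREFIX variable to the plain (non-symlinked) path of the active environment. The `expected` and `env_status` names are made up for this example.

```shell
# CONDA_PREFIX is set by `conda activate`; for this workflow it should
# point at the env/ subdirectory of the project root.
expected="$(pwd)/env"

if [ "${CONDA_PREFIX:-}" = "$expected" ]; then
    env_status="active"
else
    env_status="inactive"
fi

echo "workflow environment: $env_status"
```

If this reports "inactive", run conda activate ./env before invoking snakemake.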

Install GAMBIT

The preferred way to install GAMBIT is through the Bioconda channel:

conda install -c bioconda gambit=1.0

Make sure your Conda environment is activated first.

Configuration

Most editable config settings are in config/config.yaml.

Download source data

Large files in resources/ are not present in version control and need to be downloaded separately. You can do this all up front by running the fetch_src_data target, which may make things easier to debug if you run into any connection problems. Otherwise the individual data sets will be downloaded as needed when running the workflow.

Targets

This is a list of all "endpoint" rules and output files which you may want to run. It does not include rules which generate intermediate data.

Aggregate rules

| Rule | Description |
| --- | --- |
| all | main and supplemental. |
| main | Generate all primary figures (default). |
| supplemental | Generate all supplemental figures. Note - supplemental figure 1 is VERY slow. |
| fetch_src_data | Download all source data. Not necessary to invoke manually. |

Main results

| Rule | Output | Description |
| --- | --- | --- |
| fig1 | results/figures/figure-1.{png,csv} | Generate figure 1. |
| fig2 | results/figures/figure-2{a,b}.png | Generate figure 2. |
| fig3 | results/figures/figure-3.png | Generate figure 3. |
| fig4 | results/figures/figure-4{a,b}.png | Generate figure 4. |
| fig5 | results/figures/figure-5{a,b}.png | Generate figure 5. |
| fig6 | results/figures/figure-6.png | Generate figure 6. |

Supplemental results

| Rule | Output | Description |
| --- | --- | --- |
| sfig1 | results/figures/supplemental-figure-1.png | Generate supplemental figure 1. Note - VERY slow. |
| sfig2 | results/figures/supplemental-figure-2.png | Generate supplemental figure 2. |
| stable3 | results/tables/supplemental-table-3.png | Generate supplemental table 3. |
| stable4 | results/tables/supplemental-table-4.png | Generate supplemental table 4. |

Benchmarks

| Rule | Output | Description |
| --- | --- | --- |
| benchmark_query | results/benchmarks/gambit-query/ | Benchmark GAMBIT taxonomic classification from CLI. |

Source data

| Rule | Output | Description |
| --- | --- | --- |
| fetch_gambit_db | resources/gambit-db/ | Download GAMBIT reference database files. |
| | resources/genomes/set{1,2}/fasta/ | Download FASTA files for data set 1 or 2 from NCBI. Invoke by output directory. |
| | resources/genomes/set{3,4}/fasta/ | Download FASTA files for data set 3 or 4. Invoke by output directory. |
| fetch_genome_set_5 | resources/genomes/set5/fasta/ | Download FASTA files for data set 5. |
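
The genome downloads for sets 1 to 4 have no rule name, so they are requested by output path. Here is a minimal sketch for sets 1 and 2, assuming the set{1,2} notation is shorthand for two separate directories and that Snakemake accepts the path without the trailing slash. Both are assumptions, so adjust if the workflow expects otherwise. The `targets` variable is illustrative only.

```shell
# Build the list of output directories for data sets 1 and 2.
targets=""
for n in 1 2; do
    targets="$targets resources/genomes/set$n/fasta"
done
targets="${targets# }"   # drop the leading space

if command -v snakemake >/dev/null 2>&1; then
    # Dry run: show which download jobs would be scheduled.
    # $targets is intentionally unquoted so each path is its own argument.
    snakemake -n $targets
else
    echo "snakemake not found; activate the conda environment first" >&2
fi
```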

Development

You can enable "test mode" by adding --config test=1 to the command line options. This loads an alternate set of parameters which greatly reduces the amount of work to be done.
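
A sketch of a test-mode invocation follows. One detail worth noting: Snakemake's --config option accepts several KEY=VALUE pairs, so a target written directly after it may be parsed as a config entry. Placing targets before --config avoids that. The `main` target and the -n dry-run flag are example choices here, not requirements.

```shell
target="main"

if command -v snakemake >/dev/null 2>&1; then
    # Target first, then --config, so "main" is not read as a KEY=VALUE pair.
    snakemake -n "$target" --config test=1
else
    echo "snakemake not found; activate the conda environment first" >&2
fi
```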