
Genotyping Salmonella Typhi

This repository houses the GenoTyphi genotyping scheme for Salmonella Typhi.

It also describes how to call genotypes, AMR and plasmid markers from Typhi whole-genome sequence reads using Mykrobe ('Typhi Mykrobe') and provides links to alternative tools for calling genotypes from reads or assemblies.

GenoTyphi Scheme

The GenoTyphi genotyping scheme divides the Salmonella Typhi population into genotypes, each of which represents a monophyletic cluster defined by a unique single nucleotide variant (SNV) marker. There are 4 major lineages, which are further divided into >75 clades and subclades. The relationships between genotypes are conveyed in their names: e.g. genotypes 2.2 and 2.3 are sister clades in the phylogeny, and 2.2 has daughter subclades 2.2.1, 2.2.2 and so forth, as illustrated in the figure below.

<img src="figs/GenoTyphiTree.png" width="400">
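Because the hierarchy is encoded in the names, simple string operations are enough to walk it. The following is a minimal, hypothetical Python sketch (these helper functions are not part of this repository). It looks only at the names, assuming that every dot-separated prefix of a genotype is one of its ancestor clades as described above, and it does not consult the scheme specification, so it will return prefixes (such as 4.3 for 4.3.1.1.P1) whether or not they carry their own marker SNV.

```python
def ancestors(genotype):
    """Chain from the major lineage down to `genotype`,
    e.g. '2.2.1' -> ['2', '2.2', '2.2.1'] (based on the name only)."""
    parts = genotype.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def parent(genotype):
    """Immediate parent clade, or None for a major lineage such as '2'."""
    chain = ancestors(genotype)
    return chain[-2] if len(chain) > 1 else None


def are_sisters(a, b):
    """True if two distinct genotypes share the same immediate parent
    (e.g. 2.2 and 2.3). Major lineages have no parent here, so they never match."""
    return a != b and parent(a) is not None and parent(a) == parent(b)


def is_within(genotype, clade):
    """True if `genotype` is `clade` itself or one of its descendants."""
    return clade in ancestors(genotype)


print(ancestors("4.3.1.1.P1"))
print(are_sisters("2.2", "2.3"), is_within("2.2.1", "2.2"))
```

Comparing whole dot-separated parts, rather than using `str.startswith`, avoids treating a name such as 2.21 (a hypothetical example) as a subclade of 2.2.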

Scheme specification

The latest scheme specification, mapping marker SNVs to genotypes, is detailed in the file GenoTyphi_specification.csv in this repository. This file also includes the standard clade-level colour codes that we use for consistency across papers.

Scheme development

The initial development of the scheme is described in this paper, "An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid", Wong et al, 2016, Nature Communications.

Subsequent updates to the genotyping scheme, including new genotypes and mutations conferring resistance to fluoroquinolones and azithromycin, are summarised in "Five years of GenoTyphi: updates to the global Salmonella Typhi genotyping framework", Dyson & Holt, 2021, Journal of Infectious Diseases and this technical report.

The scheme is now managed by a working group of the Global Typhoid Genomics Consortium, which is actively working to expand the scheme based on new data and to establish rules for the inclusion and naming of new genotypes. To suggest new genotypes, please post an Issue in this repository; to join the working group, see the consortium website.

Citation

Whichever tool you use to access the GenoTyphi scheme, please cite the 2021 GenoTyphi paper.

If you use the scripts in this repository, please also cite the preprint Ingle et al, bioRxiv 2024, and this repository: DOI

Typhi Mykrobe

Overview

To call genotypes from reads, we recommend using 'Typhi Mykrobe'.

The Mykrobe software provides a platform for k-mer-based genotyping directly from fastq files. It was originally developed for genotyping TB and Staph. aureus genomes, but we have developed a Mykrobe panel of genotyping probes for Typhi that provides simultaneous typing of genotypes, AMR markers and plasmid markers.

Drugs/classes for which resistance is typed are: ampicillin, azithromycin, carbapenems, ceftriaxone, ciprofloxacin, chloramphenicol, sulfonamides, trimethoprim, trimethoprim-sulfamethoxazole, tetracycline. The output is presented as an antibiogram, indicating resistant (R), intermediate (I), or susceptible (S) predictions for each drug in each genome.

A full list of AMR and plasmid typing targets is in the file typhimykrobe/AMR_genes_mutations_plasmids.csv.

Below you will find instructions for installing and running Mykrobe with the Typhi panel (input = fastq, single or paired per genome; output = JSON, one per genome), as well as a Python script for tabulating the results from multiple genomes into a simple tab-delimited table (input = JSON files, one per genome; output = a single TSV).

<img src="figs/TyphiMykrobe.png" width="700">

Quick start

Install Mykrobe:

From bioconda:

conda install -c bioconda mykrobe

From source (run from within the downloaded mykrobe directory):

pip3 install . && mykrobe panels update_metadata && mykrobe panels update_species all

Run Mykrobe on fastq file/s for a given genome:

mykrobe predict --sample aSample \
  --species typhi \
  --format json \
  --out aSample.json \
  --seq aSample_1.fastq.gz aSample_2.fastq.gz

Output is one JSON file per genome.

If your input fastq files are Oxford Nanopore Technologies (ONT) reads, add the --ont flag to the command.
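For more than a handful of genomes, you may want to generate the command above for every readset in a directory. The Python sketch below is hypothetical (`build_commands` and `run_all` are not part of this repository or of Mykrobe) and assumes reads are named `<sample>_1.fastq.gz`, with an optional `<sample>_2.fastq.gz` mate; adapt the glob if your files are named differently (e.g. single-file ONT reads). It uses only the `mykrobe predict` options shown above.

```python
import subprocess
from pathlib import Path


def build_commands(read_dir, ont=False):
    """Return one `mykrobe predict` argument list per sample in read_dir.

    Assumes reads are named <sample>_1.fastq.gz (and optionally
    <sample>_2.fastq.gz); a sample without a _2 file is run single-ended.
    """
    commands = []
    for r1 in sorted(Path(read_dir).glob("*_1.fastq.gz")):
        sample = r1.name[: -len("_1.fastq.gz")]
        r2 = r1.with_name(f"{sample}_2.fastq.gz")
        reads = [str(r1), str(r2)] if r2.exists() else [str(r1)]
        cmd = ["mykrobe", "predict", "--sample", sample, "--species", "typhi",
               "--format", "json", "--out", f"{sample}.json"]
        if ont:
            cmd.append("--ont")  # flag for Oxford Nanopore reads
        cmd += ["--seq", *reads]  # --seq and its read files go last, as above
        commands.append(cmd)
    return commands


def run_all(read_dir, ont=False):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in build_commands(read_dir, ont=ont):
        subprocess.run(cmd, check=True)
```

For example, `run_all("reads")` would write `<sample>.json` files to the current directory, ready to be passed to the parser with `--jsons *.json`.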

Tabulate Mykrobe results for one or more genomes:

(requires Python 3 and the pandas library)

(the Python script parse_typhi_mykrobe.py is in the typhimykrobe/ directory of this repository)

python parse_typhi_mykrobe.py --jsons *.json --prefix mykrobe_out

Output is a single tab-delimited table; its format is described below.
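Once you have the table, a quick summary across genomes can be produced with the standard library alone. This is a minimal, hypothetical sketch (`summarise_genotypes` is not part of this repository); the column names `genome`, `genotype` and `confidence` are taken from the expected-output header for the test data in this README. The exact output file name depends on the `--prefix` you give the parser and is not specified here, so the path is passed explicitly.

```python
import csv
from collections import Counter


def summarise_genotypes(tsv_path):
    """Count genotypes in a parse_typhi_mykrobe.py output table and list
    genomes whose genotype confidence is anything other than 'strong'."""
    counts = Counter()
    to_review = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            genotype = (row.get("genotype") or "").strip()
            if genotype:  # skip rows with no genotype call
                counts[genotype] += 1
            if row.get("confidence") != "strong":
                to_review.append(row["genome"])
    return counts, to_review
```

`counts.most_common()` then lists genotypes from most to least frequent, and `to_review` gives the genomes whose genotype calls may deserve a closer look.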

Detailed instructions

Installing mykrobe

First, install Mykrobe (v0.10.0+) following the instructions on the Mykrobe GitHub.

Once Mykrobe is installed, you can run the following two commands to ensure you have the most up-to-date panels for genotyping, including the Typhi panel (latest version, v20240407):

mykrobe panels update_metadata
mykrobe panels update_species all

You can check what version of the scheme is currently loaded in your Mykrobe installation via:

mykrobe panels describe

Running Mykrobe

Inputs are fastq files.

Mykrobe is run on each sample individually using the command below; replace aSample with the name of your isolate. This example uses paired-end Illumina data.

mykrobe predict --sample aSample --species typhi --format json --out aSample.json --seq aSample_1.fastq.gz aSample_2.fastq.gz

To run on ONT data instead, add the --ont flag to your command.

Further details on options can be found on the Mykrobe wiki: https://github.com/Mykrobe-tools/mykrobe/wiki

Parse Mykrobe output

Code

typhimykrobe/parse_typhi_mykrobe.py

We have provided a custom Python 3 script, parse_typhi_mykrobe.py, that takes a group of JSON files output by Mykrobe and summarises them into a single tab-delimited table.

The parser only reports calls for genomes that Mykrobe identifies as Typhi. Currently, a sample must have >=85% identity to the Typhi MLST locus sequences for its calls to be reported by the parser. This threshold may be too strict for JSON files generated from ONT data (although all the Mykrobe calls will still be present in the JSON file).

Note that, due to the nested hierarchical nature of the GenoTyphi scheme, we needed to create fake levels within the scheme to facilitate correct calling by Mykrobe. These fake levels are not reported in the output generated by the parser, but they are present in the raw JSON files output by Mykrobe. They can be identified in the JSON output because they are prefixed with the word "lineage", and they will always have a call of 0 from Mykrobe.
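If you post-process the raw JSON yourself, you will want to discard these placeholder levels. A minimal sketch, assuming you have already pulled the per-level calls out of the JSON into a flat dictionary mapping level name to call; that extraction step depends on the JSON layout, which is not described here, and both `drop_fake_levels` and the example level names are hypothetical. It keeps every level not prefixed with "lineage" and, as a sanity check, raises an error if a placeholder level has a non-zero call, since the text above says these are always 0.

```python
def drop_fake_levels(calls):
    """Remove placeholder levels whose names start with 'lineage'.

    `calls` is a hypothetical flat dict of level name -> call that you
    would first extract from a Mykrobe JSON file yourself.
    """
    fake = {name: call for name, call in calls.items() if name.startswith("lineage")}
    # Placeholder levels are expected to always be called 0
    unexpected = sorted(name for name, call in fake.items() if call != 0)
    if unexpected:
        raise ValueError(f"placeholder levels with non-zero calls: {unexpected}")
    return {name: call for name, call in calls.items() if name not in fake}


print(drop_fake_levels({"4.3.1": 1, "lineageX": 0}))
```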

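Dependencies

Python 3 and the pandas library.

Input

JSON files output by Mykrobe (one per genome).

Output

A single tab-delimited table with one row per genome.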

Example command

python parse_typhi_mykrobe.py --jsons *.json --prefix mykrobe_out

Explanation of columns in the output:

Sequences and details of probes are available here.

Example with test data

# download paired end reads from ENA
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR209/005/ERR2093255/ERR2093255_1.fastq.gz  
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR209/005/ERR2093255/ERR2093255_2.fastq.gz  

# Run mykrobe
mykrobe predict --sample ERR2093255 --species typhi --format json --out ERR2093255.json --seq ERR2093255_1.fastq.gz ERR2093255_2.fastq.gz  
python parse_typhi_mykrobe.py --jsons ERR2093255.json --prefix mykrobe_out

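Expected output

The output has one row per genome; below, the single row for ERR2093255 is shown transposed (one output column per line).

| column | value |
|---|---|
| genome | ERR2093255 |
| species | typhi |
| spp_percent | 91.484 |
| genotype | 4.3.1.1.P1 |
| confidence | strong |
| lowest support for genotype marker | - |
| poorly supported markers | - |
| max support for additional markers | - |
| additional markers | - |
| node support | 1 (1; 0/89); 2 (1; 0/99); 3 (1; 0/102); 4 (1; 76/0); 4.3.1 (1; 74/1); 4.3.1.1 (1; 77/0); 4.3.1.1.P1 (1; 80/0) |
| ampicillin | R: blaTEM-1D |
| azithromycin | S |
| ceftriaxone | R: blaCTX-M-15 |
| ciprofloxacin | R: gyrA_S83F;qnrS1 |
| chloramphenicol | R: catA1 |
| sulfamethoxazole | R: dfrA7;sul1;sul2 |
| sulfonamides | R: sul1;sul2 |
| trimethoprim | R: dfrA7 |
| tetracycline | S |
| IncFIAHI1 | 0 |
| IncHI1A | 0 |
| IncHI1BR27 | 0 |
| IncHI1_ST6 | 0 |
| IncY | 1 |
| IncX3 | 0 |
| IncHI2A | 0 |
| IncI1 | 0 |
| IncL_M | 0 |
| IncFIB_pHCM2 | 0 |
| IncFIB_K | 0 |
| IncN | 0 |
| z66 | 0 |
| num QRDR | 1 |
| parC_S80I | 0 |
| parC_S80R | 0 |
| parC_E84G | 0 |
| parC_E84K | 0 |
| gyrA_S83F | 1 |
| gyrA_S83Y | 0 |
| gyrA_D87G | 0 |
| gyrA_D87N | 0 |
| gyrA_D87V | 0 |
| gyrA_D87Y | 0 |
| gyrB_S464F | 0 |
| gyrB_S464Y | 0 |
| acrB_R717L | 0 |
| acrB_R717Q | 0 |
| mphA | 0 |
| ermB | 0 |
| ereA | 0 |
| blaTEM-1D | 1 |
| blaCTX-M-15 | 1 |
| AmpC1 | 0 |
| blaOXA-7 | 0 |
| blaOXA-134 | 0 |
| blaSHV-12 | 0 |
| qnrS1 | 1 |
| qnrB1 | 0 |
| qnrD1 | 0 |
| catA1 | 1 |
| cmlA1 | 0 |
| sul1 | 1 |
| sul2 | 1 |
| dfrA1 | 0 |
| dfrA5 | 0 |
| dfrA7 | 1 |
| dfrA14 | 0 |
| dfrA15 | 0 |
| dfrA17 | 0 |
| dfrA18 | 0 |
| tetA | 0 |
| tetB | 0 |
| tetC | 0 |
| tetD | 0 |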
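In the expected output for ERR2093255, each drug column holds a call optionally followed by the determinants behind it, e.g. `R: gyrA_S83F;qnrS1` or simply `S`. The sketch below is hypothetical (`parse_drug_call`, `antibiogram` and `DRUG_COLUMNS` are not part of this repository); it splits those strings into a call and a list of markers, taking the drug column names from the expected-output header and assuming this `call: marker;marker` format holds for every row.

```python
import csv

# Drug columns as named in the expected-output header for ERR2093255
DRUG_COLUMNS = ("ampicillin", "azithromycin", "ceftriaxone", "ciprofloxacin",
                "chloramphenicol", "sulfamethoxazole", "sulfonamides",
                "trimethoprim", "tetracycline")


def parse_drug_call(cell):
    """Split e.g. 'R: gyrA_S83F;qnrS1' into ('R', ['gyrA_S83F', 'qnrS1'])."""
    call, _, determinants = cell.partition(":")
    markers = [m.strip() for m in determinants.split(";") if m.strip()]
    return call.strip(), markers


def antibiogram(tsv_path):
    """Map each genome to {drug: (call, markers)} for the drug columns present."""
    results = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            results[row["genome"]] = {
                drug: parse_drug_call(row[drug] or "")
                for drug in DRUG_COLUMNS if drug in row
            }
    return results


print(parse_drug_call("R: gyrA_S83F;qnrS1"))
print(parse_drug_call("S"))
```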

Other typing tools

Other tools that can be used to assign GenoTyphi lineages to Typhi genomes