ShigaPass

ShigaPass is a new in silico tool that predicts Shigella serotypes and differentiates between Shigella, enteroinvasive E. coli (EIEC), and non-Shigella/EIEC using assembled whole genomes.

Dependencies

ShigaPass is a command-line tool written in Bash (version 4.4.20) and requires BLAST+ version 2.12.0 to run.
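
Before running the tool, it can help to confirm that the required programs are on the PATH. The following is a minimal sketch and not part of ShigaPass; the helper name `check_tool` is hypothetical. It only reports whether each program is found and prints the installed versions so they can be compared with those listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ShigaPass): report whether a program is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# blastn and makeblastdb are both shipped with BLAST+.
for tool in bash blastn makeblastdb; do
    check_tool "$tool" || true
done

# Print versions to compare with those listed above.
echo "Bash version: ${BASH_VERSION}"
if command -v blastn >/dev/null 2>&1; then
    blastn -version
fi
```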

Installation

1. Clone this repository with the following command line:

git clone https://github.com/imanyass/ShigaPass.git

2. Make the file ShigaPass.sh executable:

chmod +x ShigaPass.sh

3. Run ShigaPass using the following command-line syntax:

./ShigaPass.sh [options]

Usage

Run ShigaPass without any options to display the following help text:

###### This tool is used to predict Shigella serotypes  #####
        Usage : ShigaPass.sh [options]
   
        options :
        -l	List of input file(s) (FASTA) with their path(s) (mandatory)
        -o	Output directory (mandatory)
        -p	Path to databases directory (mandatory)
        -t	Number of threads (optional, default: 2)
        -u	Call the makeblastdb utility for databases initialisation (optional, but required when running the script for the first time)
        -k	Do not remove subdirectories (optional)
        -v	Display the version and exit
        -h	Display this help and exit
        Example: ShigaPass.sh -l list_of_fasta.txt -o ShigaPass_Results -p ShigaPass/ShigaPass_DataBases -t 4 -u -k
        Please note that the -u option should be used when running the script for the first time and after databases updates

Example

Running ShigaPass

Create a list file containing the paths to the FASTA files, then run ShigaPass:

ShigaPass.sh -l ShigaPass_test.txt -o ShigaPass_Results -p ShigaPass_DataBases -u -k
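
The list file can be produced in several ways. As a minimal sketch, assuming the assemblies sit in a hypothetical directory `assemblies/` with a `.fasta` extension and that the list holds one path per line (the help text does not state the exact layout), the file used above could be generated as follows:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the sorted paths of all *.fasta files in a directory,
# one per line (assumed list layout), to a list file.
make_fasta_list() {
    local fasta_dir="$1" list_file="$2"
    find "$fasta_dir" -maxdepth 1 -type f -name '*.fasta' | sort > "$list_file"
}

# Assumed input directory; adjust to where your assemblies are stored.
fasta_dir="assemblies"
if [ -d "$fasta_dir" ]; then
    make_fasta_list "$fasta_dir" ShigaPass_test.txt
    echo "Wrote $(wc -l < ShigaPass_test.txt) path(s) to ShigaPass_test.txt"
fi
```

The `-maxdepth 1` restriction keeps the listing to a single directory; drop it if the assemblies are spread over subdirectories.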

Here is an example of a ShigaPass summary file:

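| Name | rfb | rfb_hits,(%) | MLST | fliC | CRISPR | ipaH | Predicted_Serotype | Predicted_FlexSerotype | Comments |
|---|---|---|---|---|---|---|---|---|---|
| ERR5888634 | C2 | 79,(48.2%) | ST145 | ShH57(ShH3cplx) | A-var2 | ipaH+ | SB2 | | |
| ERR5952732 | B1-5 | 139,(93.3%) | ST245 | ShH2(ShH2cplx) | A-var3,x,16 | ipaH+ | SF1-5 | 1b | |
| ERR5976293 | D | 202,(70.6%) | ST152 | ShH25(ShH1cplx) | A-var0,27 | ipaH+ | SS | | |
| ERR5982186 | A2 | 100,(61.7%) | ST147 | none | A-var1,12,3,5,11-var1 | ipaH+ | SD2 | | |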

"none" means that no allele/profile was detected (in the ERR5982186 example, no fliC allele was detected).

SB: S. boydii; SD: S. dysenteriae; SF: S. flexneri; SS: S. sonnei
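
These two-letter prefixes can be expanded programmatically when post-processing results. The sketch below is illustrative only: the function name `species_from_serotype` is hypothetical, and it covers just the four prefixes listed above, returning `unknown` for anything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the prefix of a Predicted_Serotype value to a species name.
species_from_serotype() {
    case "$1" in
        SB*) echo "S. boydii" ;;
        SD*) echo "S. dysenteriae" ;;
        SF*) echo "S. flexneri" ;;
        SS*) echo "S. sonnei" ;;
        *)   echo "unknown" ;;
    esac
}

# Serotype values taken from the example summary above.
for serotype in SB2 SF1-5 SS SD2; do
    echo "$serotype -> $(species_from_serotype "$serotype")"
done
```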

Output Files

| Extension | Description |
|---|---|
| blastout.txt | BLAST results in tabular format |
| allrecords.txt | BLAST hits that passed the selected thresholds |
| records.txt | The best BLAST hit that passed the selected thresholds |
| hits.txt | Name and number of hits that passed the selected thresholds (only for k-mer databases: rfb, ipaH and POAC genes) |
| hitscoverage.txt | In addition to the name and number of hits reported in hits.txt, the total number of hits for the identified gene (3rd column) and the percentage of hits detected (number of hits detected / total number of hits) (4th column) |
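
The percentage in the 4th column of hitscoverage.txt follows directly from the definition above. As a worked sketch with made-up numbers (50 hits detected out of 200, not taken from any real output), formatted to one decimal place; the helper name `hit_percentage` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of hits detected = detected / total * 100.
hit_percentage() {
    awk -v detected="$1" -v total="$2" 'BEGIN { printf "%.1f\n", detected / total * 100 }'
}

# Made-up example values, not taken from real ShigaPass output.
hit_percentage 50 200   # prints 25.0
```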

Notes

The FASTA sequences were assembled using SPAdes version 3.15 (Bankevich et al. Journal of Computational Biology, 2012) with the following options: -k 21,33,55,77 --only-assembler --careful --cov-cutoff auto.
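
For reference, a SPAdes call using these options might look like the sketch below. The read file names and output directory are placeholders, and the `-1`/`-2`/`-o` arguments are the standard SPAdes paired-end inputs rather than something stated in this document. The helper name `run_spades` is hypothetical. The command is only printed unless `RUN=1` is set, so the snippet can be tried without SPAdes installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (and optionally run) the SPAdes command for one
# paired-end sample. Input/output names are placeholders; the assembly options
# are those listed above.
run_spades() {
    local sample="$1"
    local cmd=(spades.py
               -1 "${sample}_1.fastq.gz" -2 "${sample}_2.fastq.gz"
               -o "${sample}_spades"
               -k 21,33,55,77 --only-assembler --careful --cov-cutoff auto)
    if [ "${RUN:-0}" = "1" ]; then
        "${cmd[@]}"          # run the assembly
    else
        echo "${cmd[*]}"     # dry run: print the command only
    fi
}

run_spades ERR5888634
```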

You can download the short reads using the following commands:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR588/004/ERR5888634/ERR5888634_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR588/004/ERR5888634/ERR5888634_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR595/002/ERR5952732/ERR5952732_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR595/002/ERR5952732/ERR5952732_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR597/003/ERR5976293/ERR5976293_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR597/003/ERR5976293/ERR5976293_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR598/006/ERR5982186/ERR5982186_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR598/006/ERR5982186/ERR5982186_2.fastq.gz
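
The eight URLs above follow a regular pattern, so they can also be generated in a loop. This is a sketch only: the helper name `ena_fastq_url` is hypothetical, and the pattern is inferred from the four accessions listed here, so it may not hold for accessions of a different length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the FASTQ URL for a 10-character run accession,
# following the pattern visible in the URLs listed above:
#   <first 6 characters>/00<last digit>/<accession>/<accession>_<mate>.fastq.gz
ena_fastq_url() {
    local acc="$1" mate="$2"
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${acc:0:6}/00${acc: -1}/${acc}/${acc}_${mate}.fastq.gz"
}

# Print the URLs for both mates of each run.
for acc in ERR5888634 ERR5952732 ERR5976293 ERR5982186; do
    for mate in 1 2; do
        ena_fastq_url "$acc" "$mate"
    done
done
```

The printed URLs can be saved to a file and passed to `wget -i` to download them in one go.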

All reads were filtered with fqCleanER version 3.0 (https://gitlab.pasteur.fr/GIPhy/fqCleanER) using the options -q 15 -l 50.