sylph - fast and precise species-level metagenomic profiling with ANIs
Introduction
sylph is a program that performs ultrafast (1) ANI querying or (2) metagenomic profiling for metagenomic shotgun samples.
Containment ANI querying: sylph can search a genome, e.g. E. coli, against your sample. If sylph outputs an estimate of 97% ANI, your sample contains an E. coli with 97% ANI to the queried genome.
Metagenomic profiling: sylph can determine the species/taxa in your sample and their abundances, just like Kraken or MetaPhlAn.
<p align="center"><img src="assets/sylph.gif?raw=true"/></p>
<p align="center"><i>Profiling 1 Gbp of mouse gut reads against 85,205 genomes in a few seconds</i></p>
Why sylph?
- Precise species-level profiling: Our tests show that sylph has fewer false positives than Kraken and is about as precise and sensitive as marker gene methods (MetaPhlAn, mOTUs).
- Ultrafast, multithreaded, multi-sample: sylph can be > 50x faster than other methods for multi-sample processing. sylph takes only ~15 GB of RAM for profiling against the entire GTDB-R220 database (110k genomes).
- Accurate (containment) ANI information: sylph can often give accurate ANI estimates between reference genomes and your metagenome sample down to 0.1x coverage.
- Customizable databases and pre-built databases: We offer pre-built databases of prokaryotes, viruses, and eukaryotes. Custom databases (e.g. using your own MAGs) are easy to build. Taxonomic information can be incorporated downstream for traditional profiling reports.
- Short or long reads: sylph was primarily benchmarked on short reads, but it was also the most accurate method on Oxford Nanopore's independent benchmarks.
How does sylph work?
sylph uses a k-mer containment method. sylph's novelty lies in using a statistical technique to correct ANI for low-coverage genomes, giving accurate results for low-abundance genomes. See here for more information on what sylph can and cannot do.
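To make the containment idea concrete, here is a minimal sketch of the standard conversion from k-mer containment to ANI, ANI ≈ C^(1/k), where C is the fraction of a genome's k-mers that are found in the sample. It is an illustration only, not sylph's implementation: the function name `ani_from_containment` is hypothetical, k = 31 is an assumed k-mer size, and the low-coverage statistical correction mentioned above is not reproduced.

```shell
# Hypothetical helper, not part of sylph.
# Converts a k-mer containment value C (0 < C <= 1) into an approximate ANI
# using ANI ~= C^(1/k). k defaults to 31 (an assumption for this sketch).
ani_from_containment() {
    LC_ALL=C awk -v c="$1" -v k="${2:-31}" 'BEGIN {
        if (c <= 0 || c > 1 || k <= 0) {
            print "containment must be in (0, 1] and k must be positive" > "/dev/stderr"
            exit 1
        }
        # The k-th root of the containment approximates per-base identity.
        printf "%.4f\n", exp(log(c) / k)
    }'
}

ani_from_containment 0.5 31   # half of the genome's 31-mers are present
```

With these inputs the call prints 0.9779, i.e. roughly 97.8% ANI even though only half of the k-mers match. At low coverage many k-mers are missed simply because they were never sequenced, which is why an uncorrected estimate like this one would be too low.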
Very quick start
Profile metagenome sample against GTDB-R220 (113,104 bacterial/archaeal species representative genomes)
conda install -c bioconda sylph
# download GTDB-R220 pre-built database (~13 GB)
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r220-c200-dbv1.syldb
# multi-sample paired-end profiling (sylph version >= 0.6)
sylph profile gtdb-r220-c200-dbv1.syldb -1 *_1.fastq.gz -2 *_2.fastq.gz -t (threads) > profiling.tsv
# multi-sample single-end profiling
sylph profile gtdb-r220-c200-dbv1.syldb *.fastq -t (threads) > profiling.tsv
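The paired-end command above passes two shell globs, so the forward and reverse lists line up only if every sample has both files. A small pre-flight check can catch a missing mate before profiling. This is a sketch under stated assumptions: `check_pairs` is a hypothetical helper, not a sylph command; it assumes the `_1.fastq.gz` / `_2.fastq.gz` naming from the example and that the two lists are matched by position; and it only checks that each forward file has a reverse file, not the other way round.

```shell
# Hypothetical helper, not part of sylph.
# Returns non-zero if any *_1.fastq.gz in the given directory (default: .)
# lacks a matching *_2.fastq.gz.
check_pairs() {
    dir="${1:-.}"
    missing=0
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"     # swap the _1 suffix for _2
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_pairs . && echo "all forward reads have a mate"` in the directory holding the reads is one way to use it before calling sylph profile.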
Install
Option 1: conda install
conda install -c bioconda sylph
[!WARNING] conda install may break if AVX2 instructions are not available on your CPU. See the issue here. The binary and source install still work.
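To see whether this warning applies to a machine, the CPU flags can be inspected before installing. The following is a minimal sketch for Linux only; `has_avx2` is a hypothetical helper, and reading `/proc/cpuinfo` is an assumption about the platform (the file does not exist on macOS).

```shell
# Hypothetical helper, not part of sylph or conda.
# Succeeds if the CPU flags file lists avx2 as a whole word.
# The optional argument overrides the file to read (default: /proc/cpuinfo).
has_avx2() {
    grep -qw avx2 "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx2; then
    echo "AVX2 available"
else
    echo "AVX2 not reported; consider the binary or source install"
fi
```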
Option 2: Build from source
Requirements:
- The Rust programming language (version > 1.63) and associated tools such as cargo, assumed to be in PATH.
- A C compiler (e.g. GCC)
- make
- cmake
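Before building, it can help to confirm that the tools listed above are on PATH. The following is a small sketch; `missing_tools` is a hypothetical helper, and using `cc` as the name of the C compiler is an assumption (pass `gcc` or `clang` explicitly if your system has no `cc`).

```shell
# Hypothetical helper, not part of sylph.
# Prints the name of every requested tool that is not found on PATH.
# With no arguments it checks the build requirements listed above.
missing_tools() {
    [ "$#" -gt 0 ] || set -- cargo cc make cmake
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing_tools
```

An empty result means all four tools were found. The sketch does not check the Rust version.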
Building takes a few minutes (depending on # of cores).
git clone https://github.com/bluenote-1577/sylph
cd sylph
# If default rust install directory is ~/.cargo
cargo install --path . --root ~/.cargo
sylph profile test_files/*
Option 3: Pre-built x86-64 linux statically compiled executable
If you're on an x86-64 system, you can download the binary and use it without any installation.
wget https://github.com/bluenote-1577/sylph/releases/download/latest/sylph
chmod +x sylph
./sylph -h
Note: the binary is compiled against a different C library (musl instead of glibc), which probably affects performance.
Tutorials, manuals, and pre-built databases
Pre-built databases
The pre-built databases available here can be downloaded and used with sylph for profiling and containment querying.
Cookbook
For common use cases and fast explanations, see the above cookbook.
Tutorials
- Introduction: 5-minute sylph tutorial outlining basic usage
- Taxonomic profiling against GTDB database with MetaPhlAn-like output format
Manuals
- Output format (TSV) and containment ANI explanation
- Incorporating custom taxonomies to get CAMI-like or MetaPhlAn-like outputs
sylph-utils
For incorporating taxonomy and manipulating output formats, see the sylph-utils repository.
Changelog
Version v0.7.0 - 2024-11-06.
- Added the inspect option to inspect .syldb/.sylsp files.
See the CHANGELOG for complete details.
Citing sylph
Jim Shaw and Yun William Yu. Rapid species-level metagenome profiling and containment estimation with sylph (2024). Nature Biotechnology.