PVAmpliconFinder

Robitaille, A., Brancaccio, R.N., Dutta, S. et al. PVAmpliconFinder: a workflow for the identification of human papillomaviruses from high-throughput amplicon sequencing. BMC Bioinformatics 21, 233 (2020). https://doi.org/10.1186/s12859-020-03573-8

Description

PVAmpliconFinder is a data analysis workflow designed to rapidly identify and classify known and potentially new Papillomaviridae sequences from amplicon deep sequencing with degenerate papillomavirus (PV) primers.

PVAmpliconFinder is based on alignment similarity metrics, but also considers molecular evolution time for improved identification and taxonomic classification of novel PVs. The final output of the tool includes a list of fully characterized, putatively new Papillomaviridae sequences, as well as graphical representations of the relative abundance of the virome sequence diversity in the tested samples.

Prerequisites

The PVAmpliconFinder workflow is designed for the analysis of reads generated by paired-end sequencing of DNA amplified with degenerate primers that specifically target the L1 sequence of papillomaviruses (Chouhy et al., 2010; Forslund et al., 1999; Forslund et al., 2003).

Installation

Python 2.7 or higher and Perl v5.22.1 or higher are required.

The tool was developed in a UNIX environment, but installing clang_osx-64, clangxx_osx-64 and gfortran_osx-64 with conda should provide a functional environment on macOS.

Automatic installation

PVAmpliconFinder relies on Bioconda to install the software and its associated dependencies.

Please install the version of Miniconda that corresponds to your Python version.

Add conda channels

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Install conda packages

conda install -y fastqc multiqc trim-galore vsearch blast raxml cap3 krona libxml2 gcc_linux-64 gxx_linux-64 gfortran_linux-64 perl-padwalker perl-xml-libxml perl-libxml-perl perl-bioperl perl-getopt-long perl-math-round perl-statistics-basic perl-list-moreutils perl-module-build perl-bioperl-run perl-text-csv

Add the PaPaRa program to PATH

export PATH="PATH_TO_PVAMPLICONFINDER/program:$PATH"

On 32-bit systems, the provided PaPaRa binary is not functional, as stated on the tool's webpage. In that case, install PaPaRa manually following its instructions and place the binary file in PVAmpliconFinder/program. Note that the binary file must be named "papara".

Manual installation

The tools used by PVAmpliconFinder can also be downloaded and installed manually; the corresponding executables must be reachable through the PATH environment variable.

Please note that the PaPaRa binary file must be named "papara".
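After a manual installation, it can help to confirm that the executables are actually visible on PATH. The following is a minimal sketch, not part of PVAmpliconFinder: the helper name `missing_tools` is hypothetical, and the executable names passed to it (other than "papara", which is required by this README) are assumptions about how each package names its binary.

```shell
#!/bin/sh
# Print the commands from the argument list that cannot be found on PATH.
# Hypothetical helper, not shipped with PVAmpliconFinder.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing the list.
    echo "${missing# }"
}

# Assumed executable names; adjust them to match your installation.
missing_tools fastqc multiqc trim_galore vsearch blastn cap3 papara
```

An empty line of output means every listed command was found.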

List of software

Databases

NCBI databases

PVAmpliconFinder needs the NCBI nt and taxdb databases to work properly. You can find these databases at the following FTP address: ftp://ftp.ncbi.nlm.nih.gov/blast/db/. Note that the taxonomy file must be correctly located.

It is advised to use the NCBI script update_blastdb.pl to facilitate the installation of the databases. More information here.

Once the databases are downloaded and installed, please check that the ~/.ncbirc file is present and points to the correct NCBI nt database location. More information here.
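As an illustration, the download could be scripted as below. This is a sketch under assumptions: the function name `download_blast_dbs` and the `UPDATE_BLASTDB` override are hypothetical, and placing taxdb in the same directory as nt is one common way to keep the taxonomy files where BLAST can find them. The nt database is very large, so check disk space first.

```shell
#!/bin/sh
# Download and unpack the nt and taxdb databases into a single directory.
# Set UPDATE_BLASTDB=echo for a dry run that only prints the arguments.
download_blast_dbs() {
    db_dir=$1
    mkdir -p "$db_dir" || return 1
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$db_dir" || exit 1
        for db in nt taxdb; do
            "${UPDATE_BLASTDB:-update_blastdb.pl}" --decompress "$db" || exit 1
        done
    )
}

# Example call (the target directory is a placeholder):
# download_blast_dbs /path/to/blastdb
```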
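The ~/.ncbirc file is a small INI-style file in which a `[BLAST]` section sets `BLASTDB` to the database directory. The sketch below reads that value back so it can be checked; the helper name `ncbirc_blastdb` is hypothetical and the parsing is deliberately simple (first `BLASTDB=` line only).

```shell
#!/bin/sh
# Print the BLASTDB directory configured in an .ncbirc-style file.
# Defaults to ~/.ncbirc when no file is given.
ncbirc_blastdb() {
    rc_file=${1:-"$HOME/.ncbirc"}
    # Keep only the text after "BLASTDB=" on the first matching line.
    sed -n 's/^[[:space:]]*BLASTDB[[:space:]]*=[[:space:]]*//p' "$rc_file" | head -n 1
}

# Example: list the nt files in the configured directory.
# ls "$(ncbirc_blastdb)"/nt*
```

If the function prints nothing, the file is missing or does not define `BLASTDB`.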

List of other databases

Input

| Type | Description |
| --- | --- |
| -d | PATH to input fastq directory |

Test files can be found here.

Parameters

| Name | Example value | Description |
| --- | --- | --- |
| -s | pool | Suffix of fastq filename |
| -o | PV_Amplicon_output | PATH to output directory |

| Name | Default value | Description |
| --- | --- | --- |
| -f | NA | Tabular file containing information about the samples. The first line of this file must be "ID primer tissue" |
| -b | nt | Name of the local "nt" blast database |
| -i | 98 | Percentage-of-identity threshold used for the de novo centroid-based clustering |
| -t | 2 | Number of threads |
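Because the file passed with -f must start with the header "ID primer tissue", a quick check before launching a long run can save time. The sketch below is an illustration only: the helper name `has_valid_header` is hypothetical, and it accepts either tabs or spaces between the three column names, since the exact separator is an assumption.

```shell
#!/bin/sh
# Succeed when the first line of the file is "ID primer tissue".
# Tabs, repeated spaces and a trailing carriage return are tolerated.
has_valid_header() {
    header=$(head -n 1 "$1" | tr '\t\r' '  ' | tr -s ' ' | sed 's/^ //; s/ $//')
    [ "$header" = "ID primer tissue" ]
}

# Example use (the file name is a placeholder):
# has_valid_header samples_info.txt || echo "Unexpected header in info file" >&2
```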

Flags are special parameters without value.

| Name | Description |
| --- | --- |
| -h | Display help |

Usage

sh PVAmpliconFinder.sh [-h] [-t threads] [-b "nt" database] [-f info_file] [-i identity threshold] -s fastq_files_suffix -d input_dir -o output_dir
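A concrete call might look like the sketch below, which wraps the command in a small function so it can be previewed first. The wrapper name `run_pvampliconfinder` and the `DRY_RUN` switch are hypothetical; the options come from the Parameters section, and the paths are placeholders. It assumes the current directory contains PVAmpliconFinder.sh.

```shell
#!/bin/sh
# Run PVAmpliconFinder with the mandatory options and a thread count.
# With DRY_RUN set, the command is printed instead of executed.
run_pvampliconfinder() {
    input_dir=$1
    output_dir=$2
    suffix=$3
    threads=${4:-2}   # 2 is the documented default for -t
    ${DRY_RUN:+echo} sh PVAmpliconFinder.sh -t "$threads" -s "$suffix" -d "$input_dir" -o "$output_dir"
}

# Example: preview, then run, with 4 threads.
# DRY_RUN=1 run_pvampliconfinder /path/to/fastq PV_Amplicon_output pool 4
# run_pvampliconfinder /path/to/fastq PV_Amplicon_output pool 4
```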

Output

| Type | Description |
| --- | --- |
| QC report | Report on FastQ file quality, before and after trimming |
| Diversity by tissue | Excel table of taxonomically classified PV species identified in the samples |
| Table summary | Excel table of read metrics |
| Table putative Known viruses | Excel table of putative known viruses identified in the samples |
| Table putative New viruses | Excel table of putative new viruses identified in the samples |
| Putative Known viruses | Fasta files of putative known virus sequences identified in the samples |
| Putative New viruses | Fasta files of putative new virus sequences identified in the samples |
| KRONA Megablast | Directory of KRONA graphical representations of the unnormalized abundance of viruses identified by Megablast in the samples |
| KRONA BlastN | Directory of KRONA graphical representations of the unnormalized abundance of viruses identified by BlastN in the samples |
| KRONA RAxML | Directory of KRONA graphical representations of the unnormalized abundance of viruses identified by RAxML-EPA in the samples |
| Log file | File of the logs |

Detailed description of the output

Contributions

| Name | Email | Description |
| --- | --- | --- |
| Alexis Robitaille | alexis.robitaille@orange.fr | Developer to contact for support |
| Magali Olivier | olivierm@iarc.fr | |
| Massimo Tommasino | tommasinom@iarc.fr | |

Versioning

Version 1.0

Authors

License

This project is licensed under GPL-3.0.

Acknowledgments

References

FAQ