<p align="center"> <a href="http://frogs.toulouse.inrae.fr/"> <img src="FROGS_logo.png" align="center" width="20%" style="display: block; margin: auto;"/> </a> </p>

Visit our website: http://frogs.toulouse.inrae.fr/

Description

FROGS is a CLI workflow designed to produce and analyze an ASV count matrix from high-depth amplicon sequencing data.

FROGS-wrappers allow you to add FROGS to a Galaxy instance (see https://github.com/geraldinepascal/FROGS-wrappers).

This workflow is focused on:

Table of contents

Convenient input data

Legend for the following diagrams:

.: Complete nucleic sequence
!: Region of interest
*: PCR primers
        From:                                    To:
         rDNA .........!!!!!!................    ......!!!!!!!!!!!!!!!!!!!.....
         Ampl      ****!!!!!!****                  ****!!!!!!!!!!!!!!!!!!!****
           R1      --------------                  --------------
           R2      --------------                               --------------

In all cases, R1 and R2 may overlap completely.

The minimum authorized overlap between R1 and R2 is 10 nt. With a shorter overlap, the merge can be incorrect: the pair is either rejected or treated as non-overlapping reads.
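The overlap rule above can be sketched in a few lines of Python. This is only an illustration, not FROGS code: the function names are hypothetical, and the sketch assumes that R1 starts at one end of the amplicon, R2 at the other, and that lengths are counted without adapters.

```python
MIN_OVERLAP = 10  # minimum authorized overlap in nt, as stated above


def expected_overlap(amplicon_len: int, r1_len: int, r2_len: int) -> int:
    """Return the expected R1/R2 overlap in nt (0 if the reads do not meet).

    Assumption: R1 starts at one end of the amplicon and R2 at the other.
    The overlap is capped by the shorter read and by the amplicon itself,
    which corresponds to the complete-overlap case.
    """
    raw = r1_len + r2_len - amplicon_len
    return max(0, min(raw, r1_len, r2_len, amplicon_len))


def is_mergeable(amplicon_len: int, r1_len: int, r2_len: int) -> bool:
    """True if the expected overlap reaches the 10 nt minimum."""
    return expected_overlap(amplicon_len, r1_len, r2_len) >= MIN_OVERLAP
```

For example, with two 250 nt reads, a 450 nt amplicon gives an expected overlap of 50 nt, while a 495 nt amplicon gives only 5 nt and would fall below the minimum.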

        rDNA .........!!!!!!................
        Ampl      ****!!!!!!****
        Read      --------------

        rDNA .....!!!!!!!!!!!!!!............
        Ampl      ****!!!!!!****
        Read      --------------       

The amplicons can be highly variable in length, as with ITS. R1 and R2 can have different lengths.

Installation

This FROGS repository is for command-line users. If you want to install FROGS on Galaxy, please refer to FROGS-wrappers.

Tools dependencies

FROGS is written in Python 3 (with the external NumPy and SciPy libraries) and also uses home-made scripts written in Perl 5 and R 4.

FROGS relies on different specific tools for each of the analysis steps.

| FROGS tools | Dependency | Version tested |
|---|---|---|
| Denoising and Remove_chimera | vsearch | 2.17.0 |
| Denoising | flash (optional) | 1.2.11 |
| Denoising | cutadapt (needs to be >=2.8) | 2.10 |
| Denoising | swarm (needs to be >=2.1) | 3.1.4 |
| Denoising | DADA2 | 1.22.0 |
| ITSx | ITSx | 1.1.2 |
| Taxonomic_affiliation | NCBI BLAST+ | 2.10 |
| Taxonomic_affiliation | RDP Classifier | 2.0.3 |
| Taxonomic_affiliation | EMBOSS needleall | 6.6.0 |
| Tree | MAFFT | 7.407 |
| Tree | Fasttree | 2.1.9 |
| Tree / FROGSSTAT | plotly, phangorn, rmarkdown, phyloseq, DESeq2, optparse, calibrate, formattable, DT | R 4.1.2 |
| FROGSSTAT | pandoc | 2.11.3 |
| FROGSFUNC | PICRUSt2 | 2.5.1 |
| FROGSFUNC | ete3 | 3.1.1 |
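The two minimum versions stated in the table (cutadapt >=2.8 and swarm >=2.1) have to be compared numerically, not as text: as strings, "2.10" sorts before "2.8". The sketch below shows one way to do this. The helper names are hypothetical, and the installed version is expected to be supplied by you; the sketch does not query the tools themselves.

```python
import re

# Minimum versions stated in the table above (only these two tools have one).
MINIMUM_VERSIONS = {"cutadapt": "2.8", "swarm": "2.1"}


def parse_version(text: str) -> tuple:
    """Turn '3.1.4' into (3, 1, 4) so that versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(tool: str, installed: str) -> bool:
    """True if `installed` satisfies the stated minimum, or if none is stated."""
    minimum = MINIMUM_VERSIONS.get(tool)
    if minimum is None:
        return True
    return parse_version(installed) >= parse_version(minimum)
```

With this comparison, the tested cutadapt version 2.10 is correctly accepted as newer than 2.8.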

Use PEAR as read pairs merging software in preprocess

PEAR is one of the most effective tools for merging read pairs, but because its license is not free for private use, we cannot distribute it with FROGS. If you work in an academic lab on a private Galaxy server, or if you have paid for a license, you can use PEAR in FROGS preprocess. For that you need to:

FROGS and dependencies installation

From conda

FROGS is now available on bioconda (https://anaconda.org/bioconda/frogs).

conda env create --name frogs@5.0.0 --file frogs-conda-requirements.yaml
# to use FROGS, first you need to activate your environment
conda activate frogs@5.0.0

WARNING: As PICRUSt2 currently relies on a different R version, you need to create a dedicated conda environment to use the FROGSFUNC tools, as follows:

conda env create --name frogsfunc@5.0.0 --file frogsfunc-conda-requirements.yaml
# and then activate the environment
conda activate frogsfunc@5.0.0

After that, you just have to switch from one environment to the other (with conda activate frogs@5.0.0 or conda activate frogsfunc@5.0.0), depending on whether you want to use FROGSFUNC or the other tools.
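If you drive FROGS from your own scripts, the choice between the two environments can be made explicit. The sketch below is a hypothetical helper, not part of FROGS; it assumes that FROGSFUNC tools can be recognised by a name starting with "frogsfunc" (as in frogsfunc_placeseqs).

```python
FROGS_ENV = "frogs@5.0.0"
FROGSFUNC_ENV = "frogsfunc@5.0.0"


def conda_env_for(tool_name: str) -> str:
    """Return the conda environment to activate for a given tool.

    Assumption: FROGSFUNC tools are recognised by a 'frogsfunc' name prefix.
    """
    if tool_name.lower().startswith("frogsfunc"):
        return FROGSFUNC_ENV
    return FROGS_ENV
```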

Check installation

To check your installation you can type:

cd <conda_env_dir>/frogs@5.0.0/share/FROGS-5.0.0/test

conda activate frogs@5.0.0

sh test_frogs.sh <NB_CPU> <JAVA_MEM> <OUT_FOLDER>

"Bioinformatic" tools are run on a small simulated dataset of one sample replicated three times. "Statistical" tools are run on an extract of the published results of Chaillou et al., ISME 2014.

This test executes the FROGS tools in command line mode. Example:

[user@computer:/home/frogs/FROGS/test/]$ sh test_frogs.sh 1 2 res
Step demultiplex jeu. 02 mai 2024 17:54:17 CEST
Step denoising 16S vsearch jeu. 02 mai 2024 17:54:21 CEST:
Step denoising 16S pear jeu. 02 mai 2024 17:54:37 CEST:
Step denoising: dada2 keep-unmerged jeu. 02 mai 2024 17:55:48 CEST
Step denoising: preprocess only jeu. 02 mai 2024 17:57:10 CEST
Step remove_chimera jeu. 02 mai 2024 17:57:13 CEST
Step cluster_filters jeu. 02 mai 2024 17:57:15 CEST
Step itsx jeu. 02 mai 2024 17:57:19 CEST
Step taxonomic_affiliation jeu. 02 mai 2024 17:57:24 CEST
Step affiliation_filters: masking mode jeu. 02 mai 2024 17:57:32 CEST
Step affiliation_filters: deleted mode jeu. 02 mai 2024 17:57:34 CEST
Step affiliation_postprocess jeu. 02 mai 2024 17:57:35 CEST
Step normalisation jeu. 02 mai 2024 17:57:36 CEST
Step cluster_stats jeu. 02 mai 2024 17:57:41 CEST
Step affiliation_stats jeu. 02 mai 2024 17:57:42 CEST
Step biom_to_tsv jeu. 02 mai 2024 17:57:43 CEST
Step biom_to_stdBiom jeu. 02 mai 2024 17:57:44 CEST
Step tsv_to_biom jeu. 02 mai 2024 17:57:44 CEST
Step tree jeu. 02 mai 2024 17:57:45 CEST
Step phyloseq_import_data jeu. 02 mai 2024 17:58:00 CEST
Step phyloseq_composition jeu. 02 mai 2024 17:59:40 CEST
Step phyloseq_alpha_diversity jeu. 02 mai 2024 18:00:02 CEST
Step phyloseq_beta_diversity jeu. 02 mai 2024 18:00:25 CEST
Step phyloseq_structure jeu. 02 mai 2024 18:00:44 CEST
Step phyloseq_clustering jeu. 02 mai 2024 18:01:02 CEST
Step phyloseq_manova jeu. 02 mai 2024 18:01:17 CEST
Step deseq2_preprocess jeu. 02 mai 2024 18:01:32 CEST
DESeq2 asv abundances
DESeq2 function abundances
Step deseq2_visualisation jeu. 02 mai 2024 18:03:02 CEST
DESeq2 otu abundances
DESeq2 function abundances
Completed with success
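The timestamps in this output give a rough idea of how long each step takes. The sketch below, which is not part of FROGS, extracts the step names and start times and reports the delay until the next step starts. It assumes the line layout shown above (weekday, day, month, year, time, time zone, optional trailing colon) and that the whole run happens on a single day; the names STEP_LINE and step_durations are hypothetical.

```python
import re
from datetime import datetime

# "Step <name> <weekday> <day> <month> <year> <HH:MM:SS> <timezone>",
# with an optional trailing colon, as in the output above.
STEP_LINE = re.compile(
    r"^Step (?P<name>.+?) \S+ \d{1,2} \S+ \d{4} (?P<time>\d{2}:\d{2}:\d{2}) \S+?:?$"
)


def step_durations(log_text: str) -> list:
    """Return (step name, seconds until the next step started) pairs.

    The last step has no following timestamp, so it is omitted.
    Assumption: all steps run on the same day (no midnight rollover).
    """
    steps = []
    for line in log_text.splitlines():
        match = STEP_LINE.match(line.strip())
        if match:
            started = datetime.strptime(match["time"], "%H:%M:%S")
            steps.append((match["name"], started))
    return [
        (name, int((next_start - start).total_seconds()))
        for (name, start), (_, next_start) in zip(steps, steps[1:])
    ]
```

Lines that do not start with "Step", such as "DESeq2 asv abundances" or "Completed with success", are ignored.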

Finally, to check the FROGSFUNC tools installation you can type:

cd <conda_env_dir>/frogsfunc@5.0.0/share/FROGS-5.0.0/test

conda activate frogsfunc@5.0.0

sh test_frogsfunc.sh <OUT_FOLDER>

This test executes the FROGSFUNC tools in command line mode. Example:

[user@computer:/home/frogs/FROGS/test/]$ sh test_frogsfunc.sh res
Step frogsfunc_placeseqs jeu. 02 mai 2024 18:32:19 CEST
Step frogsfunc_functions jeu. 02 mai 2024 18:33:49 CEST
Step frogsfunc_pathways jeu. 02 mai 2024 18:36:32 CEST
Completed with success

Memory and parallelisation advice

If you have more than one CPU, it is recommended to increase the number of CPUs used by tools. All the CPUs must be on the same computer/node.

| Tool | RAM per CPU | Minimal RAM | Configuration example |
|---|---|---|---|
| Denoising | 8 GB | 10 GB | 12 CPUs and 96 GB |
| ITSx / Remove_Chimera | 3 GB | 5 GB | 12 CPUs and 36 GB |
| Taxonomic_affiliation | - | 20 GB | 30 CPUs and 300 GB |
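The first two configuration examples follow from multiplying the RAM per CPU by the number of CPUs (8 GB x 12 = 96 GB, 3 GB x 12 = 36 GB). The sketch below applies that reading of the table and never goes below the minimal RAM. It is our interpretation, not a FROGS function; for Taxonomic_affiliation no per-CPU figure is given, so only the minimum is returned, even though the example in the table uses far more.

```python
# Values copied from the table above, in GB. None means no per-CPU figure is given.
RAM_GUIDELINES = {
    "Denoising": {"per_cpu": 8, "minimum": 10},
    "ITSx / Remove_Chimera": {"per_cpu": 3, "minimum": 5},
    "Taxonomic_affiliation": {"per_cpu": None, "minimum": 20},
}


def suggested_ram_gb(tool: str, nb_cpus: int) -> int:
    """Suggest a RAM allocation in GB: per-CPU RAM times CPUs, never below the minimum."""
    if nb_cpus < 1:
        raise ValueError("nb_cpus must be at least 1")
    guide = RAM_GUIDELINES[tool]
    if guide["per_cpu"] is None:
        return guide["minimum"]
    return max(guide["minimum"], guide["per_cpu"] * nb_cpus)
```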

Download databanks

Reference databanks are needed to filter contaminants, assign a taxonomy to each ASV, or filter ambiguities for amplicons of highly variable length.

We provide some databanks that you simply need to download and extract.

Please take the time to read each databank's README.txt and LICENCE.txt files.

In addition, several default databases are used in FROGSFUNC steps by PICRUSt2.

Troubleshooting

Abnormal increase in memory consumption with the number of CPUs

With some old versions of glibc, the virtual memory used grows multiplicatively with the number of CPUs.

| Nb CPUs | Expected RAM consumption | Observed RAM consumption |
|---|---|---|
| 1 | 1 GB | 1 GB |
| 2 | 2 GB | 2*2 GB |
| 3 | 3 GB | 3*3 GB |
| 4 | 4 GB | 4*4 GB |

The memory and CPU parameters given in the examples take this problem into account.
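The pattern in the table can be written as a small calculation: the expected consumption grows linearly with the number of CPUs, while the observed consumption grows with its square. The sketch below only restates the table; the function names are hypothetical, and extending the pattern to a RAM per CPU other than 1 GB is an assumption.

```python
def expected_ram_gb(nb_cpus: int, ram_per_cpu_gb: float = 1.0) -> float:
    """Expected consumption: one share of RAM per CPU."""
    return nb_cpus * ram_per_cpu_gb


def observed_ram_gb(nb_cpus: int, ram_per_cpu_gb: float = 1.0) -> float:
    """Observed consumption with affected glibc versions, per the table:
    the expected amount multiplied once more by the number of CPUs."""
    return nb_cpus * expected_ram_gb(nb_cpus, ram_per_cpu_gb)
```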

License

GNU GPL v3

Copyright

2024 INRAE

Citation

Depending on which type of amplicon you are working on (mergeable or unmergeable), please cite one of the two FROGS publications:

Contact

frogs-support@inrae.fr