De Novo Transcriptome Assembly
Biocore's de novo transcriptome assembly workflow based on Nextflow
Installation
Run the installation script:
sh INSTALL.sh
It checks that Nextflow is in your PATH and that Singularity is installed, then downloads the BioNextflow library and information about the tools used.
You need either Singularity or Docker to launch the pipeline.
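To verify an environment before launching, you can run a quick prerequisite check. The sketch below is not the actual INSTALL.sh. It is a hypothetical helper (check_prereqs is an illustrative name) that looks for nextflow on the PATH and reports which container engine is available, preferring Singularity over Docker.

```shell
#!/bin/sh
# Hypothetical helper, not part of INSTALL.sh: check the prerequisites on the PATH.
# Prints the container engine found (singularity is preferred over docker).
# Returns 1 if nextflow is missing or if neither container engine is found.
check_prereqs() {
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "nextflow not found in PATH" >&2
    return 1
  fi
  for engine in singularity docker; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  echo "neither Singularity nor Docker found in PATH" >&2
  return 1
}
```

For example, `engine=$(check_prereqs) || exit 1` stops a wrapper script early when a requirement is missing.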
Nextflow version
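The pipelines are run with Nextflow version 0.29.0. Select that version by prefixing the nextflow command with the NXF_VER environment variable:
NXF_VER=0.29.0 nextflow run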
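Instead of prefixing every command with NXF_VER=0.29.0, you can export the variable once per shell session; the Nextflow launcher reads NXF_VER from the environment. A minimal sketch:

```shell
# Pin the Nextflow version for every nextflow command started from this shell.
export NXF_VER=0.29.0
# Later commands no longer need the prefix, e.g.:
#   nextflow run denovo_assembly.nf -bg > log.txt
```

Because the variable is exported, child processes such as the nextflow launcher inherit it.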
Running the pipelines
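Run each pipeline with the command below, replacing <pipeline> with the name of the pipeline script. The -bg option runs Nextflow in the background, and > log.txt captures its console output in log.txt:
NXF_VER=0.29.0 nextflow run <pipeline>.nf -bg > log.txt
For example: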
NXF_VER=0.29.0 nextflow run denovo_assembly.nf -bg > log.txt
You can change the parameters by editing the params.config file, or override a single pipeline parameter on the command line by prefixing its name with two dashes (--), for example:
NXF_VER=0.29.0 nextflow run denovo_assembly.nf -bg --output ./myoutput > log.txt
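In Nextflow, each --name value option sets params.name and takes precedence over the value in params.config. When overriding several parameters at once, a small helper can assemble the command line for you. The sketch below is hypothetical (build_run_cmd is not part of the repository): it prints the launch command for a pipeline from name=value pairs without running it. Parameter names other than output are assumptions; check params.config for the real ones.

```shell
# Hypothetical helper (not part of the repository): print, without running it,
# the launch command for a pipeline with any number of parameter overrides.
# Usage: build_run_cmd PIPELINE [name=value ...]
# Values must not contain spaces.
build_run_cmd() {
  pipeline=$1
  shift
  cmd="NXF_VER=0.29.0 nextflow run $pipeline -bg"
  for kv in "$@"; do
    # "output=./myoutput" becomes "--output ./myoutput"
    cmd="$cmd --${kv%%=*} ${kv#*=}"
  done
  printf '%s > log.txt\n' "$cmd"
}
```

`build_run_cmd denovo_assembly.nf output=./myoutput` prints the command shown above; after checking it, `eval "$(build_run_cmd denovo_assembly.nf output=./myoutput)"` launches it.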
Module denovo_assembly
This module performs de novo assembly and retrieves both the predicted transcripts and the predicted proteins.
╔╗ ┬┌─┐┌─┐┌─┐┬─┐┌─┐╔═╗╦═╗╔═╗ ╔╦╗┬─┐┌─┐┌┐┌┌─┐┌─┐┬─┐┬┌─┐┌┬┐┌─┐┌┬┐┌─┐ ╔═╗┌─┐┌─┐┌─┐┌┬┐┌┐ ┬ ┬ ┬
╠╩╗││ ││ │ │├┬┘├┤ ║ ╠╦╝║ ╦ ║ ├┬┘├─┤│││└─┐│ ├┬┘│├─┘ │ │ ││││├┤ ╠═╣└─┐└─┐├┤ │││├┴┐│ └┬┘
╚═╝┴└─┘└─┘└─┘┴└─└─┘╚═╝╩╚═╚═╝ ╩ ┴└─┴ ┴┘└┘└─┘└─┘┴└─┴┴ ┴ └─┘┴ ┴└─┘ ╩ ╩└─┘└─┘└─┘┴ ┴└─┘┴─┘┴
====================================================
BIOCORE@CRG Transcriptome Assembly - N F ~ version 0.1
====================================================
pairs : ../test_data/*_{1,2}.fq.gz
email : YOUREMAIL@YOURDOMAIN
minsize (after filtering) : 70
genetic code : Universal
strandness : RF
output (output folder) : output
minProtSize (minimum protein size) : 100
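The pairs parameter is a glob in which {1,2} marks the mate, so each sample is expected as SAMPLE_1.fq.gz plus SAMPLE_2.fq.gz (the usual convention for Nextflow paired-read channels such as Channel.fromFilePairs; how this pipeline reads them is an assumption). A missing mate silently drops a sample, so it can be worth checking the input folder first. A minimal sketch; check_pairs is a hypothetical helper that assumes the _1/_2 .fq.gz naming shown above:

```shell
# Hypothetical helper: report reads whose mate file is missing.
# Assumes files are named SAMPLE_1.fq.gz / SAMPLE_2.fq.gz, matching *_{1,2}.fq.gz.
# Usage: check_pairs DIR   (returns 1 if any mate is missing)
check_pairs() {
  dir=$1
  rc_pairs=0
  for r1 in "$dir"/*_1.fq.gz; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    r2="${r1%_1.fq.gz}_2.fq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing mate: $r2"
      rc_pairs=1
    fi
  done
  for r2 in "$dir"/*_2.fq.gz; do
    [ -e "$r2" ] || continue
    r1="${r2%_2.fq.gz}_1.fq.gz"
    if [ ! -e "$r1" ]; then
      echo "missing mate: $r1"
      rc_pairs=1
    fi
  done
  return $rc_pairs
}
```

For the example above, run `check_pairs ../test_data` before launching the pipeline.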
Module RABT_assembly
This module performs reference annotation based transcript (RABT) assembly and retrieves both the predicted transcripts and the predicted proteins.
====================================================
BIOCORE@CRG Transcriptome Assembly - N F ~ version 0.1
====================================================
pairs : ../test_data2/*_{1,2}.fq.gz
genome : ../anno/GRCh38.p12.genome.fa.gz
annotation : ../anno/gencode.v30.annotation.gtf
minsize (after filtering) : 40
genetic code : Universal
output (output folder) : output
minProtSize (minimum protein size) : 100
strandness : RF
maxIntron : 10000
email : YOUREMAIL@YOURDOMAIN
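For an annotation-guided assembly, the sequence names in the genome FASTA should match the first column of the GTF annotation; otherwise annotation records cannot be placed on the genome. A minimal sketch, assuming a gzip-compressed genome as in the example above, that lists annotation sequence names absent from the genome. seqnames_missing is a hypothetical helper, not part of the pipeline.

```shell
# Hypothetical helper: print annotation sequence names that are absent from the genome.
# Usage: seqnames_missing GENOME.fa.gz ANNOTATION.gtf
seqnames_missing() {
  genome=$1
  gtf=$2
  tmpd=$(mktemp -d)
  # FASTA headers: keep the first word after ">"
  gzip -dc "$genome" | awk '/^>/ { sub(/^>/, ""); print $1 }' | LC_ALL=C sort -u > "$tmpd/genome.txt"
  # GTF: column 1 holds the sequence name; skip empty and "#" comment lines
  awk -F'\t' 'NF && !/^#/ { print $1 }' "$gtf" | LC_ALL=C sort -u > "$tmpd/gtf.txt"
  # Lines only in the second file: names used in the GTF but missing from the genome
  LC_ALL=C comm -13 "$tmpd/genome.txt" "$tmpd/gtf.txt"
  rm -rf "$tmpd"
}
```

For the example inputs: `seqnames_missing ../anno/GRCh38.p12.genome.fa.gz ../anno/gencode.v30.annotation.gtf`. Empty output means every annotated sequence is present in the genome.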
Module annotation
This module annotates the predicted proteins and transcripts produced by either of the two assembly modules described above.
====================================================
BIOCORE@CRG Transcriptome Annotation - N F ~ version 0.1
====================================================
peptide sequences : ../assembly/output/Assembly/longest_orfs.pep
cds sequences : ../assembly/output/Assembly/longest_orfs.cds
annotation in gff3 : ../assembly/output/Assembly/longest_orfs.gff3
transcripts : ../assembly/output/Assembly/Trinity.fasta
email : YOUREMAIL@YOURDOMAIN
genetic code : Universal
output (output folder) : output
diamondDB (uniprot or uniRef90) : /nfs/db/uniprot/2018_10/knowledgebase/complete/blast/db/uniprot_sprot.fasta
pfamDB (pfam database path) : /nfs/db/pfam/Pfam31.0/Pfam-A.hmm
minProtSize (minimum protein size) : 100
batch_diam : 5000
batch_pfam : 2000
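If batch_diam and batch_pfam are the number of protein sequences sent to each DIAMOND and Pfam search job (an assumption; this README does not define them), the number of jobs is the sequence count divided by the batch size, rounded up. For example, 12000 proteins give ceil(12000 / 5000) = 3 DIAMOND jobs and ceil(12000 / 2000) = 6 Pfam jobs. A minimal sketch; count_batches is a hypothetical helper:

```shell
# Hypothetical helper: number of batches needed to split a FASTA file
# into chunks of at most BATCH_SIZE sequences (ceiling division).
# Usage: count_batches FILE.pep BATCH_SIZE
count_batches() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the 0
  nseq=$(grep -c '^>' "$1" || true)
  size=$2
  echo $(( (nseq + size - 1) / size ))
}
```

For the example input: `count_batches ../assembly/output/Assembly/longest_orfs.pep 5000`.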
Module quantify
This module quantifies the expression of the predicted genes produced by either of the two assembly modules described above.
====================================================
BIOCORE@CRG Transcriptome Quantification - N F ~ version 0.1
====================================================
pairs : ../test_data/*_{1,2}.fq.gz
transcripts : ../assembly/output/Assembly/Trinity.fasta
transmap : ../assembly/output/Assembly/Trinity.fasta.gene_trans_map
output : output
email : YOUREMAIL@YOURDOMAIN
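The transmap file is Trinity's gene-to-transcript map: two tab-separated columns, gene ID then transcript ID, one line per transcript. Such a map is typically used to aggregate transcript-level estimates to the gene level. A minimal sketch (summarize_transmap is a hypothetical helper) that reports how many genes and transcripts the map describes:

```shell
# Hypothetical helper: count distinct genes and transcripts in a Trinity
# gene_trans_map (column 1 = gene ID, column 2 = transcript ID, tab-separated).
# Usage: summarize_transmap Trinity.fasta.gene_trans_map
summarize_transmap() {
  awk -F'\t' '
    NF >= 2 { genes[$1] = 1; tx[$2] = 1 }
    END {
      ng = 0; nt = 0
      for (g in genes) ng++
      for (t in tx) nt++
      printf "genes: %d transcripts: %d\n", ng, nt
    }' "$1"
}
```

For the example input: `summarize_transmap ../assembly/output/Assembly/Trinity.fasta.gene_trans_map`.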