Tuxedo-NF

A Nextflow implementation of the Tuxedo Suite of Tools. The workflow is based on the 2016 Nature Protocols publication: "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown".

Quick start

Make sure you have all the required dependencies listed in the last section.

Install the Nextflow runtime by running the following command:

$ curl -fsSL get.nextflow.io | bash

When done, you can launch the pipeline execution by entering the command shown below:

$ nextflow run skptic/tuxedo-nf

By default the pipeline is executed against the provided example dataset. Check the Pipeline parameters section below to see how to enter your data on the command line.

All parameters can be specified on the command line or, alternatively, in a parameters config file.

Default parameters can be found in the params_default.config file.
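As a minimal sketch of the config-file route, the snippet below writes a small parameters file. The file name my_params.config and the paths are placeholders chosen for illustration; the parameter names (reads, genome, output) are the ones documented in the Pipeline parameters section. It assumes the pipeline reads these values from the standard Nextflow params scope.

```shell
# Write a small parameters file. The name my_params.config is hypothetical
# and the paths are placeholders, not pipeline defaults.
cat > my_params.config <<'EOF'
params {
    reads  = '/home/dataset/*.fastq'
    genome = '/home/user/my_genome/example.fa'
    output = '/home/user/my_results'
}
EOF

# Show what was written.
cat my_params.config
```

The file can then be passed to Nextflow with the -c option, for example: nextflow run skptic/tuxedo-nf -c my_params.config. Values given on the command line should still take precedence over those in the file.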

Pipeline parameters

--reads

Example:

$ nextflow run skptic/tuxedo-nf --reads '/home/dataset/*.fastq'

This will handle each fastq file as a separate sample.

Read pairs of samples can be specified using a glob file pattern. Consider a more complex situation where there are three samples (A, B and C), with A and B being paired-end and C being single-end. The read files could be:

sample_A_1.fastq
sample_A_2.fastq
sample_B_1.fastq
sample_B_2.fastq 
sample_C_1.fastq

The reads may be specified as below:

$ nextflow run skptic/tuxedo-nf --reads '/home/dataset/sample_*_{1,2}.fastq'    
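To see which files such a pattern picks up and how they fall into samples, the sketch below recreates the five example files as empty placeholders and counts the files per sample prefix. It only mimics the grouping in plain shell; that the pipeline groups reads by the part of the name before _1 or _2 is an assumption, not something stated here. Note that the pattern on the command line is quoted so that the shell passes it to Nextflow unexpanded.

```shell
# Scratch directory with empty placeholder files named as in the example.
demo_dir=$(mktemp -d)
for name in sample_A_1 sample_A_2 sample_B_1 sample_B_2 sample_C_1; do
  : > "$demo_dir/$name.fastq"
done

# {1,2} means "1 or 2"; in plain POSIX sh that is written as two globs.
# Strip the _1/_2 suffix to get the sample prefix, then count files per prefix.
pairing=$(
  cd "$demo_dir" &&
  for f in sample_*_1.fastq sample_*_2.fastq; do
    printf '%s\n' "${f%_[12].fastq}"
  done | sort | uniq -c | awk '{ print $2 "=" $1 }' | paste -s -d ' ' -
)
echo "$pairing"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Samples A and B each contribute two files (a read pair), while sample C contributes a single file.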

--genome

Example:

$ nextflow run skptic/tuxedo-nf --genome /home/user/my_genome/example.fa

--index

Example:

$ nextflow run skptic/tuxedo-nf --index /home/user/my_genome_index/example

--pheno

Example:

$ nextflow run skptic/tuxedo-nf --pheno /home/user/my_exp/exp-info.txt

--download_genome

Example:

$ nextflow run skptic/tuxedo-nf --download_genome=true

or equivalently just $ nextflow run skptic/tuxedo-nf --download_genome

--genome_address

Example:

$ nextflow run skptic/tuxedo-nf --genome_address=ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/bigZips/chromFa.tar.gz

--download_annotation

Example:

$ nextflow run skptic/tuxedo-nf --download_annotation=true

or equivalently just $ nextflow run skptic/tuxedo-nf --download_annotation

--annotation_address

Example:

$ nextflow run skptic/tuxedo-nf --annotation_address='<annotation file URL>'

--run_index

Example:

$ nextflow run skptic/tuxedo-nf --run_index=false

--use_sra

Example:

$ nextflow run skptic/tuxedo-nf --use_sra=true

or equivalently just $ nextflow run skptic/tuxedo-nf --use_sra

--sra_ids

Example:

$ nextflow run skptic/tuxedo-nf --sra_ids='SRR349706,SRR349707,SRR349708'

--cache

Example:

$ nextflow run skptic/tuxedo-nf --cache='/your/ncbi_cache_location'

--output

Example:

$ nextflow run skptic/tuxedo-nf --output /home/user/my_results 

Cluster support

Tuxedo-NF execution relies on the Nextflow framework, which provides an abstraction between the pipeline's functional logic and the underlying processing system.

Thus it is possible to execute it on your computer or any cluster resource manager without modifying it.

Currently the following platforms are supported:

By default the pipeline is parallelized by spawning multiple threads on the machine where the script is launched.

To submit the execution to an SGE cluster, create a file named nextflow.config in the directory where the pipeline is going to be launched, with the following content:

process {
  executor='sge'
  queue='<your queue name>'
}

With this in place, tasks are executed through the SGE qsub command, so your pipeline behaves like any other SGE job script, with the benefit that Nextflow automatically and transparently manages task synchronisation, file staging and un-staging, and so on.

Alternatively the same declaration can be defined in the file $HOME/.nextflow/config.

To learn more about the available settings and the configuration file, read the Nextflow documentation.
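As an illustration of such settings, the sketch below writes a nextflow.config that adds per-task memory and time requests to the SGE declaration. The launch directory name, the queue name and the resource values are placeholders, not pipeline defaults; memory and time are standard Nextflow process directives, but whether these values suit this pipeline is an assumption.

```shell
# Write nextflow.config into a launch directory. The directory name, queue
# name and resource values are placeholders, not pipeline defaults.
launch_dir=./tuxedo_run
mkdir -p "$launch_dir"
cat > "$launch_dir/nextflow.config" <<'EOF'
process {
  executor = 'sge'
  queue = 'long'
  memory = '8 GB'
  time = '24h'
}
EOF

# Show what was written.
cat "$launch_dir/nextflow.config"
```

Launching the pipeline from inside that directory makes Nextflow pick up the file automatically.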

Dependencies