Nextflow Implementation of the Dowell Lab ChIP-seq Pipeline
For internal Dowell Lab use.
Usage
Download and Installation
Clone this repository in your home directory:
$ git clone https://github.com/Dowell-Lab/ChIP-Flow.git
Install Nextflow:
$ module load curl/7.49.1 (or set path to curl executable if installed locally)
$ curl -s https://get.nextflow.io | bash
Slurm-Specific Usage Requirements
Primary Run Settings
If you are using Linux, the installer above places Nextflow in your home directory, so you need to add your home directory to your PATH before you can run it. Prepending it as follows keeps the existing entries, so you can still access other paths (e.g. those added when you load modules) on your cluster without conflict:
$ export PATH=~:$PATH
First, edit conf/slurm_grch38.config to ensure that the proper paths and email address are set (look for all mentions of COMPLETE_*). Variable names should be self-explanatory. Then:
$ nextflow run main.nf -profile slurm_grch38 --workdir '</nextflow/work/temp/>' --outdir '</my/project/>' --email <john.doe@themailplace.com> --sras '</dir/to/sras/*>'
Directory paths for sras/fastqs must be enclosed in quotes. Note the name of the configuration file specified by -profile: it's generally a good idea to keep separate configuration files for samples that use different reference genomes or come from different organisms. The pipeline runs paired-end by default; to run single-end data, you must add the --singleEnd argument.
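Before launching, it can help to confirm that no COMPLETE_* placeholders are left in the configuration file. The helper below is a minimal sketch, not part of the pipeline; the function name is hypothetical and it assumes the placeholders literally contain the text COMPLETE_.

```shell
# Print every line that still contains a COMPLETE_* placeholder, with line numbers.
# Follows grep's exit convention: 0 if any placeholder remains, 1 if none do.
remaining_placeholders() {
    grep -n 'COMPLETE_' "$1"
}

# Usage (config path taken from the instructions above):
#   remaining_placeholders conf/slurm_grch38.config
```

If the command prints nothing, every placeholder has been filled in.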
If anything goes wrong, you don't need to restart the pipeline from scratch. Instead, resume it:
$ nextflow run main.nf -profile slurm_grch38 -resume
To see a full list of options and pipeline version, enter:
$ nextflow run main.nf -profile fiji --help
Parallel-fastq-dump Installation
As of version 0.4, we have implemented a wrapper for fastq-dump that provides multi-threading, in place of fasterq-dump, due to memory leak issues. This, however, requires installing parallel-fastq-dump to your user home. You can do so by running:
$ pip3 install parallel-fastq-dump --user
The installation checks for the sra-tools requirement, so if you do not want sra-tools installed for your user, it must already be loaded on your path (e.g. module load sra/2.9.2).
Multi-threading is optional: by default, the pipeline runs fastq-dump on a single core. To run multi-threaded on 8 cores, you must specify --threadfqdump as a nextflow run argument.
Software Requirements
Python3, RSeQC, preseq, Picard Tools, BEDTools, Samtools, HISAT2, BBMap Suite, MultiQC, SRA Tools, IGV Tools
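A quick way to see which of these tools are visible in the current shell is sketched below. The function name is hypothetical, and the executable names in the usage comment are assumptions about how each package is exposed on your cluster (Picard, for example, is often a jar rather than a command), so adjust them to match your modules.

```shell
# Report which of the given executables are missing from PATH.
# Prints one "missing: NAME" line per absent tool; returns non-zero if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (executable names are assumptions):
#   check_tools python3 samtools bedtools hisat2 multiqc preseq fastq-dump
```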
Running Nextflow Using an sbatch script
The best way to run Nextflow is with an sbatch script containing the same command specified above. Failing that, execute the workflow at least in a screen session, so that you can log out of your cluster and check the progress and any errors in standard output more easily. Nextflow keeps logs of every transaction, should you lose access to the console. The memory requirements do not exceed 8GB, so you do not need to request more RAM than this. SRAs must be downloaded prior to running the pipeline.
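As a minimal sketch of such an sbatch script, the block below writes one to a file named chipflow.sbatch (a hypothetical name). The job name, wall time, log file name, and mail settings are assumptions. The 8G memory request follows the limit stated above. The angle-bracket values are the same placeholders used in the run command and must be replaced before submitting.

```shell
# Write a hypothetical submission script to the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > chipflow.sbatch <<'EOF'
#!/bin/bash
# Job name, wall time, and log file name are assumptions; adjust for your cluster.
#SBATCH --job-name=chipflow
#SBATCH --ntasks=1
#SBATCH --mem=8G
#SBATCH --time=24:00:00
#SBATCH --output=chipflow_%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<john.doe@themailplace.com>

# Make the nextflow launcher in the home directory visible.
export PATH=~:$PATH

# Same command as in the usage section; replace every <...> placeholder.
nextflow run main.nf -profile slurm_grch38 \
    --workdir '</nextflow/work/temp/>' \
    --outdir '</my/project/>' \
    --email <john.doe@themailplace.com> \
    --sras '</dir/to/sras/*>'
EOF
```

After reviewing the file, it could be submitted with sbatch chipflow.sbatch.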
Arguments
Required Arguments
Arguments | Usage | Description |
---|---|---|
-profile | <base,fiji> | Configuration profile to use. |
--fastqs | </project/*_{1,2}*.fastq.gz> | Directory pattern for fastq files (gzipped). |
--sras | </project/*.sra> | Directory pattern for sra files. |
--workdir | </project/tmp/> | Nextflow working directory where all intermediate files are saved. |
--email | <EMAIL> | Where to send workflow report email. |
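A likely reason the --fastqs and --sras patterns need quoting is that an unquoted glob is expanded by the shell before Nextflow ever sees it. The demonstration below is only an illustration of that shell behavior; the helper name and the temporary files are hypothetical.

```shell
# Count how many arguments a command actually receives.
count_args() { echo "$#"; }

demo=$(mktemp -d)
touch "$demo/a.sra" "$demo/b.sra"

# Unquoted glob: the shell expands it into one argument per matching file.
unquoted=$(count_args "$demo"/*.sra)
# Quoted glob: the pattern arrives as a single literal string.
quoted=$(count_args "$demo/*.sra")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -r "$demo"
```

With two matching files, the unquoted form passes two arguments while the quoted form passes one pattern.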
Save Options
Arguments | Usage | Description |
---|---|---|
--outdir | </project/> | Specifies where to save the output from the nextflow run. |
--savefq | | Compresses and saves raw fastq reads. |
--saveTrim | | Compresses and saves trimmed fastq reads. |
--saveAll | | Compresses and saves all fastq reads. |
--skipBAM | | Skip saving BAM files (CRAM files are saved by default). |
--savebw | | Save normalized BigWig files for the UCSC genome browser. |
--savebg | | Saves concatenated pos/neg bedGraph file. |
--savedup | | Save deduplicated/marked duplicate BAM files (using picard, cannot be used with --skippicard). |
Input File Options
Arguments | Usage | Description |
---|---|---|
--singleEnd | | Specifies that the input files are not paired reads (default is paired-end). |
Performance Options
Arguments | Usage | Description |
---|---|---|
--threadfqdump | | Runs multi-threading for fastq-dump for sra processing. |
QC Options
Arguments | Usage | Description |
---|---|---|
--skipMultiQC | | Skip running MultiQC. |
--skipRSeQC | | Skip running RSeQC. |
--skippreseq | | Skip running preseq. |
--skipFastQC | | Skip running FastQC. |
--skippileup | | Skip running pileup. |
--skipAllQC | | Skip running all QC (does not include mapstats). |
--noTrim | | Skip trimming and only run mapping. |
--dedup | | Remove sequencing duplicates from BAM files (using picard, cannot be used with --skippicard). |