Using the IARC nextflow bioinformatics pipelines course 2018
The aim of this course is to enable participants to use the bioinformatics pipelines developed at IARC using nextflow.
Description of the course
Learning objectives
After completing this course, participants will be able to use the IARC nextflow bioinformatics pipelines and more specifically:
- set pipeline parameters and configuration,
- execute, resume and debug pipelines,
- trace and visualise pipeline execution,
- run the pipelines on workstation or on a high performance computing cluster,
- use GitHub and Docker/Singularity containers to run reproducible analyses,
- understand the basic concepts of the nextflow language used to describe the pipelines.
Please note that the development of pipelines will only be very briefly covered in this course.
Agenda and slides
Wednesday 23 May (slides)
09:00-10:00 Introduction to bioinformatics pipelines, nextflow, Docker, GitHub and the IARC organization
10:00-10:30 Practical application: running your first pipeline
10:30-11:00 Break
11:00-11:30 The hidden structure of nextflow: work folder and configuration
11:30-12:30 Practical application: configuring, crashing, resuming and debugging pipelines
Thursday 24 May (slides)
09:00-09:30 Introduction to HPC clusters and running pipelines on a cluster.
09:30-10:30 Practical application: trace and visualise pipeline execution with log files.
10:30-11:00 Break
11:00-11:30 Introduction to the nextflow language: understanding what the pipelines are doing
11:30-12:30 Practical application: advanced usage for reproducibility (choosing a container, GitHub releases and branches, modifying a pipeline, etc.)
Gitter Chat
A Gitter chat room is open for the course. Participants can use it to discuss their projects and to ask any questions about the course.
Laptop setup
Laptops use Ubuntu 16.04.
Nextflow is already installed in ~/bin, which is in your PATH.
Docker is already installed. If you are curious, installation instructions are available on the Docker website.
If you need a good text editor, Atom is also installed.
Demo commands
nextflow run iarcbioinfo/nf_coverage_demo -with-docker --bam_folder data_test/BAM/BAM_multiple/ --bed data_test/BED/TP53_exon2_11.bed
nextflow run iarcbioinfo/platypus-nf -with-docker --input_folder data_test/BAM/ --ref data_test/REF/17.fasta
nextflow run iarcbioinfo/RNAseq-nf -with-docker --input_folder data_test/BAM/BAM_multiple/ --output_folder BAM_realigned --ref_folder data_test/REF --gtf data_test/REF/TP53_small.gtf --bed data_test/BED/TP53_small.bed --mem 4
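The practical sessions also cover resuming and tracing a run. As a minimal sketch, the first demo command can be repeated with the Nextflow options -resume (reuse cached results from the previous run) and -with-trace (write a trace file). The run_or_print helper and the DRY_RUN variable are hypothetical names introduced only for this illustration: by default the command is printed so that it can be checked before being executed.

```shell
# run_or_print is a hypothetical helper, not part of nextflow.
# With DRY_RUN=1 (the default here) it only prints the command;
# with DRY_RUN=0 it executes it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# First demo command, with -resume and -with-trace added.
run_or_print nextflow run iarcbioinfo/nf_coverage_demo -with-docker \
  -resume -with-trace \
  --bam_folder data_test/BAM/BAM_multiple/ \
  --bed data_test/BED/TP53_exon2_11.bed
```

Setting DRY_RUN=0 before the call runs the pipeline for real; -resume only has an effect if a previous run left its work folder in place.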
Config
Example config files are in the config folder of this repository. Note that adding -with-trace to your nextflow run command is equivalent to having a configuration file containing:
trace {
enabled = true
}
or:
trace.enabled = true
One example of each possibility is given (nextflow.config_1 and nextflow.config_2). You will also find the configuration file I propose to use on the IARC Jupiter cluster.
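As a small illustration of the equivalence described above, the following sketch writes a config file that enables tracing. The file name trace_demo.config is an assumption chosen so that no existing nextflow.config is overwritten; it is not one of the files shipped in the config folder.

```shell
# Write a small config file enabling the trace report.
# trace_demo.config is a hypothetical file name used for illustration.
cat > trace_demo.config <<'EOF'
// Same effect as adding -with-trace to the nextflow run command
trace {
    enabled = true
}
EOF

# Show the result; it can then be passed to a run with the -c option,
# for example: nextflow run <pipeline> -c trace_demo.config
cat trace_demo.config
```

Keeping such settings in a file rather than on the command line makes it easier to reuse the same options across runs.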
IARC Jupiter cluster
Create a symlink to singularity
ln -s /appli57/singularity/singularity-2.4.5/bin/singularity /home/username/bin/
Add to your ~/.bash_profile:
export NXF_JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/
export NXF_TEMP=/data/tmp
export SINGULARITY_CACHEDIR=/data/username/.singularity
export SINGULARITY_LOCALCACHEDIR=/data/tmp/
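Since ~/.bash_profile is only read by login shells, it can be worth checking that the variables are actually visible in your session. The function below is a minimal sketch for that purpose; check_cluster_env is a hypothetical name, not a script provided on the cluster.

```shell
# check_cluster_env is a hypothetical helper: it prints the value of each
# variable listed above, or a notice if the variable is empty or unset.
check_cluster_env() {
  for var in NXF_JAVA_HOME NXF_TEMP SINGULARITY_CACHEDIR SINGULARITY_LOCALCACHEDIR; do
    # Indirect lookup of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      printf '%s=%s\n' "$var" "$value"
    else
      printf '%s is not set\n' "$var"
    fi
  done
}

check_cluster_env
```

If a variable is reported as not set, open a new login shell or run `source ~/.bash_profile` and check again.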
Replace your ~/.nextflow/config with the one in the config folder of this repository.
Check cluster usage using bhosts or your own jobs using bjobs. You can also run our script to check what the others are doing: /appli57/scripts/bjobs_monitor.r.
Useful links
- IARC bioinformatics GitHub organization
- Docker and DockerHub. See my short docker tutorial here if you want to know more about it. IARC bioinformatics DockerHub page.
- Singularity and the PLOS one paper presenting it.
- Nextflow resources:
- Nextflow website
- Nextflow documentation
- Nextflow releases on GitHub with changelogs
- Nextflow issues on GitHub
- Nextflow
- Nextflow blog
- Nextflow Google group
- Nextflow Twitter
- A curated list of Nextflow pipelines
- nf-core: an emerging effort to collect high quality pipelines
- Nextflow paper: Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
- Another paper by nextflow folks about the impact of Docker on performance: https://peerj.com/articles/1273/
- Dataflow programming on Wikipedia
- Scientific workflow system on Wikipedia
- A paper in PLOS Comp. Bio. about using GitHub efficiently to manage your bioinformatics projects