Using the IARC nextflow bioinformatics pipelines course 2018

The aim of this course is to enable participants to use the bioinformatics pipelines developed at IARC using nextflow.

Description of the course

Learning objectives
After completing this course, participants will be able to use the IARC nextflow bioinformatics pipelines and more specifically:

Please note that the development of pipelines will only be very briefly covered in this course.

Agenda and slides

Wednesday 23 May (slides)
09:00-10:00 Introduction to bioinformatics pipelines, nextflow, docker, Github and the IARC organization
10:00-10:30 Practical application: running your first pipeline
10:30-11:00 Break
11:00-11:30 The hidden structure of nextflow: work folder and configuration
11:30-12:30 Practical application: configuring, crashing, resuming and debugging pipelines

Thursday 24 May (slides)
09:00-09:30 Introduction to HPC clusters and running pipelines on a cluster
09:30-10:30 Practical application: trace and visualise pipeline execution with log files
10:30-11:00 Break
11:00-11:30 Introduction to the nextflow language: understanding what the pipelines are doing
11:30-12:30 Practical application: advanced usage for reproducibility (choosing a container, Github releases and branches, modifying a pipeline, etc.)

Gitter Chat

A Gitter chat room is open for the course at https://gitter.im/IARCbioinfo/nextflow-course-2018. Participants can use it to discuss their projects and to ask any question about the course.

Laptop setup

Laptops use Ubuntu 16.04.

Nextflow is already installed and in ~/bin, which is in your PATH.

Docker is already installed. If you are curious, installation instructions are available on the Docker website.

If you need a good text editor, Atom is also installed.

Demo commands

nextflow run iarcbioinfo/nf_coverage_demo -with-docker --bam_folder data_test/BAM/BAM_multiple/ --bed data_test/BED/TP53_exon2_11.bed
nextflow run iarcbioinfo/platypus-nf -with-docker --input_folder data_test/BAM/ --ref data_test/REF/17.fasta
nextflow run iarcbioinfo/RNAseq-nf -with-docker --input_folder data_test/BAM/BAM_multiple/ --output_folder BAM_realigned --ref_folder data_test/REF --gtf data_test/REF/TP53_small.gtf --bed data_test/BED/TP53_small.bed --mem 4

Config

Example config files are in the config folder of this repository. Note that adding -with-trace to your nextflow run command is equivalent to having a configuration file containing:

trace {
    enabled = true
}

or:

trace.enabled = true

One example of each syntax is given (nextflow.config_1 and nextflow.config_2). You will also find the configuration file I propose to use on the IARC Jupiter cluster.
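As a minimal sketch of how such a file can be put in place, the commands below write the block syntax shown above into a file named nextflow.config. Nextflow reads a file with that name from the directory where the pipeline is launched. The scratch directory created with mktemp is an assumption made for illustration, so that no existing configuration is overwritten.

```shell
# Create a scratch directory (illustrative; any launch directory would do).
workdir=$(mktemp -d)

# Write a minimal configuration file that enables the trace report.
cat > "$workdir/nextflow.config" <<'EOF'
trace {
    enabled = true
}
EOF

# Display the file to confirm its content.
cat "$workdir/nextflow.config"
```

Launching nextflow run from that directory should then behave as if -with-trace had been given on the command line.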

IARC Jupiter cluster

Create a symlink to singularity

ln -s /appli57/singularity/singularity-2.4.5/bin/singularity /home/username/bin/

Add the following lines to your ~/.bash_profile:

export NXF_JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/
export NXF_TEMP=/data/tmp

export SINGULARITY_CACHEDIR=/data/username/.singularity
export SINGULARITY_LOCALCACHEDIR=/data/tmp/
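After opening a new session, it can be worth checking that these variables are actually set. The helper below is a hypothetical sketch (check_var is not part of nextflow, singularity or the cluster tools): it prints the value of each variable, or a warning when one is missing.

```shell
# Hypothetical helper: print NAME=value, or a warning if NAME is unset or empty.
check_var() {
    name=$1
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
        echo "$name=$value"
    else
        echo "$name is not set"
        return 1
    fi
}

# Check the four variables listed above; keep going even if one is missing.
for v in NXF_JAVA_HOME NXF_TEMP SINGULARITY_CACHEDIR SINGULARITY_LOCALCACHEDIR; do
    check_var "$v" || true
done
```

If a variable is reported as not set, reload the profile with source ~/.bash_profile or log in again.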

Replace your ~/.nextflow/config with the one in the config folder of this repository.

Check cluster usage with bhosts and your own jobs with bjobs. You can also run our script to see what other users are running: /appli57/scripts/bjobs_monitor.r.

Useful links

Tips and tricks