IARC bioinformatics pipelines and tools (updated on 29/07/2024)

This page lists all the pipelines and tools developed at IARC (mostly Nextflow pipelines, whose names end with -nf). It also includes some useful resources such as courses, data notes, and tips & tricks. Finally, section 5 explains how to install and use Nextflow pipelines.

<ins>Table of Contents:</ins>

1. IARC pipelines/tools list

2. Courses and data notes

3. Tips & Tricks

4. Coming soon... (dev branches only so far)

5. Nextflow, Docker and Singularity installation and use

6. Outdated and unmaintained pipelines and tools

<a name="head1"></a>1. IARC pipelines/tools list

<a name="head1a"></a>1a. Raw NGS data processing

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| alignment-nf | v1.3 - March 2021 | :heavy_check_mark: Yes | Performs BAM realignment or FASTQ alignment, with/without local indel realignment and base quality score recalibration | bwa, samblaster, sambamba, samtools, AdapterRemoval, GATK, k8 javascript execution shell, bwa-postalt.js |
| BQSR-nf | v1.1 - Apr 2020 | :heavy_check_mark: Yes | Performs base quality score recalibration of BAM files using GATK | samtools, samblaster, sambamba, GATK |
| abra-nf | v3.0 - Apr 2020 | :heavy_check_mark: Yes | Runs ABRA (Assembly Based ReAligner) | ABRA, bedtools, bwa, sambamba, samtools |
| gatk4-DataPreProcessing-nf | Nov 2018 | ? | Performs bwa alignment and pre-processing (mark duplicates and recalibration) following GATK4 best practices; compatible with hg38 | bwa, picard, GATK4, sambamba, qualimap |
| PostAlignment-nf | Aug 2018 | ? | Performs post-alignment processing of BAM files | samtools, sambamba, bwa-postalt.js |
| marathon-wgs | June 2018 | ? | Studies intratumor heterogeneity with Canopy | bwa, platypus, strelka2, vt, annovar, R, Falcon, Canopy |
| ITH-nf | Sept 2018 | ? | Performs intra-tumoral heterogeneity (ITH) analysis | Strelka2, Platypus, Bcftools, Tabix, Falcon, Canopy |

<a name="head1b"></a>1b. RNA Seq

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| RNAseq-nf | v2.4 - Dec 2020 | :heavy_check_mark: Yes | Performs RNAseq mapping, quality control, and read counting. See also RNAseq_analysis_scripts for post-processing | fastqc, RSeQC, MultiQC, STAR, htseq, cutadapt, Python version > 2.7, trim_galore, hisat2, GATK, samtools |
| RNAseq-transcript-nf | v2.2 - June 2020 | :heavy_check_mark: Yes | Performs transcript identification and quantification from a series of BAM files | StringTie |
| RNAseq-fusion-nf | v1.1 - Aug 2020 | :heavy_check_mark: Yes | Performs fusion-gene discovery from RNAseq data using STAR-Fusion | STAR-Fusion |
| gene-fusions-nf | v1 - Oct 2020 - updated Nov 2021 | :heavy_check_mark: Yes | Performs fusion-gene discovery from RNAseq data using Arriba | Arriba |
| quantiseq-nf | v1.1 - July 2020 | :heavy_check_mark: Yes | Quantifies immune cell content from RNA-seq data | quanTIseq |

<a name="head1c"></a>1c. Single-cell RNA seq

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| SComatic-nf | April 2024 | :heavy_check_mark: Yes | Performs variant calling from single-cell RNAseq data | SComatic, annovar |
| numbat-nf | April 2024 | :heavy_check_mark: Yes | Performs variant calling from single-cell RNAseq data | numbat, SigProfilerExtractor |

<a name="head1d"></a>1d. QC

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| NGSCheckMate | v1.1a - July 2021 | :heavy_check_mark: Yes | Runs NGSCheckMate on BAM files to identify data files from the same individual (i.e., check tumor/normal pairs) | NGSCheckMate |
| conpair-nf | June 2018 | ? | Runs Conpair (concordance and contamination estimator) | conpair, Python 2.7, numpy 1.7.0 or higher, scipy 0.14.0 or higher, GATK 2.3 or higher |
| damage-estimator-nf | June 2017 | ? | Runs "Damage Estimator" | Damage Estimator, samtools, R with ggplot2 package |
| QC3 | May 2016 | No | Runs QC on DNA-seq data (raw data, aligned data and variant calls); forked from slzhao | samtools |
| fastqc-nf | v1.1 - July 2020 | :heavy_check_mark: Yes | Runs FastQC and MultiQC on DNA-seq data (FASTQ files) | FastQC, MultiQC |
| qualimap-nf | v1.1 - Nov 2019 | :heavy_check_mark: Yes | Performs quality control on BAM files (WES, WGS and targeted alignment data) | samtools, Qualimap, MultiQC |
| mpileup-nf | Jan 2018 | ? | Computes BAM coverage with samtools mpileup (parallelized over BED regions) | samtools, annovar |
| bamsurgeon-nf | Mar 2019 | ? | Runs bamsurgeon (tool to add mutations to BAM files) with a variant simulation step | Python 2.7, bamsurgeon, R software (tested with R version 3.2.3) |

<a name="head1e"></a>1e. Variant calling

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| needlestack | v1.1 - May 2019 | :heavy_check_mark: Yes | Performs multi-sample somatic variant calling | perl, bedtools, samtools and R software |
| target-seq | Aug 2019 | ? | Whole pipeline to perform multi-sample somatic variant calling using Needlestack on targeted sequencing data | abra2, QC3, needlestack, annovar and R software |
| strelka2-nf | v1.2a - Dec 2020 | :heavy_check_mark: Yes | Runs Strelka 2 (germline and somatic variant caller) | Strelka2 |
| strelka-nf | Jun 2017 | No | Runs Strelka (germline and somatic variant caller) | Strelka |
| mutect-nf | v2.3 - July 2021 | :heavy_check_mark: Yes | Runs Mutect on tumor/matched-normal BAM pairs | Mutect and its dependencies (Java 1.7 and Maven 3.0+), bedtools |
| gatk4-HaplotypeCaller-nf | Dec 2019 | ? | Runs variant calling in GVCF mode on BAM files following GATK best practices | GATK |
| gatk4-GenotypeGVCFs-nf | Apr 2019 | ? | Runs joint genotyping on GVCF files following GATK best practices | GATK |
| GVCF_pipeline-nf | Nov 2016 | ? | Performs BAM realignment and recalibration + variant calling in GVCF mode following GATK best practices | bwa, samblaster, sambamba, GATK |
| platypus-nf | v1.0 - Apr 2018 | ? | Runs Platypus (germline variant caller) | Platypus |
| TCGA_platypus-nf | Aug 2018 | ? | Converts TCGA Platypus VCFs to a format suitable for annotation with annovar | vt, VCFTools |
| vcf_normalization-nf | v1.1 - May 2020 | :heavy_check_mark: Yes | Decomposes and normalizes variant calls (VCF files) | bcftools, samtools/htslib |
| TCGA_germline-nf | May 2017 | ? | Extracts germline variants from TCGA data for annotation with annovar (VCF files) | R software |
| gama_annot-nf | Aug 2020 | :heavy_check_mark: Yes | Filters and annotates batches of VCF files (annovar + strand + context) | annovar, R |
| table_annovar-nf | v1.1.1 - Feb 2021 | :heavy_check_mark: Yes | Annotates variants with annovar (VCF files) | annovar |
| RF-mut-f | Nov 2021 | :heavy_check_mark: Yes | Random forest implementation to filter germline mutations from tumor-only samples | annovar |
| MutSig | Oct 2021 | :heavy_check_mark: Yes | Pipeline to perform mutational signature analysis of WGS data using SigProfilerExtractor | SigProfilerExtractor |
| MutSpec | v2.0 - May 2017 | ? | Suite of tools for analyzing and interpreting mutational signatures | annovar |
| purple-nf | v1.1 - Nov 2021 | :heavy_check_mark: Yes | Pipeline to perform copy number calling from tumor/normal or tumor-only sequencing data using PURPLE | PURPLE |
| facets-nf | v2.0 - Oct 2020 | :heavy_check_mark: Yes | Performs cellular fraction and copy number estimation from tumor/normal sequencing data using FACETS | facets, R |
| CODEX-nf | Mar 2017 | ? | Performs copy number variant calling from whole exome sequencing data using CODEX | R with package Codex, Rscript |
| svaba-nf | v1.0 - August 2020 | :heavy_check_mark: Yes | Performs structural variant calling using SvABA | SvABA, R |
| sv_somatic_cns-nf | v1.0 - Nov 2021 | :heavy_check_mark: Yes | Pipeline using multiple SV callers for consensus structural variant calling from tumor/normal sequencing data | Delly, SvABA, Manta, SURVIVOR, bcftools, Samtools |
| ssvht | v1 - Oct 2022 | :heavy_check_mark: Yes | 🔴 NEW set of scripts to assist the calling of somatic structural variants from short reads using a random forest classifier | |

<a name="head1f"></a>1f. Deep learning pipelines and tools for digital pathology

<a name="head1f1"></a>1f1. Whole slide image (WSI) pre-processing

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| WSIPreprocessing | December 2023 | :heavy_check_mark: Yes | Preprocessing pipeline for WSIs (tiling, color normalization) | Python, openslide |

<a name="head1f2"></a>1f2. Tumor segmentation with CFlow AD

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| TumorSegmentationCFlowAD | December 2023 | :heavy_check_mark: Yes | Tumour segmentation with an anomaly detection model | Python, PyTorch |

<a name="head1f3"></a>1f3. Supervised learning on immunohistochemistry slides

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| PathonetLNEN | December 2023 | :heavy_check_mark: Yes | Detection and classification of cells as positive or negative for an immunomarker; developed for PHH3 and Ki-67 in lung carcinoma | Python, TensorFlow |

<a name="head1f4"></a>1f4. Self-supervised feature extractor for WSIs

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| LNENBarlowTwins | December 2023 | :heavy_check_mark: Yes | Extraction of H&E tile features with Barlow Twins, a self-supervised deep learning model | Python, PyTorch |

<a name="head1f5"></a>1f5. Additional tools

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| SpatialPCAForWSIs | December 2023 | :heavy_check_mark: Yes | Spatially aware principal component analysis to obtain a low-dimensional representation of the tile encoding vectors | R |

<a name="head1g"></a>1g. Other tools/pipelines

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| template-nf | May 2020 | :heavy_check_mark: Yes | Empty template for Nextflow pipelines | NA |
| data_test | Aug 2020 | :heavy_check_mark: Yes | Small data files to test IARC Nextflow pipelines | NA |
| bam2cram-nf | v1.0 - Nov 2020 | :heavy_check_mark: Yes | Pipeline to convert BAM files to CRAM files | samtools |
| hla-neo-nf | v1.1 - June 2021 | :heavy_check_mark: Yes | Pipeline to predict neoantigens from WGS of T/N pairs | xHLA, VEP, pVACtools |
| PRSice | Nov 2020 | | Pipeline to compute polygenic risk scores | PRSice-2 |
| methylkey | May 2021 | :heavy_check_mark: Yes | Pipeline for 450k and 850k array analysis (bisulfite data analysis using Minfi, Methylumi, Comet, Bumphunter and DMRcate packages) | R software |
| wsearch-nf | July 2022 | :heavy_check_mark: Yes | 🔴 NEW pipeline: microbiome analysis with usearch, vsearch and phyloseq | |
| AmpliconArchitect-nf | v1.0 - Oct 2021 | :heavy_check_mark: Yes | Discovers ecDNA in cancer genomes using AmpliconArchitect | AmpliconArchitect |
| addreplacerg-nf | Jan 2017 | ? | Adds and replaces read group tags in BAM files | samtools |
| bametrics-nf | Mar 2017 | ? | Computes average metrics from reads that overlap a given set of positions | NA |
| Gviz_multiAlignments | Aug 2017 | ? | Generates multiple BAM alignment views using the Gviz Bioconductor package | Gviz |
| nf_coverage_demo | v2.3 - July 2020 | :heavy_check_mark: Yes | Plots mean coverage over a series of BAM files | bedtools, R software |
| LiftOver-nf | Nov 2017 | ? | Converts BED/VCF files between hg19 and hg38 | picard |
| MinION_pipes | Jan 2020 | ? | Analyzes MinION sequencing data for the reconstruction of viral genomes | Guppy V3.1.5+, Porechop V0.2.4, Nanofilt V2.2.0, Filtlong V0.2.0, SPAdes V3.10.1, CAP3 02/10/15, BLAST V2.9.0+, MUSCLE V3.8.1551, Nanopolish V0.11.0, Minimap2 V2.15, Samtools version 1.9 |
| DraftPolisher | Jan 2020 | ? | Fast polishing of draft sequences (draft genome assembly) | MUSCLE, Python3 |
| Imputation-nf | v1.1 - July 2021 | :heavy_check_mark: Yes | Pipeline to perform genotype imputation on a dataset | LiftOver, Plink, Admixture, Perl, Term::ReadKey, bcftools, Eagle, Minimac4 and samtools |
| PVAmpliconFinder | Aug 2020 | :heavy_check_mark: Yes | Identifies and classifies known and potentially new Papillomaviridae sequences from amplicon deep-sequencing with degenerate papillomavirus primers | Python and Perl + FastQC, MultiQC, Trim Galore, VSEARCH, Blast, RAxML-EPA, PaPaRa, CAP3, KRONA |
| integration_analysis_scripts | Mar 2020 | :heavy_check_mark: Yes | Performs unsupervised analyses (clustering) from transformed expression data (e.g., log FPKM) and methylation beta values | R software with iClusterPlus, gplots and lattice R packages |
| mpileup2readcounts | Apr 2018 | ? | Gets the read counts at a locus by piping samtools mpileup output; forked from gatoravi | samtools |
| Methylation_analysis_scripts | v1.0 - June 2020 - updated Nov 2021 | :heavy_check_mark: Yes | Performs Illumina EPIC 850K array pre-processing and QC from idat files | R software |
| DRMetrics | Oct 2020 | :heavy_check_mark: Yes | Evaluates the quality of projections obtained with dimensionality reduction techniques | R software |
| acnviewer-singularity | Jul 2019 | ? | Builds a Singularity image of aCNViewer (tool for visualization of absolute copy number and copy neutral variations) | Singularity |
| polysolver-singularity | Dec 2019 | ? | Builds a Singularity image of Polysolver (tool for HLA typing based on whole exome sequencing) | Singularity |
| scanMyWorkDir | May 2018 | ? | Non-destructive and informative scan of a Nextflow work folder | NA |

<a name="head2"></a>2. Courses and data notes

| Name | Description | Tools used |
| --- | --- | --- |
| nextflow-course-2018 | Nextflow course | NA |
| SBG-CGC_course2018 | Analyzing TCGA data in SBG-CGC | NA |
| Medical Genomics Course | Medical Genomics course held at INSA Lyon (updated Fall 2022) | NA |
| intro-cancer-genomics | Introduction to cancer genomics | NA |
| mesomics_data_note | Repository with code and datasets used in the mesomics data note manuscript | NA |

<a name="head3"></a>3. Tips & Tricks

| Name | Description | Tools used |
| --- | --- | --- |
| BAM-tricks | Tips and tricks for BAM files | samtools, freebayes, bedtools, biobambam2, Picard, rbamtools |
| VCF-tricks | Tips and tricks for VCF files | samtools, bcftools, vcflib, vcftools, R scripts |
| R-tricks | Tips and tricks for R | NA |
| EGA-tricks | Tips and tricks to use the European Genome-Phenome Archive from the European Bioinformatics Institute | EGA client |
| GDC-tricks | Tips and tricks to use the GDC data portal | NA |
| awesomeTCGA | Curated list of resources to access TCGA data | NA |
| LSF-Tricks | Tips and tricks for the LSF HPC scheduler | NA |

<a name="head4"></a>4. Coming soon... (dev branches only so far)

| Name | Description | Tools used |
| --- | --- | --- |
| DPclust-nf | Method for subclonal reconstruction using SNVs and/or CNAs from whole genome or whole exome sequencing data | dpclust, R |
| ITH_pipeline | Study of intra-tumoral heterogeneity (ITH) through subclonality reconstruction | HATCHet, DeCiFer, ClonEvol |
| Nextflow_DSL2 | Repository with modules for Nextflow DSL2 | NA |
| variantflag | Merge and annotate variants from different callers | |
| EPIDRIVER2020 | Scripts for EPIDRIVER Project | |

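<a name="head5"></a>5. Nextflow, Docker and Singularity installation and use

<a name="head5a"></a>5a. Nextflow

  1. Install a Java runtime (JRE) if you don't already have one. Recent Nextflow releases require Java 17 or later; check the Nextflow installation documentation for the exact supported versions.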

  2. Install nextflow.

    curl -fsSL get.nextflow.io | bash
    

    And move it to a location in your $PATH (/usr/local/bin for example here):

    sudo mv nextflow /usr/local/bin
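
The two installation steps above can be checked with a small shell sketch before launching a pipeline. The function names (`java_major_version`, `installed_java_version`, `check_nextflow_prereqs`) and the default minimum Java version of 17 are illustrative assumptions, not part of Nextflow; set the minimum to whatever your Nextflow release requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that Java and Nextflow are usable before running a pipeline.

# Extract the major version from a Java version string.
# Handles the legacy scheme ("1.8.0_292" -> 8) and the modern one ("17.0.2" -> 17).
java_major_version() {
  local v="$1"
  v="${v#1.}"           # drop a leading "1." (legacy scheme only)
  echo "${v%%[._-]*}"   # keep everything before the first '.', '_' or '-'
}

# Print the version string of the java found on PATH ("java -version" writes to stderr).
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}'
}

# Succeed if java (>= the given major version; default 17 is an assumption) and nextflow are on PATH.
check_nextflow_prereqs() {
  local min_major="${1:-17}" major
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 1
  fi
  major="$(java_major_version "$(installed_java_version)")"
  if [ -z "$major" ] || [ "$major" -lt "$min_major" ]; then
    echo "Java ${major:-unknown} found, but $min_major or higher is required" >&2
    return 1
  fi
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "nextflow not found on PATH (was it moved to a directory such as /usr/local/bin?)" >&2
    return 1
  fi
  echo "OK: Java $major, nextflow at $(command -v nextflow)"
}
```

After sourcing the file, run `check_nextflow_prereqs` (or `check_nextflow_prereqs 11` to accept an older minimum); a non-zero exit status tells you which prerequisite is missing.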
    

<a name="head5b"></a>5b. Docker

To avoid having to install all dependencies each time you use a pipeline, you can instead install Docker and let Nextflow handle them. Installing Docker is system-specific (but quite easy in most cases): follow the Docker documentation (Docker CE is sufficient). Also follow the post-installation step to manage Docker as a non-root user (described here for Linux); otherwise you will need to set the sudo option in the Nextflow docker config scope, as described in the Nextflow documentation.

To run a Nextflow pipeline with Docker, simply add the -with-docker option to the nextflow run command.
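
As a sketch of the two Docker setups described above, the hypothetical shell function below runs a pipeline with `-with-docker` and, when the current user is not in the `docker` group created by the Linux post-installation step, adds a small config file (passed with `-c`) that sets Nextflow's `docker.sudo = true`. The function names, the `docker-sudo.config` file name and the group-membership check are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run an IARC pipeline with Docker, using sudo only when needed.

# True if the user (default: current user) belongs to the "docker" group.
in_docker_group() {
  id -nG "${1:-$(id -un)}" 2>/dev/null | tr ' ' '\n' | grep -qx docker
}

# Usage: run_with_docker <github_repo> <revision> [pipeline parameters...]
run_with_docker() {
  local pipeline="$1" revision="$2"
  shift 2
  local extra_conf=()
  if ! in_docker_group; then
    # The docker config scope's "sudo" option makes Nextflow run docker through sudo.
    printf 'docker.sudo = true\n' > docker-sudo.config
    extra_conf=(-c docker-sudo.config)
  fi
  nextflow run "$pipeline" -r "$revision" -with-docker "${extra_conf[@]}" "$@"
}
```

For example, `run_with_docker iarcbioinfo/RNAseq-nf v2.4 --input_folder data --output_folder results`, assuming the release you want is tagged `v2.4` on GitHub.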

<a name="head5c"></a>5c. Singularity

To avoid having to install all dependencies each time you use a pipeline, you can also install Singularity and let Nextflow handle them.

See documentation here.

If you want to use the same Singularity container (with exactly the same versions of the pipeline and tools) on several datasets over time, you may want to pull the container and archive it somewhere:

singularity pull shub://IARCbioinfo/pipeline-nf:v2.2

where "pipeline-nf" should be replaced by the name of the pipeline you want to use (example: RNAseq-nf) and 2.2 by the version of the pipeline you want to use (example: 2.4). This creates a Singularity container file, pipeline-nf_v2.2.sif (example: RNAseq-nf_v2.4.sif), which you can then use by specifying it in the nextflow command (see Usage).

=> example:

singularity pull shub://IARCbioinfo/RNAseq-nf:v2.4
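
The archiving step above can be wrapped in a small shell sketch that derives the image file name from the `<pipeline>_<version>.sif` convention described above and pulls only when the archived file is missing. The function names and the default archive directory (`$HOME/containers`) are hypothetical; the sketch relies on `singularity pull` accepting an explicit output file before the image URI.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to pull an IARC pipeline container once and reuse the archived copy.

# Image file name produced by the pull: RNAseq-nf + v2.4 -> RNAseq-nf_v2.4.sif
sif_name() {
  echo "${1}_${2}.sif"
}

# Usage: archive_container <pipeline> <version> [archive_dir]
# Prints the path of the archived .sif file, pulling it first if it is not there yet.
archive_container() {
  local pipeline="$1" version="$2" archive_dir="${3:-$HOME/containers}"
  local sif
  sif="$archive_dir/$(sif_name "$pipeline" "$version")"
  mkdir -p "$archive_dir" || return 1
  if [ ! -f "$sif" ]; then
    # Send pull progress to stderr so stdout only carries the file path.
    singularity pull "$sif" "shub://IARCbioinfo/${pipeline}:${version}" >&2 || return 1
  fi
  echo "$sif"
}
```

The printed path can be passed straight to Nextflow, e.g. `nextflow run iarcbioinfo/RNAseq-nf -r v2.4 -with-singularity "$(archive_container RNAseq-nf v2.4)" ...`, so later runs reuse the same archived image instead of pulling again.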

<a name="head5d"></a>5d. Usage

nextflow run iarcbioinfo/pipeline_name -r X --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

OR USING SINGULARITY

nextflow run iarcbioinfo/pipeline_name -r X -profile singularity --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work

OR USING SINGULARITY WITH SPECIFIC CONTAINER

nextflow run iarcbioinfo/pipeline_name -r X -with-singularity XXX.sif --input_folder xxx --output_folder xxx -params-file xxx.yml -w /scratch/work
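
The three command variants above differ only in how containers are selected. The hypothetical shell sketch below writes a minimal parameters file and builds the matching command. Only `input_folder` and `output_folder` come from this page; each pipeline documents its own additional parameters (see its `--help` output). Nextflow gives parameters passed on the command line with `--` precedence over those read from `-params-file`, so the YAML file is a convenient place for values you reuse across runs.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around "nextflow run" for IARC pipelines.

# Write a minimal YAML parameters file for use with -params-file.
write_params() {
  local file="$1" input="$2" output="$3"
  cat > "$file" <<EOF
input_folder: $input
output_folder: $output
EOF
}

# Usage: run_iarc_pipeline <pipeline> <revision> <params.yml> <work_dir> [mode]
# mode: "local" (default, no container option), "profile" (-profile singularity)
#       or the path to an archived .sif image (-with-singularity).
run_iarc_pipeline() {
  local pipeline="$1" revision="$2" params="$3" workdir="$4" mode="${5:-local}"
  local container_opts=()
  case "$mode" in
    local) ;;
    profile) container_opts=(-profile singularity) ;;
    *.sif) container_opts=(-with-singularity "$mode") ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  nextflow run "iarcbioinfo/$pipeline" -r "$revision" "${container_opts[@]}" \
    -params-file "$params" -w "$workdir"
}
```

For example, `write_params params.yml /data/fastq /data/results` followed by `run_iarc_pipeline RNAseq-nf v2.4 params.yml /scratch/work profile` (the paths and the `v2.4` tag are placeholders).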

<a name="head5e"></a>5e. Updates

You can update the Nextflow software and the pipeline itself simply using:

nextflow self-update
nextflow pull iarcbioinfo/pipeline_name

You can also automatically update the pipeline when you run it by adding the -latest option to the nextflow run command. Doing so, you will always run the latest version from GitHub.
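
When several pipelines are installed, the update commands above can be grouped in one function. The function name is hypothetical and the sketch simply stops at the first failing command. As an alternative to `-latest`, passing `-r` with a release tag pins the exact version you ran, which is usually preferable when results must be reproducible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: update Nextflow, then pull the latest code of each listed IARC pipeline.
# Usage: update_iarc_pipelines <pipeline> [pipeline...]
update_iarc_pipelines() {
  nextflow self-update || return 1
  local p
  for p in "$@"; do
    # Fetch the latest version of the pipeline from the iarcbioinfo GitHub organization.
    nextflow pull "iarcbioinfo/$p" || return 1
  done
}
```

For example, `update_iarc_pipelines RNAseq-nf alignment-nf`.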

<a name="head5f"></a>5f. Help

nextflow run iarcbioinfo/pipeline_name --help

<a name="head6"></a>6. Outdated and unmaintained pipelines and tools

| Name | Latest version | Maintained | Description | Tools used |
| --- | --- | --- | --- | --- |
| GATK-Alignment-nf | June 2017 | No | Performs bwa alignment and pre-processing (realignment and recalibration) following the first version of GATK best practices (less performant than alignment-nf) | bwa, picard, GATK |