alignment-nf

Nextflow pipeline for BAM realignment or fastq alignment

Workflow representation

Description

Nextflow pipeline to perform BAM realignment or fastq alignment and QC, with or without base quality score recalibration.

Dependencies

  1. Nextflow: for common installation procedures, see the IARC-nf repository.

Basic fastq alignment

  1. bwa-mem2 (default) or bwa
  2. samblaster
  3. sambamba

BAM files realignment

  1. samtools

Adapter sequence trimming

  1. AdapterRemoval

ALT contigs handling

  1. the k8 JavaScript execution shell (e.g., available in the bwakit archive), which must be in the PATH
  2. the JavaScript file bwa-postalt.js and the additional fasta reference .alt file from bwakit, which must be in the same directory as the reference genome file

QC

  1. Qualimap
  2. MultiQC

Base quality score recalibration

  1. GATK4; the wrapper 'gatk' must be in the PATH
  2. GATK bundle VCF files with lists of indels and SNVs (recommended: Mills gold standard indels VCF, dbSNP VCF), and the corresponding tabix indexes (.tbi)

A conda recipe and Docker and Singularity containers are available with all the tools needed to run the pipeline (see "Usage").
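
When running without a container, it can help to check which of the tools listed above are reachable from the PATH before launching the pipeline. The following is a minimal sketch: the function name `missing_tools` is hypothetical (it is not part of alignment-nf), and the command names you pass to it are assumed to match the way each tool is installed on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of alignment-nf): print every command
# name given as argument that cannot be found in the PATH.
missing_tools() (
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
)
```

For example, `missing_tools nextflow samblaster sambamba samtools` prints the names that still need to be installed; an empty output means that all of them were found.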

Input

| Type | Description |
| --- | --- |
| --input_folder | a folder with fastq files or bam files |

Parameters

| Name | Example value | Description |
| --- | --- | --- |
| --ref | hg19.fasta | genome reference with its index files (.fai, .sa, .bwt, .ann, .amb, .pac, and .dict; in the same directory) |

| Name | Default value | Description |
| --- | --- | --- |
| --input_file | null | input file (comma-separated) with 4 columns: SM (sample name), RG (read group ID), pair1 (first fastq of the pair), and pair2 (second fastq of the pair) |
| --output_folder | . | output folder for aligned BAMs |
| --cpu | 8 | number of CPUs |
| --cpu_BQSR | 2 | number of CPUs for GATK base quality score recalibration |
| --mem | 32 | memory |
| --mem_BQSR | 10 | memory for GATK base quality score recalibration |
| --RG | PL:ILLUMINA | sequencing information for aligned reads (for bwa) |
| --fastq_ext | fastq.gz | extension of fastq files |
| --suffix1 | _1 | suffix for first element of read files pair |
| --suffix2 | _2 | suffix for second element of read files pair |
| --bed | | bed file with interval list |
| --snp_vcf | dbsnp.vcf | path to SNP VCF from GATK bundle |
| --indel_vcf | Mills_1000G_indels.vcf | path to indel VCF from GATK bundle |
| --postaltjs | bwa-postalt.js | path to postalignment JavaScript file bwa-postalt.js |
| --feature_file | null | path to feature file for qualimap |
| --multiqc_config | null | config yaml file for multiqc |
| --adapterremoval_opt | null | command line options for AdapterRemoval |
| --bwa_mem | bwa-mem2 mem | bwa-mem command; use "bwa mem" to switch to regular bwa-mem (both are in the docker and singularity containers) |
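
The file expected by --input_file can be written by hand, but for a folder of paired fastq files it may be easier to generate it. The sketch below is one way to do so, under several assumptions that are not confirmed by this README: the function name `make_input_file` is hypothetical, a header line naming the four columns is assumed, and the sample name is simply reused as the read group ID. The suffix and extension defaults mirror --suffix1, --suffix2, and --fastq_ext.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of alignment-nf): list paired fastq files
# in a folder as comma-separated lines "SM,RG,pair1,pair2".
# Assumptions: a header line naming the columns is expected, and the
# sample name is reused as read group ID; adapt both to your data.
make_input_file() (
    folder="${1%/}"        # input folder, trailing slash removed
    suffix1="${2:-_1}"     # same defaults as --suffix1 / --suffix2
    suffix2="${3:-_2}"
    ext="${4:-fastq.gz}"   # same default as --fastq_ext

    printf 'SM,RG,pair1,pair2\n'
    for pair1 in "$folder"/*"$suffix1.$ext"; do
        [ -e "$pair1" ] || continue      # no match: the glob is left as-is
        base="$(basename "$pair1" "$suffix1.$ext")"
        pair2="$folder/$base$suffix2.$ext"
        [ -e "$pair2" ] || continue      # skip files without a mate
        printf '%s,%s,%s,%s\n' "$base" "$base" "$pair1" "$pair2"
    done
)
```

A typical call would be `make_input_file input/ > input.csv`, after which the file can be passed with `--input_file input.csv`. Check the header and read group values against your own data before using the result.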

Flags are special parameters without value.

| Name | Description |
| --- | --- |
| --help | print usage and optional parameters |
| --trim | enable adapter sequence trimming |
| --recalibration | perform quality score recalibration (GATK) |
| --alt | enable alternative contig handling (for reference genome hg38) |
| --bwa_option_M | trigger the -M option in bwa and the corresponding compatibility option in samblaster (marks shorter split hits as secondary) |
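
The --ref parameter expects the index files to sit in the same directory as the reference fasta. As a rough pre-flight check, the sketch below reports which of the listed index files are missing. The function name `missing_ref_indexes` is hypothetical, and the naming of the .dict file (either `hg19.fasta.dict` or `hg19.dict`) is an assumption, so both forms are accepted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of alignment-nf): print the index files
# listed for --ref that are missing next to a reference fasta.
missing_ref_indexes() (
    ref="$1"
    for ext in fai sa bwt ann amb pac; do
        [ -e "$ref.$ext" ] || printf '%s\n' "$ref.$ext"
    done
    # Assumption: the sequence dictionary may be named either
    # <ref>.dict or <ref without its last extension>.dict.
    if [ ! -e "$ref.dict" ] && [ ! -e "${ref%.*}.dict" ]; then
        printf '%s\n' "${ref%.*}.dict"
    fi
)
```

Calling `missing_ref_indexes hg19.fasta` prints nothing when every file is present. The extension list mirrors the one given for --ref; index files written by bwa-mem2 may use other extensions, so adjust the list if needed.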

Usage

To run the pipeline on a series of fastq or BAM files in folder input and a fasta reference file hg19.fasta, one can type:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output

To run the pipeline without singularity, just remove "-profile singularity". Alternatively, one can run the pipeline using a docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).

Use bwa-mem instead of bwa-mem2

To use bwa-mem, one can type:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output --bwa_mem "bwa mem"

Enable adapter trimming

To use the adapter trimming step, you must add the --trim option and satisfy the requirements mentioned above. For example:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output --trim

Enable ALT mode

To use the alternative contig handling mode, you must provide the path to an ALT-aware genome reference (e.g., hg38), add the --alt option, and satisfy the requirements mentioned above. For example:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg38.fasta --output_folder output --postaltjs /user/bin/bwa-0.7.15/bwakit/bwa-postalt.js --alt

Enable base quality score recalibration

To use the base quality score recalibration step, you must provide the paths to 2 GATK bundle VCF files with lists of known SNPs and indels, respectively, add the --recalibration option, and satisfy the requirements mentioned above. For example:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output --snp_vcf GATKbundle/dbsnp.vcf.gz --indel_vcf GATKbundle/Mills_1000G_indels.vcf.gz --recalibration

Output

| Type | Description |
| --- | --- |
| BAM/ | folder with BAM and BAI files of alignments or realignments |
| QC/BAM/multiqc_qualimap_flagstat_*report.html | multiQC report for qualimap and samtools flagstat (duplicates) |
| QC/BAM/multiqc_qualimap_flagstat_*report_data | data used for the multiQC report |
| QC/qualimap/file_BQSRecalibrated.stats.txt | qualimap summary file |
| QC/qualimap/file_BQSRecalibrated/ | qualimap files |
| QC/BAM/BQSR/ | GATK base quality score recalibration outputs (tables and pdf comparing scores before/after recalibration) |

Directed Acyclic Graph

DAG

FAQ

Why did Indel realignment disappear from version 1.0?

Indel realignment was removed following new GATK best practices for pre-processing.

Contributions

| Name | Email | Description |
| --- | --- | --- |
| Nicolas Alcala* | AlcalaN@fellows.iarc.fr | Developer to contact for support |
| Catherine Voegele | VoegeleC@iarc.fr | Tester |
| Vincent Cahais | CahaisV@iarc.fr | Tester |
| Alexis Robitaille | RobitailleA@students.iarc.fr | Tester |