BQSR-nf

Nextflow pipeline for base quality score recalibration (BQSR) with GATK

[Figure: workflow representation]

Description

Nextflow pipeline for base quality score recalibration and quality control of BAM files using GATK.

Dependencies

  1. Nextflow: for common installation procedures, see the IARC-nf repository.

  2. MultiQC

  3. GATK4, which must be in the PATH variable

  4. GATK bundle VCF files with lists of known indels and SNVs (recommended: 1000 Genomes indels, dbSNP VCF)
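Before launching the pipeline, it can help to confirm that the tools listed above are reachable from the PATH. The following is a minimal sketch, not part of the pipeline itself; the executable names `nextflow`, `multiqc` and `gatk` are assumptions about how the tools are installed on your system, and `missing_tools` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name of every tool given as argument that is NOT found in PATH.
# Assumption: the tools are installed under the executable names used below.
missing_tools() {
    for tool in "$@"; do
        # command -v fails (and prints nothing) when the tool is not found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools nextflow multiqc gatk)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Missing from PATH:" $missing
else
    echo "All required tools found"
fi
```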

You can provide a config file to customize the MultiQC report (see https://multiqc.info/docs/#configuring-multiqc).
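As an illustration, a small MultiQC config file could be created as follows. This is only a sketch: the file name `multiqc_config.yaml` and the text values are arbitrary choices, while `title` and `report_comment` are standard MultiQC configuration keys.

```shell
# Write a small MultiQC config file (the file name is an arbitrary choice).
cat > multiqc_config.yaml <<'EOF'
title: "BQSR quality control"
report_comment: "Report generated after base quality score recalibration."
EOF

# The file can then be passed to the pipeline with --multiqc_config, e.g.:
# nextflow run iarcbioinfo/BQSR-nf --input_folder bam --ref ref.fa --multiqc_config multiqc_config.yaml
```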

Input

| Type | Description |
|------|-------------|
| --input_folder | a folder with bam files |

Parameters

| Name | Example value | Description |
|------|---------------|-------------|
| --ref | ref.fa | reference genome fasta file for GATK |

| Name | Default value | Description |
|------|---------------|-------------|
| --cpu | 2 | number of CPUs |
| --mem | 32 | memory for mapping |
| --output_folder | . | output folder for aligned BAMs |
| --snp_vcf | dbsnp.vcf | VCF file with known variants for GATK BQSR |
| --indel_vcf | Mills_100G_indels.vcf | VCF file with known indels for GATK BQSR |
| --multiqc_config | null | config yaml file for multiqc |

| Name | Description |
|------|-------------|
| --help | print usage and optional parameters |

Usage

To run the pipeline on a set of BAM files in the folder bam, with an indexed reference genome at ref.fa and known SNPs and indels from the GATK bundle, type:

nextflow run iarcbioinfo/BQSR-nf --input_folder bam --ref ref.fa --snp_vcf GATK_bundle/dbsnp_146.hg38.vcf.gz --indel_vcf GATK_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
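Since the command above relies on an indexed reference and a folder of BAM files, a quick pre-flight check can catch missing inputs before the pipeline starts. The sketch below is not part of the pipeline; `check_inputs` is a hypothetical helper, and it assumes the usual GATK naming for reference indexes (`ref.fa.fai` and `ref.dict` next to `ref.fa`), which may differ from your setup.

```shell
#!/bin/sh
# Pre-flight check: returns 0 if the inputs look usable, otherwise prints
# each problem found and returns 1.
# Assumption: index files follow the usual naming (ref.fa.fai and ref.dict).
check_inputs() {
    folder=$1
    ref=$2
    status=0
    # Expand the glob; if nothing matches, $1 keeps the literal pattern
    set -- "$folder"/*.bam
    [ -e "$1" ] || { echo "no BAM files in $folder"; status=1; }
    [ -f "$ref" ] || { echo "missing reference: $ref"; status=1; }
    [ -f "$ref.fai" ] || { echo "missing index: $ref.fai"; status=1; }
    [ -f "${ref%.*}.dict" ] || { echo "missing dictionary: ${ref%.*}.dict"; status=1; }
    return $status
}
```

It could be combined with the run command, for example `check_inputs bam ref.fa && nextflow run iarcbioinfo/BQSR-nf --input_folder bam --ref ref.fa`, so that the pipeline only starts when the checks pass.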

Output

| Type | Description |
|------|-------------|
| BAM/file.bam | BAM files of alignments or realignments |
| BAM/file.bam.bai | BAI files of alignments or realignments |
| QC/multiqc_BQSR_report.html | multiqc report |
| QC/multiqc_BQSR_report_data | folder with data used to compute multiqc report |
| QC/BAM/BQSR/file_recal.table | table of scores before recalibration |
| QC/BAM/BQSR/file_BQSRecalibrated_recal.table | table of scores after recalibration |
| QC/BAM/BQSR/file_recalibration_plots.pdf | before/after recalibration plots |

The output_folder directory contains two subfolders: BAM and QC.
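Given this layout, a finished run can be sanity-checked by confirming that every BAM file in the BAM subfolder has its BAI index. The following is a minimal sketch based only on the file names listed in the table above; `bams_without_index` is a hypothetical helper, not something shipped with the pipeline.

```shell
#!/bin/sh
# List BAM files in <output_folder>/BAM that lack a matching .bam.bai index.
# Prints nothing when every BAM file is indexed.
bams_without_index() {
    out=$1
    for bam in "$out"/BAM/*.bam; do
        [ -e "$bam" ] || continue        # glob matched nothing
        [ -f "$bam.bai" ] || printf '%s\n' "$bam"
    done
}
```

For the default output folder, `bams_without_index .` prints the path of each unindexed BAM file, and `[ -f ./QC/multiqc_BQSR_report.html ]` tests whether the MultiQC report was written.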

Directed Acyclic Graph

[Figure: DAG of the BQSR pipeline]

Contributions

| Name | Email | Description |
|------|-------|-------------|
| Nicolas Alcala* | AlcalaN@fellows.iarc.fr | Developer to contact for support |