BQSR-nf
Nextflow pipeline for base quality score recalibration (BQSR) with GATK
Description
Nextflow pipeline for base quality score recalibration and quality control of BAM files using GATK.
Dependencies
- Nextflow: for common installation procedures, see the IARC-nf repository.
- GATK4 must be in your PATH.
- GATK bundle VCF files listing known indels and SNVs (recommended: Mills and 1000 Genomes gold standard indels, and dbSNP).
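Before launching the pipeline, a quick way to confirm that the executables listed above are reachable is a small shell check. This is a minimal sketch; the helper name check_deps is hypothetical and not part of BQSR-nf.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of BQSR-nf):
# returns non-zero and reports each executable that is not found in PATH.
check_deps() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_deps nextflow gatk; then
  echo "all dependencies found"
else
  echo "install the missing tools before running BQSR-nf" >&2
fi
```

The check only looks up command names, so it will not catch a wrong GATK version; running gatk --version by hand remains the simplest way to confirm you have GATK4.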
Optionally, you can provide a config file to customize the MultiQC report (see https://multiqc.info/docs/#configuring-multiqc).
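As an illustration, a minimal MultiQC config can be written from the shell and then passed to the pipeline. The title and subtitle values are made-up examples; title and subtitle are standard MultiQC config options, and the MultiQC documentation linked above lists the others.

```shell
#!/usr/bin/env bash
# Write a minimal MultiQC config file (illustrative values only).
cat > multiqc_config.yaml <<'EOF'
title: "BQSR quality control"
subtitle: "GATK base quality score recalibration"
EOF
```

Then pass it to the pipeline with --multiqc_config multiqc_config.yaml.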
Input
Type | Description |
---|---|
--input_folder | folder containing the BAM files to recalibrate |
Parameters
- Mandatory
Name | Example value | Description |
---|---|---|
--ref | ref.fa | indexed reference genome FASTA file for GATK |
- Optional
Name | Default value | Description |
---|---|---|
--cpu | 2 | number of CPUs |
--mem | 32 | memory to allocate |
--output_folder | . | output folder for recalibrated BAMs and QC files |
--snp_vcf | dbsnp.vcf | VCF file with known variants for GATK BQSR |
--indel_vcf | Mills_100G_indels.vcf | VCF file with known indels for GATK BQSR |
--multiqc_config | null | YAML config file for MultiQC |
- Flags
Name | Description |
---|---|
--help | print usage and optional parameters |
Usage
To run the pipeline on the BAM files in a folder named bam, with an indexed reference genome ref.fa and known SNPs and indels from the GATK bundle, type:
nextflow run iarcbioinfo/BQSR-nf --input_folder bam --ref ref.fa --snp_vcf GATK_bundle/dbsnp_146.hg38.vcf.gz --indel_vcf GATK_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
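GATK expects the reference FASTA to come with a .fai index and a .dict sequence dictionary, and bgzipped known-sites VCFs to have .tbi indexes. The sketch below checks for those files before launching. check_inputs is a hypothetical helper, and the naming rules it assumes (ref.fa gives ref.dict, file.vcf.gz gives file.vcf.gz.tbi) follow GATK's usual conventions rather than anything BQSR-nf itself enforces.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of BQSR-nf).
# Usage: check_inputs REF.fa [KNOWN_SITES.vcf.gz ...]
check_inputs() {
  local ref="$1"; shift
  local ok=0 dict
  # Sequence dictionary sits next to the FASTA with the extension replaced.
  case "$ref" in
    *.fasta) dict="${ref%.fasta}.dict" ;;
    *.fa)    dict="${ref%.fa}.dict" ;;
    *)       dict="$ref.dict" ;;
  esac
  for f in "$ref" "$ref.fai" "$dict"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; ok=1; }
  done
  # Each known-sites VCF must exist; bgzipped ones also need a .tbi index.
  for vcf in "$@"; do
    [ -e "$vcf" ] || { echo "missing: $vcf" >&2; ok=1; }
    case "$vcf" in
      *.vcf.gz) [ -e "$vcf.tbi" ] || { echo "missing index: $vcf.tbi" >&2; ok=1; } ;;
    esac
  done
  return "$ok"
}
```

For the example above, run check_inputs ref.fa GATK_bundle/dbsnp_146.hg38.vcf.gz GATK_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz and launch nextflow run only if it succeeds.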
Output
Type | Description |
---|---|
BAM/file.bam | recalibrated BAM files |
BAM/file.bam.bai | BAI indexes of the recalibrated BAM files |
QC/multiqc_BQSR_report.html | MultiQC report |
QC/multiqc_BQSR_report_data | folder with data used to compute multiqc report |
QC/BAM/BQSR/file_recal.table | table of scores before recalibration |
QC/BAM/BQSR/file_BQSRecalibrated_recal.table | table of scores after recalibration |
QC/BAM/BQSR/file_recalibration_plots.pdf | before/after recalibration plots |
The output_folder directory contains two subfolders: BAM and QC.
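To help interpret the QC files listed above, here is a sketch of the GATK4 commands that a BQSR workflow of this kind typically runs for each BAM. It is an assumption for illustration, not BQSR-nf's actual process code: the function bqsr_commands is hypothetical, it only prints the commands (a dry run), and the intermediate name file_BQSRecalibrated.bam is guessed from the output table names.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print the GATK4 steps for one BAM without executing them.
bqsr_commands() {
  local bam="$1" ref="$2" snp_vcf="$3" indel_vcf="$4"
  local name known
  name=$(basename "$bam" .bam)
  known="--known-sites $snp_vcf --known-sites $indel_vcf"
  # 1. Model base quality errors on the original BAM.
  echo "gatk BaseRecalibrator -R $ref -I $bam $known -O ${name}_recal.table"
  # 2. Apply the model to write a recalibrated BAM.
  echo "gatk ApplyBQSR -R $ref -I $bam --bqsr-recal-file ${name}_recal.table -O ${name}_BQSRecalibrated.bam"
  # 3. Re-model on the recalibrated BAM to measure the effect of recalibration.
  echo "gatk BaseRecalibrator -R $ref -I ${name}_BQSRecalibrated.bam $known -O ${name}_BQSRecalibrated_recal.table"
  # 4. Plot before/after covariates (AnalyzeCovariates needs R to draw the plots).
  echo "gatk AnalyzeCovariates -before ${name}_recal.table -after ${name}_BQSRecalibrated_recal.table -plots ${name}_recalibration_plots.pdf"
}

bqsr_commands bam/file.bam ref.fa dbsnp.vcf.gz Mills.vcf.gz
```

Step 3 assumes the after-recalibration table is computed on the recalibrated BAM, which is what the file name file_BQSRecalibrated_recal.table suggests.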
Directed Acyclic Graph
Contributions
Name | Email | Description |
---|---|---|
Nicolas Alcala* | AlcalaN@fellows.iarc.fr | Developer to contact for support |