damage-estimator-nf

Nextflow pipeline to run "Damage Estimator"

Description

This tool estimates DNA damage in samples sequenced on an Illumina platform in paired-end mode. It runs in 3 steps, starting from an aligned BAM file.

Cf. https://github.com/Ettwiller/Damage-estimator

Dependencies

  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.

  2. External software:

The tool writes to a temporary folder, so check that yours is set in your .bash_profile (export TMPDIR=/data/tmp/, export TMP=/data/tmp).
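A quick way to confirm the temporary folder setting before launching the pipeline is sketched below. The function name `check_tmpdir` is hypothetical and not part of the pipeline; it only inspects the TMPDIR and TMP variables mentioned above.

```shell
# Hypothetical helper (not shipped with the pipeline): verifies that
# TMPDIR (or, failing that, TMP) points to an existing, writable directory.
check_tmpdir() {
  dir="${TMPDIR:-${TMP:-}}"
  if [ -z "$dir" ]; then
    echo "neither TMPDIR nor TMP is set" >&2
    return 1
  fi
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "not a writable directory: $dir" >&2
    return 1
  fi
  echo "temporary folder OK: $dir"
}

check_tmpdir || echo "set TMPDIR in your .bash_profile before running" >&2
```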

You can avoid installing all the external software by only installing Docker. See the IARC-nf repository for more information.

Input

| Type | Description |
|---|---|
| bam folder | Folder containing the bam files on which you want to run "Damage Estimator" |
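Since the --bam_folder parameter expects each .bam file to come with a .bam.bai index, it can help to check the folder first. The sketch below assumes the index sits next to the BAM and is named by appending .bai; `missing_bai` is a hypothetical helper, not part of the pipeline.

```shell
# Hypothetical helper: print every BAM in a folder that has no
# matching <name>.bam.bai index next to it.
missing_bai() {
  folder="$1"
  for bam in "$folder"/*.bam; do
    [ -e "$bam" ] || continue          # glob matched nothing
    [ -e "$bam.bai" ] || echo "$bam"   # index absent: report the BAM
  done
}

# Example call (BAM/ is a placeholder folder name):
# missing_bai BAM/
```

An empty output means every BAM in the folder has an index.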

Parameters

| Name | Example value | Description |
|---|---|---|
| --bam_folder | PATH/FOLDER | Folder containing the .bam and .bam.bai files on which to run "Damage Estimator" (bams should preferably be generated by bwa mapping of Illumina paired-end sequencing) |
| --de_path | PATH/DE | Location of the folder containing the Damage Estimator files (.pl and .r) |
| --ref | PATH/FILE | Reference genome (fasta file) |

| Name | Default value | Description |
|---|---|---|
| --Q | 0 | Phred quality score threshold (Sanger encoding). Only bases with a Q score above this threshold are kept. |
| --mq | 10 | Mapping quality threshold. Only reads that pass this threshold are kept. |
| --max_coverage_limit | 100 | If a position has MAX reads or more (R1 or R2), the position is not used to calculate the damage. This option prevents high-coverage regions of the genome from being the main driver of the damage estimate. |
| --min_coverage_limit | 1 | If a position has MIN reads or fewer (R1 or R2), the position is not used to calculate the damage. This option restricts the damage calculation to on-target regions (for enrichment protocols such as exome). |
| --qualityscore | 30 | Discard the match or mismatch if the base on a read has less than MIN base quality. Important parameter: the lower this limit, the less apparent the damage. |
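The two coverage limits together define a window of usable read depths. The sketch below illustrates that rule for a single depth value, following the wording of the table (MIN or fewer and MAX or more are both excluded). `position_is_used` is a hypothetical name, and how the tool itself combines the R1 and R2 depths is not shown here.

```shell
# Hypothetical illustration of the coverage window:
# a depth is usable only if MIN < depth < MAX.
position_is_used() {
  depth="$1"; min="$2"; max="$3"
  [ "$depth" -gt "$min" ] && [ "$depth" -lt "$max" ]
}

# With the defaults (MIN=1, MAX=100), a depth of 50 is inside the window.
if position_is_used 50 1 100; then
  echo "depth 50: used"
fi
```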

For exome bams, we recommend: --Q 20 --mq 20 --max_coverage_limit 300 --min_coverage_limit 30

Usage

nextflow run iarcbioinfo/damage-estimator.nf --bam_folder BAM/ --de_path /path/ --ref ref.fasta
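For exome bams, the usage line can be combined with the thresholds recommended in the Parameters section. The sketch below only prints the resulting command so it can be inspected before launching. `build_exome_cmd` is a hypothetical helper, the paths are placeholders, and the reference option is assumed to be spelled --ref as in the parameter table.

```shell
# Hypothetical helper: print (not run) the command for an exome analysis,
# using the recommended --Q, --mq and coverage limits.
build_exome_cmd() {
  bam_folder="$1"; de_path="$2"; ref="$3"
  echo nextflow run iarcbioinfo/damage-estimator.nf \
    --bam_folder "$bam_folder" --de_path "$de_path" --ref "$ref" \
    --Q 20 --mq 20 --max_coverage_limit 300 --min_coverage_limit 30
}

build_exome_cmd BAM/ /path/ ref.fasta
```

Once the printed command looks right, it can be copied to the terminal and run as is.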

Output

| Type | Description |
|---|---|
| "SMR" file1 and file2 | Intermediate mpileup files generated by samtools ("Split Mapped Reads") containing all the positions in the genome with at least one read. The file in -mpileup1 corresponds to the first read of each pair and the file in -mpileup2 corresponds to the second read. |
| Table | 6 columns: [1] raw count of variant type, [2] variant type (e.g. G_T for G to T), [3] id (from the --id option), [4] frequency of variant, [5] family (the variant type and its reverse complement), [6] GIV-score |
| Graph | Representation of the table, generated by plot_damage.R |
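To look up the GIV-score of one variant type without opening the graph, the 6-column table can be filtered with awk. This is a minimal sketch that assumes the table is whitespace-separated with the columns in the order listed above; `giv_for_variant` and the file name in the example call are hypothetical.

```shell
# Hypothetical helper: for a given variant type (column 2),
# print the id (column 3), the variant type and the GIV-score (column 6).
giv_for_variant() {
  table="$1"; variant="$2"
  awk -v v="$variant" '$2 == v { print $3, $2, $6 }' "$table"
}

# Example call (damage_table.txt is a placeholder file name):
# giv_for_variant damage_table.txt G_T
```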

Contributions

| Name | Email | Description |
|---|---|---|
| VOEGELE Catherine | voegelec@iarc.fr | Developer |