strelka2-nf

Strelka v2 pipeline with Nextflow

Workflow representation

Dependencies

  1. Nextflow: for common installation procedures, see the IARC-nf repository.
  2. Strelka v2.

Input

| Name | Description |
|---|---|
| --input_folder | Folder containing the bam/cram files |
| --input_file | Tab-delimited text file with either two columns named normal and tumor (somatic mode) or one column named bam (germline mode). An optional column named sample provides the sample names used for naming the files. For genotyping (see genotyping mode below), a column named vcf is required. |

Note: the file provided to --input_file is where you define the pairs of bam/cram files to analyse with Strelka in somatic mode. It is a tab-delimited file with two columns, normal and tumor, for example:

| normal | tumor |
|---|---|
| normal1.cram | tumor1.cram |
| normal2.cram | tumor2.cram |
| normal3.cram | tumor3.cram |
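
As a minimal sketch, such a file could be written from the shell as follows. The file name pairs.txt and the cram names are illustrative placeholders, not names required by the pipeline. printf is used so that the columns are separated by real tab characters, and the final awk line is only a sanity check on the column count.

```shell
# Write a tab-delimited tumor/normal pairs file for somatic mode.
# pairs.txt and the cram file names are placeholders.
printf 'normal\ttumor\n' > pairs.txt
printf '%s\t%s\n' \
  normal1.cram tumor1.cram \
  normal2.cram tumor2.cram \
  normal3.cram tumor3.cram >> pairs.txt

# Sanity check: every line must have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { print "line " NR ": expected 2 columns, found " NF; bad = 1 } END { exit bad }' pairs.txt
```

The awk check exits with a non-zero status if any line has a different number of columns, which catches spaces accidentally used in place of tabs.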

Parameters

| Name | Example value | Description |
|---|---|---|
| --ref | hg19.fasta | Genome reference |

| Name | Default value | Description |
|---|---|---|
| --mode | somatic | Mode for variant calling; one of somatic, germline, genotyping |
| --output_folder | strelka_output | Output folder for VCF files |
| --cpu | 2 | Number of CPUs |
| --mem | 20 | Memory |
| --strelka | path inside docker and singularity containers | Strelka installation directory |
| --config | default configuration of Strelka | Custom configuration file |
| --callRegions | none | Region BED file |
| --ext | cram | Extension of alignment files (bam or cram) |

Flags are special parameters without value.

| Name | Description |
|---|---|
| --help | Print usage and optional parameters |
| --exome | Automatically set up parameters for exome data |
| --rna | Automatically set up parameters for RNA data (only available for --mode germline) |
| --AF | Add AF field to VCF (only available for --mode somatic) |
| --outputCallableRegions | Create a BED track containing regions which are determined to be callable |

Usage

mode somatic

nextflow run iarcbioinfo/strelka2-nf -r v1.2a -profile singularity --mode somatic --ref hg38.fa --input_file pairs.txt --input_folder path/to/cram/ --strelka path/to/strelka/

To run the pipeline without singularity, just remove "-profile singularity". Alternatively, one can run the pipeline using a docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).

mode germline

nextflow run iarcbioinfo/strelka2-nf -r v1.2a -profile singularity --mode germline --ref hg38.fa --input_folder path/to/cram/ --strelka path/to/strelka/
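
The Input section also allows germline samples to be listed in a file with a bam column and an optional sample column. The sketch below shows one way such a file might be generated from the cram files in a folder. The folder demo_crams, the empty placeholder files, and the output name germline_input.txt are all hypothetical. Whether the bam column should hold bare file names or full paths is not specified here, so the sketch simply writes each path as found.

```shell
# Build a germline input file (columns: sample, bam) from the cram files in a folder.
input_dir=demo_crams   # hypothetical folder, created here only for the demo
mkdir -p "$input_dir"
touch "$input_dir/sampleA.cram" "$input_dir/sampleB.cram"   # empty placeholder files

printf 'sample\tbam\n' > germline_input.txt
for f in "$input_dir"/*.cram; do
  # Sample name = file name without the .cram extension.
  printf '%s\t%s\n' "$(basename "$f" .cram)" "$f" >> germline_input.txt
done

cat germline_input.txt
```

The resulting file could then be given to --input_file.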

genotyping

When using --input_file, if a vcf column giving, for each sample, the path to a VCF file listing somatic variants is provided, the pipeline uses Strelka's --forcedGT option to genotype these positions, and computes a BED file of these positions so that only variants from the VCF are genotyped. Note that genotyping can be performed both in somatic mode (in which case tumor/normal pairs must be provided) and in germline mode (in which case a single cram file must be provided).
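
For illustration, an input file for genotyping in somatic mode might look like the one built below. All file and sample names are hypothetical, and the column order shown is an assumption (the columns are described above by name only). The loop at the end is only a local check that the header names every column needed for this case.

```shell
# Hypothetical input file for genotyping in somatic mode: the normal/tumor
# pair plus the optional "sample" column and the required "vcf" column.
printf 'sample\tnormal\ttumor\tvcf\n' > genotyping_input.txt
printf '%s\t%s\t%s\t%s\n' \
  S1 normal1.cram tumor1.cram S1_somatic.vcf \
  S2 normal2.cram tumor2.cram S2_somatic.vcf >> genotyping_input.txt

# Check that the header names every column needed for somatic genotyping.
for col in normal tumor vcf; do
  head -n 1 genotyping_input.txt | tr '\t' '\n' | grep -qx "$col" \
    || { echo "missing column: $col" >&2; exit 1; }
done
```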

Output

| Path | Description |
|---|---|
| VCFs/raw/*.vcf.gz | VCF files before filtering |
| VCFs/withAF/*.vcf | VCF files with AF field (optional, requires flag --AF) |
| VCFs/filtered/*PASS.vcf.gz | Final compressed and indexed VCF files (with AF field if flag --AF is set) |
| CallableRegions/*.bed.gz | Compressed and indexed BED files (optional, requires flag --outputCallableRegions) |

Final VCF files have companion tabix index files (.tbi). Note that in germline mode, the output VCF contains variants only (file variants.vcf.gz from Strelka).
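
As a quick check on a finished run, the number of records in a compressed VCF can be counted with standard tools. The sketch below is not part of the pipeline: count_records is a hypothetical helper, and toy_PASS.vcf.gz is a made-up stand-in for a file under VCFs/filtered/. It relies on bgzip-compressed files being readable by plain gzip.

```shell
# Count variant records (non-header lines) in a compressed VCF.
# bgzip output is gzip-compatible, so plain gzip can decompress it.
count_records() {
  gzip -dc "$1" | grep -vc '^#'
}

# Toy stand-in for a filtered VCF (made-up records, not pipeline output).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tT\t.\tPASS\t.\nchr1\t200\t.\tG\tC\t.\tPASS\t.\n' \
  | gzip -c > toy_PASS.vcf.gz

count_records toy_PASS.vcf.gz
```

On real output, the same function could be applied to each file matching VCFs/filtered/*PASS.vcf.gz inside the chosen output folder.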

Directed Acyclic Graph

DAG

Contributions

| Name | Email | Description |
|---|---|---|
| Vincent Cahais | CahaisV@iarc.fr | Developer |
| Nicolas Alcala | AlcalaN@iarc.fr | Developer |