NGSCheckMate

Nextflow pipeline to detect matched BAMs with NGSCheckMate.

Workflow representation

Description

Implementation of NGSCheckMate and its underlying subset calling, distributed per sample.

Dependencies

  1. Nextflow: for common installation procedures see the IARC-nf repository.
  2. NGSCheckMate (follow its installation instructions, in particular setting the $NCM_HOME variable)
  3. samtools
  4. bcftools

Additionally, the graph output option requires R; see details below about this option.
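Before launching the pipeline without a container profile, it can help to confirm that the dependencies listed above are visible from the shell. The following is a minimal sketch, not part of the pipeline: the function name `missing_dependencies` is hypothetical, and it assumes the tools are installed under the executable names `nextflow`, `samtools` and `bcftools`.

```python
import os
import shutil


def missing_dependencies(tools=("nextflow", "samtools", "bcftools"),
                         env=None, which=shutil.which):
    """Return a list of problems found; the list is empty if all checks pass.

    `env` and `which` are injectable only to make the function easy to test.
    """
    env = os.environ if env is None else env
    # Each tool must be resolvable on PATH.
    problems = [f"{tool} not found on PATH" for tool in tools if which(tool) is None]
    # NGSCheckMate expects NCM_HOME to point at its installation directory.
    ncm_home = env.get("NCM_HOME")
    if not ncm_home:
        problems.append("NCM_HOME is not set")
    elif not os.path.isdir(ncm_home):
        problems.append(f"NCM_HOME points to a missing directory: {ncm_home}")
    return problems


if __name__ == "__main__":
    for problem in missing_dependencies():
        print(problem)
```

The check only tests that the executables and the directory exist; it does not verify versions or that R is available for the graph output option.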

Input

| Type | Description |
| --- | --- |
| --input | Your input BAM file(s) (do not forget the quotes, e.g. --input "test_*.bam"). Warning: your BAM file(s) must be indexed, and the test_*.bai files should be in the same folder. |
| --input_folder | Folder with BAM files |
| --input_file | Input file (comma-separated) with 3 columns: ID (individual ID), suffix (suffix for sample names; e.g. RNA), and bam (path to bam file). |

A nextflow.config file is also included; please modify it to suit your environment if you run outside our pre-configured clusters (see Nextflow configuration).

Note that the input_file is a tab-delimited text file; it is used both to provide the locations of the input BAM files and to generate the graphs. The ID field must be unique to a subject (e.g. both tumor and normal samples from the same individual must have the same individual identifier). The bam field must be unique to a file name. For example, the following is a valid file:

ID suffix bam
NA06984 _RNA NA06984_T_transcriptome.bam
NA06984 _WGS NA06984_T_genome.bam
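The rules above can be checked before a run. The following is a rough sketch, not part of the pipeline: `validate_input_table` is a hypothetical helper, and it assumes a tab delimiter, as stated in the note above (the `delimiter` argument can be changed if your file uses another separator). It checks only that the three columns are present and that no bam value is listed twice; whether an ID really identifies a single subject cannot be checked automatically.

```python
import csv
import io


def validate_input_table(text, delimiter="\t"):
    """Return a list of error messages for an input_file given as text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # The header must contain the three documented columns.
    missing = {"ID", "suffix", "bam"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]
    errors = []
    seen_bams = {}  # bam value -> line number where it first appeared
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        bam = (row["bam"] or "").strip()
        if not bam:
            errors.append(f"line {line_no}: empty bam field")
        elif bam in seen_bams:
            errors.append(
                f"line {line_no}: bam {bam} already listed on line {seen_bams[bam]}"
            )
        else:
            seen_bams[bam] = line_no
    return errors
```

To check a file on disk, read it first, for example `validate_input_table(open("input.tsv").read())`, where `input.tsv` is a placeholder name.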

Parameters

| Name | Example value | Description |
| --- | --- | --- |
| --output_folder | results | The folder that will contain the NGSCheckMate folder with all results in text files. |
| --ref | ref.fasta | Your reference in FASTA |
| --bed | SNP_GRCh38.bed | Panel of SNP bed file from NGSCheckMate |

Note that a bed file, SNP_GRCh38.bed, is provided; it is a liftOver of the files at https://github.com/parklab/NGSCheckMate/tree/master/SNP. To use other references, you can provide your own bed file.

| Name | Default value | Description |
| --- | --- | --- |
| --mem | 16 | Memory requested (in GB) for calling and the NGSCheckMate run |
| --cpu | 4 | Number of threads for germline calling |
| --bai_ext | .bam.bai | Extension of bai files |

Usage

nextflow run NGSCheckMate-nf/ -r v1.1 -profile singularity --ref ref.fasta --input_folder BAM/

To run the pipeline without singularity, just remove "-profile singularity". Alternatively, one can run the pipeline using a docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).

Output

| Type | Description |
| --- | --- |
| vcfs | A folder with the vcfs used for the matching |
| NCM_output/output*.txt | NGSCheckMate output files with matches between files (see https://github.com/parklab/NGSCheckMate) |
| NCM_output/output.pdf | Hierarchical clustering plot from https://github.com/parklab/NGSCheckMate |
| NCM_output/NCM_graph_wrongmatch.xgmml | Graph with only the samples without a match (adapted from https://github.com/parklab/NGSCheckMate/blob/master/graph/ngscheckmate2xgmml.R) |
| NCM_output/NCM_graph.xgmml | Graph with all samples (adapted from https://github.com/parklab/NGSCheckMate/blob/master/graph/ngscheckmate2xgmml.R) |

Note that we recommend Cytoscape to visualize the .xgmml graphs.
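For a quick look at a graph without opening Cytoscape, the .xgmml file can also be read as plain XML. The sketch below is an illustration only: `summarize_xgmml` is a hypothetical helper, and it assumes the files follow the usual XGMML layout, with `node` elements carrying `id` and `label` attributes and `edge` elements carrying `source` and `target` attributes. It also assumes that an edge represents a match, so that a node without any edge is a sample without a match.

```python
import xml.etree.ElementTree as ET


def summarize_xgmml(xml_text):
    """Count nodes and edges and list the labels of nodes that have no edge."""
    root = ET.fromstring(xml_text)

    def local(element):
        # Drop the "{namespace}" prefix that ElementTree adds to tag names.
        return element.tag.rsplit("}", 1)[-1]

    nodes = {el.get("id"): el.get("label") for el in root.iter() if local(el) == "node"}
    edges = [(el.get("source"), el.get("target")) for el in root.iter() if local(el) == "edge"]
    connected = {node_id for edge in edges for node_id in edge}
    isolated = sorted(
        str(label or node_id) for node_id, label in nodes.items() if node_id not in connected
    )
    return {"nodes": len(nodes), "edges": len(edges), "isolated": isolated}
```

Pass the file content as text, for example the result of reading NCM_graph.xgmml from your output folder.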

Usage for Cobalt cluster

nextflow run iarcbioinfo/NGSCheckMate -profile cobalt --input "/data/test_*.bam" --output_folder /data/cohort_output --ref /ref/Homo_sapiens_assembly38.fasta --bed /home/user/bin/NGSCheckMate/SNP/SNP_GRCh38.bed

FAQ

Why are some files not included although they are in the input_folder?

Be careful: if the bai file is missing for a bam file, that bam file is ignored and the workflow does not return an error. Check that every bam file has an index with the extension given by --bai_ext.
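Because missing indexes are skipped silently, it can be worth listing them before a run. The following is a minimal sketch, not part of the pipeline: `bams_missing_index` is a hypothetical helper, and it assumes that the index of `sample.bam` is expected next to it under the name `sample` followed by the --bai_ext value (so `sample.bam.bai` with the default).

```python
from pathlib import Path


def bams_missing_index(folder, bai_ext=".bam.bai"):
    """Return the BAM files in `folder` that have no matching index file."""
    missing = []
    for bam in sorted(Path(folder).glob("*.bam")):
        # Replace the trailing ".bam" with the index extension.
        index = bam.with_name(bam.name[: -len(".bam")] + bai_ext)
        if not index.exists():
            missing.append(bam)
    return missing
```

For example, `bams_missing_index("BAM/")` returns the paths that would be left out under this naming assumption; an empty list means every BAM file has an index.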

What modifications have been done to the original NGSCheckMate code?

We provide a modified version of the graph/ngscheckmate2xgmml.R R script from https://github.com/parklab/NGSCheckMate to output graphs in .xgmml format. The modifications make it possible to represent all samples, even those that match, and fix a small glitch in the color palette.

Contributions

| Name | Email | Description |
| --- | --- | --- |
| Nicolas Alcala* | AlcalaN@iarc.fr | Developer to contact for support |
| Maxime Vallée | | Developer |