


Pipeline to detect somatic variants from single-cell sequencing data

Pipeline for detecting somatic single-nucleotide mutations in high-throughput single-cell genomics and transcriptomics data sets, such as single-cell RNA-seq and single-cell ATAC-seq (using SComatic), and for de novo extraction of mutational signatures (using SigProfilerExtractor).

The pipeline runs in the following steps: <br>Step1-4: SComatic steps <br>annovar: annotates all variants using annovar <br>preprocessing: creates the input for SigProfilerExtractor from the SComatic output <br>Step5_sigprofiler: de novo extraction of mutational signatures using SigProfilerExtractor <br>


  1. This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.
  2. SComatic
  3. annovar
  4. SigProfilerExtractor

You can avoid installing all the external software by only installing Docker. See the IARC-nf repository for more information.


| Name | Description |
|------|-------------|
| `--bam_folder` | Folder containing BAM files (the `*.bai` index files must be available in the same folder). |
| `--meta` | Metadata file mapping cell barcodes to cell type. |

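Before launching the pipeline, it can help to check the two inputs described above. The following is a minimal sketch, not part of the pipeline: the helper names are hypothetical, and the metadata column names `Index` and `Cell_type` are an assumption based on the SComatic documentation, so adjust them to your file.

```python
import csv
from collections import Counter
from pathlib import Path


def find_unindexed_bams(bam_folder):
    """Return the names of BAM files in bam_folder that have no index next to them."""
    missing = []
    for bam in sorted(Path(bam_folder).glob("*.bam")):
        # Accept both common index names: sample.bam.bai and sample.bai
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(candidate.exists() for candidate in candidates):
            missing.append(bam.name)
    return missing


def count_cells_per_type(meta_path, barcode_col="Index", type_col="Cell_type"):
    """Count barcodes per cell type in a tab-separated metadata file.

    The default column names are assumptions, not guaranteed by the pipeline.
    """
    with open(meta_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        absent = {barcode_col, type_col} - set(reader.fieldnames or [])
        if absent:
            raise ValueError(f"metadata file lacks columns: {sorted(absent)}")
        return Counter(row[type_col] for row in reader)
```

An empty list from `find_unindexed_bams` means every BAM file has an index in the same folder. The counter from `count_cells_per_type` gives a quick view of how many barcodes each cell type contains.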

| Name | Default value | Description |
|------|---------------|-------------|
| `--scomat_path` | `/Users/lipika/SComatic` | SComatic installation folder path |
| `--ref` | `ref.fa` | Genome reference files (with index) |
| `--annovar_path` | `/Users/lipika/annovar` | Path to annovar |
| `--hdb` | `/Users/lipika/humandb` | Path to human database for annotation (refGene, cytoBand, exac03, avsnp147, dbnsfp30a and gnomad_genome are required) |
| `--hg_build` | `GRCh38` | Genome build |
| `--cpu` | `2` | Number of CPUs |
| `--output_folder` | `SComatic-nf-results` | Output folder |
| `--nTrim` | `5` | Number of bases trimmed by setting the base quality to 0 at the beginning and end of each read |
| `--maxNM` | `5` | Maximum number of mismatches permitted to consider reads for analysis |
| `--maxNH` | `1` | Maximum number of alignment hits permitted to consider reads for analysis |
| `--chrom` | `all` | Chromosome to be analysed |
| `--minbq` | `30` | Minimum base quality permitted for the base counts |
| `--nprocs` | `1` | Number of processes |
| `--pon` | `30` | Panel of normals (PoN) file to be used to remove germline polymorphisms and recurrent artefacts |
| `--min_signatures` | `1` | Minimum number of mutational signatures |
| `--max_signatures` | `10` | Maximum number of mutational signatures |

Flags are special parameters without value.

| Name | Description |
|------|-------------|
| `--help` | Display help |

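The `--min_signatures`, `--max_signatures`, `--hg_build` and `--cpu` parameters correspond to arguments of SigProfilerExtractor. The sketch below shows one way these values could be validated and handed to `sigProfilerExtractor`. It is an illustration under assumptions, not the code the pipeline actually runs: the helper names are hypothetical, and the `input_type="vcf"` default is a guess about the input format.

```python
def extractor_kwargs(input_dir, output_dir, hg_build="GRCh38",
                     min_signatures=1, max_signatures=10, cpu=2,
                     input_type="vcf"):
    """Build keyword arguments for SigProfilerExtractor's sigProfilerExtractor().

    input_type="vcf" is an assumption; use the type matching your input files.
    """
    if not 1 <= min_signatures <= max_signatures:
        raise ValueError("need 1 <= min_signatures <= max_signatures")
    return {
        "input_type": input_type,
        "output": output_dir,
        "input_data": input_dir,
        "reference_genome": hg_build,
        "minimum_signatures": min_signatures,
        "maximum_signatures": max_signatures,
        "cpu": cpu,
    }


def run_extraction(**kwargs):
    """Run the extraction; requires SigProfilerExtractor to be installed."""
    # Imported lazily so extractor_kwargs() works without the package.
    from SigProfilerExtractor import sigpro as sig
    sig.sigProfilerExtractor(**kwargs)
```

With SigProfilerExtractor and the matching reference genome installed, `run_extraction(**extractor_kwargs("sigprofiler-input", "sigprofiler-results"))` would start an extraction with the default bounds of 1 to 10 signatures.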

The annovar database files for hg38 can be downloaded using the command below (example shown for avsnp147):

perl path/to/annovar/annotate_variation.pl -buildver hg38 -downdb -webfrom annovar avsnp147 humandb/

Install the reference genome to be used by SigProfilerExtractor from Python:

$ python
from SigProfilerMatrixGenerator import install as genInstall
genInstall.install('GRCh38')

To run the pipeline on your bamFile.bam, with a metadata.tsv file mapping cell barcodes to cell types and the reference genome ref.fa used for alignment (hg38 genome build), use this command:

nextflow run iarcbioinfo/SComatic-nf --bam_folder bamFile.bam --meta metadata.tsv --ref ref.fa --scomat_path path/to/Scomatfolder --annovar_path path/to/annovar --hdb path/to/humandb --hg_build GRCh38

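When the pipeline is launched from a script, for example once per sample, the command line can be assembled from a dictionary of parameters. This is a minimal sketch with a hypothetical helper name. It uses only parameter names documented in this README and treats a value of `True` as a flag without value.

```python
import shlex


def build_command(params, pipeline="iarcbioinfo/SComatic-nf"):
    """Turn a {name: value} mapping into a nextflow argument list."""
    cmd = ["nextflow", "run", pipeline]
    for name, value in params.items():
        if value is None or value is False:
            continue  # unset parameters and disabled flags are left out
        if value is True:
            cmd.append(f"--{name}")  # flag: parameter without value
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


params = {
    "bam_folder": "bamFile.bam",
    "meta": "metadata.tsv",
    "ref": "ref.fa",
    "hg_build": "GRCh38",
    "cpu": 4,
}
# Print the command in a form that can be pasted into a shell.
print(shlex.join(build_command(params)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.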

| Type | Description |
|------|-------------|
| `SplitBamCellTypes/sample.*.bam` | Folder containing cell-type-specific BAM files (step1 output) |
| `Step2_BaseCellCounts/sample.*.tsv` | Folder containing base count information for each cell type and for every position in the genome (step2 output) |
| `Step3_BaseCellCountsMerged/sample.BaseCellCounts.AllCellTypes.tsv` | Folder containing the merged base count file of all cell types (step3 output) |
| `Step4_VariantCalling/sample.calling.step*.tsv` (`*` = 1, 2) | Folder containing two files (step1: SNVs called after applying filters that remove technical artefacts; step2: further filtered for RNA editing and PoN) (step4 output) |
| `sigprofiler-input/sample_*.bam` | Folder containing input files for SigProfilerExtractor |
| `sigprofiler-results/*` | Folder containing result files and folders from SigProfilerExtractor |

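The variant calling files in `Step4_VariantCalling` can be inspected with a few lines of Python. The sketch below keeps only the variants that passed all filters. It assumes the layout used by SComatic output: comment lines starting with `##`, a tab-separated header line starting with `#CHROM`, and a `FILTER` column where `PASS` marks retained calls. Check these assumptions against your own files, since the helper name and column names here are illustrative.

```python
def read_passing_variants(handle, filter_col="FILTER", pass_value="PASS"):
    """Return variant rows (as dicts) whose filter column equals pass_value.

    Assumes '##' comment lines followed by one tab-separated header line.
    """
    header = None
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata comments
        fields = line.split("\t")
        if header is None:
            # First non-comment line is the header, e.g. '#CHROM  Start ...'
            header = [field.lstrip("#") for field in fields]
            if filter_col not in header:
                raise ValueError(f"no {filter_col!r} column in header")
            continue
        record = dict(zip(header, fields))
        if record[filter_col] == pass_value:
            rows.append(record)
    return rows
```

For example, `with open("Step4_VariantCalling/sample.calling.step2.tsv") as fh: variants = read_passing_variants(fh)` would load the passing calls of the second filtering step.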

| Name | Email | Description |
|------|-------|-------------|
| Lipika Kalson | | Developer |
| Nicolas Alcala | alcalan@iarc.who.int | Developer to contact for support |


Muyas, F., Sauer, C.M., Valle-Inclán, J.E. et al. De novo detection of somatic mutations in high-throughput single-cell profiling data sets. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01863-z