<div align="center"> <img src="img/jasmine-logo.png" alt="jasmine logo" width="135px" align="center"/> <h1>Jasmine</h1> <p>Call select base modifications in PacBio HiFi reads</p> </div>

Jasmine calls select base modifications in PacBio HiFi reads based on polymerase kinetic signatures. Current models support 5-Methylcytosine (5mC) at CpG sites and N6-Methyladenine (6mA) for Fiber-seq. The 5mC caller assumes symmetric methylation status at the CpG site, and reports methylation on the read forward strand. The 6mA caller is per-strand.

Availability

The latest version can be installed via the bioconda package pbjasmine.

Please refer to our official pbbioconda page for information on Installation, Support, License, Copyright, and Disclaimer.

Latest Version

Version 2.4.0: Full changelog here

Input Data

The input for jasmine is PacBio HiFi reads with kinetics. For more information, see ccs.how.

Execution

Running jasmine is as simple as:

jasmine movie.hifi_reads.bam movie.methylation.hifi_reads.bam

Output Data

The output methylation prediction for each annotated HiFi read is encoded in the MM and ML tags, defined in the SAM tag specification. The MM tag specifies the modification and to which base it applies. The ML tag specifies the probability of methylation at each base.

The output is also described in the PacBio BAM file format documentation as follows:

| Tag | Type | Description |
| --- | --- | --- |
| MM | Z | Base modifications / methylation |
| ML | B,C | Base modification probabilities |

Notes for ML: The continuous probability range of 0.0 to 1.0 is remapped to the discrete integers 0 to 255, inclusive. The probability range corresponding to an integer N is N/256 to (N + 1)/256. The ML tag lists the probabilities in the same order as the modifications in the MM tag.
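The remapping described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of jasmine: the function names `ml_to_probability_range` and `probability_to_ml` are hypothetical, and clamping a probability of exactly 1.0 to 255 is an assumption that follows from the 0 to 255 range.

```python
def ml_to_probability_range(n: int) -> tuple[float, float]:
    """Return the half-open interval [low, high) encoded by ML value n."""
    if not 0 <= n <= 255:
        raise ValueError(f"ML value must be in 0..255, got {n}")
    return n / 256, (n + 1) / 256


def probability_to_ml(p: float) -> int:
    """Map a probability in [0.0, 1.0] to its ML integer (hypothetical inverse)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {p}")
    # Assumption: p == 1.0 is clamped into the top bin, 255.
    return min(int(p * 256), 255)


low, high = ml_to_probability_range(249)
print(f"ML 249 covers [{low}, {high})")
```

Because each integer covers a bin of width 1/256, a decoded ML value is best read as a range rather than an exact probability.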

Example

Read  AGTCTAGACTCCGTAATTACTCGCCTAG...
C        1    2 34       5 6 78
CpG              *         *

MM:Z:C+m,3,1,...   # CpG sites are at C #4 (1+3) and #6 (1+3+1+1)
ML:B:C,249,4,...   # probability of methylation at the first CpG is in [249/256,250/256); second CpG is in [4/256,5/256).
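To make the example concrete, the following Python sketch decodes the MM skip counts and ML values into read positions. It is a simplified illustration under stated assumptions: `decode_mm_ml` is a hypothetical helper, it handles only a single forward-strand modification entry such as `C+m,3,1`, it uses only the portion of the read and tags shown above, and it reports the lower bound of each probability bin. It is not how jasmine or any PacBio tool parses these tags.

```python
def decode_mm_ml(seq: str, mm: str, ml: list[int]) -> list[tuple[int, float]]:
    """Return (0-based read position, lower-bound probability) per modified base.

    Assumes one forward-strand MM entry, e.g. "C+m,3,1".
    """
    header, *deltas = mm.rstrip(";").split(",")
    base = header[0]                      # canonical base, "C" for C+m
    skips = [int(d) for d in deltas]      # unmodified bases to skip before each call
    if len(skips) != len(ml):
        raise ValueError("MM and ML must list the same number of calls")

    # Read positions of every occurrence of the canonical base.
    base_positions = [i for i, b in enumerate(seq) if b == base]

    calls = []
    idx = -1                              # index into base_positions
    for skip, score in zip(skips, ml):
        idx += skip + 1                   # skip `skip` bases, land on the next one
        calls.append((base_positions[idx], score / 256))
    return calls


read = "AGTCTAGACTCCGTAATTACTCGCCTAG"   # visible prefix of the example read
print(decode_mm_ml(read, "C+m,3,1", [249, 4]))
# [(11, 0.97265625), (21, 0.015625)]
```

Positions 11 and 21 are the fourth and sixth C in the read, and each is followed by a G, which matches the CpG markers in the example.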

Training datasets

HiFi reads and subreads for true negative and true positive CpG methylation sites are available at https://downloads.pacbcloud.com/public/Sequel-II-CpG-training/.

The true negatives are from HG002 Whole Genome Amplification (WGA). The true positives are from HG002 WGA treated with CpG methyltransferase (M.SssI).

Changelog