MashMap


MashMap implements a fast, approximate algorithm for computing local alignment boundaries between long DNA sequences. It can be useful for mapping a genome assembly or long reads (PacBio/ONT) to one or more reference genomes. Given a minimum alignment length and an identity threshold for the desired local alignments, MashMap computes alignment boundaries and identity estimates using k-mers. It does not compute the alignments explicitly. Instead, it estimates an unbiased k-mer-based Jaccard similarity using a combination of minmers (a novel winnowing scheme) and MinHash, and then converts that estimate to a sequence identity using the Mash distance. An appropriate k-mer sampling rate is determined automatically from the given minimum local alignment length and identity threshold.
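The Jaccard-to-identity conversion described above can be illustrated with a small Python sketch. This is not MashMap's implementation: it uses a plain bottom-s MinHash sketch instead of the minmer winnowing scheme, and the function names, the hash function (BLAKE2b) and the parameters `k` and `s` are assumptions chosen for illustration. The only part taken from the Mash method is the distance formula D = -(1/k) ln(2J / (1 + J)), with identity estimated as 1 - D.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a deterministic 64-bit integer (illustrative hash choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k, s):
    """Bottom-s sketch: the s smallest hash values over the k-mers of seq."""
    return sorted(hash_kmer(km) for km in kmers(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    # Take the s smallest hashes of the merged sketches, then count how many
    # of them occur in both input sketches.
    union_bottom = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not union_bottom:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    hits = sum(1 for h in union_bottom if h in shared)
    return hits / len(union_bottom)


def mash_identity(jaccard, k):
    """Convert a Jaccard estimate to an identity estimate via the Mash distance."""
    if jaccard <= 0.0:
        return 0.0  # no shared k-mers: treat as zero identity
    distance = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    # Hypothetical toy sequences; real inputs would be far longer.
    a = "ACGTACGTTAGCCGATAGGCTTAACGGATCCGATTACAGGCTA"
    b = "ACGTACGTTAGCCGATAGGCTTAACGGATCGGATTACAGGCTA"
    k, s = 8, 16
    j = jaccard_estimate(minhash_sketch(a, k, s), minhash_sketch(b, k, s), s)
    print(f"Jaccard estimate: {j:.3f}, identity estimate: {mash_identity(j, k):.3f}")
```

With toy inputs this short the estimate is noisy. The sketch is only meant to show how a sampled set of k-mer hashes yields a Jaccard estimate, and how that estimate maps to an identity value without computing any alignment.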

As an example, MashMap can map a human genome assembly to the human reference genome in about one minute of total execution time and less than 4 GB of memory using just 8 CPU threads, achieving more than an order of magnitude improvement in both runtime and memory over alternative methods. We describe the algorithms associated with MashMap, and report on the speed, scalability, and accuracy of the software, in the publications listed below. Unlike traditional mappers, MashMap does not compute exact sequence alignments. In the future, we plan to add optional alignment support to generate base-to-base alignments.

MashMap3 important changes

Installation

Follow INSTALL.txt to compile and install MashMap. We also provide dependency-free Linux and OSX binaries for user convenience through the latest release.

Usage

Parameters

For most use cases, the default values should be appropriate. The available parameters and their purposes can be checked on the help page (mashmap -h). Important ones are mentioned below:

Visualize

We provide a Perl script for generating dot-plots to visualize mappings. It takes MashMap's mapping output as its input and requires gnuplot. Below is an example showing the mapping of the canu NA12878 human genome assembly (y-axis) to the hg38 reference (x-axis).

<p align="center"> <img src="https://i.postimg.cc/HskJNzNg/readme-mashmap.jpg" height="300"/> </p>

Release

Use the latest release for a stable version.

<a name="publications"></a>Publications