Winnowmap

Winnowmap is a long-read mapping algorithm optimized for mapping ONT and PacBio reads to repetitive reference sequences. Winnowmap development began on top of the minimap2 codebase, and it has since incorporated two ideas to improve mapping accuracy within repeats.

Compile

Clone the source code from the master branch, or download the latest release.

  git clone https://github.com/marbl/Winnowmap.git

Compiling Winnowmap requires a C++ compiler with C++11 and OpenMP support, both of which are available by default in GCC >= 4.8.

  cd Winnowmap
  make -j8

The winnowmap and meryl executables are placed in the bin folder.

Usage

For either mapping long reads or computing whole-genome alignments, Winnowmap requires pre-computing the high-frequency k-mers (e.g., the top 0.02% most frequent) in the reference. Winnowmap uses the meryl k-mer counting tool for this purpose.

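  # Mapping long reads: collect the most frequent (top 0.02%) k=15 k-mers in the reference
  meryl count k=15 output merylDB ref.fa
  meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt

  # Map ONT reads ...
  winnowmap -W repetitive_k15.txt -ax map-ont ref.fa ont.fq.gz > output.sam
  # ... or PacBio HiFi reads
  winnowmap -W repetitive_k15.txt -ax map-pb ref.fa hifi.fq.gz > output.sam

  # Genome-to-genome alignment: collect the most frequent k=19 k-mers in the first assembly
  meryl count k=19 output merylDB asm1.fa
  meryl print greater-than distinct=0.9998 merylDB > repetitive_k19.txt

  winnowmap -W repetitive_k19.txt -ax asm20 asm1.fa asm2.fa > output.sam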
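The meryl commands above keep roughly the top 0.02% most frequent distinct k-mers (`distinct=0.9998`). The following minimal Python sketch shows the same selection on an in-memory sequence. It assumes canonical k-mer counting and assumes that k-mers tied with the cutoff count are excluded. These choices approximate meryl's behaviour rather than reproduce it. The function names are hypothetical, and for a real genome meryl remains the tool to use.

```python
import math
from collections import Counter

# Complement table for A/C/G/T; windows with other bases are skipped before use.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers, skipping windows that contain a non-ACGT base."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def frequent_kmers(counts: Counter, distinct: float = 0.9998) -> set:
    """Keep k-mers whose count is strictly greater than the cutoff count,
    where the cutoff is the count at the `distinct` quantile of distinct k-mers
    (assumed semantics; ties at the cutoff are dropped)."""
    if not counts:
        return set()
    ordered = sorted(counts.values())
    # Index of the cutoff; the epsilon guards against float rounding (e.g. 0.75 * 4).
    pos = max(0, math.ceil(distinct * len(ordered) - 1e-9) - 1)
    cutoff = ordered[pos]
    return {kmer for kmer, c in counts.items() if c > cutoff}
```

For a sequence string `seq`, `frequent_kmers(count_kmers(seq, k=15))` mirrors the k=15 step. Sorting every count is fine for a toy input but not for a whole genome, which is one reason a dedicated counter such as meryl is used in practice.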

For the genome-to-genome use case, it may be useful to visualize the alignments as a dot plot. This Perl script can be used to generate a dot plot from PAF-formatted output. In both use cases, pre-computing repetitive k-mers with meryl is fast: it typically takes 2-3 minutes for the human genome reference.
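A dot plot needs only one line segment per alignment. The following sketch is a hypothetical helper, not the Perl script mentioned above. It reads the 12 mandatory PAF columns and returns segments with target coordinates on x and query coordinates on y, drawing reverse-strand hits as anti-diagonals. It assumes that winnowmap, like minimap2, writes PAF when `-a` is omitted. It also does not offset coordinates when an assembly contains several sequences.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """One alignment drawn as a line from (x1, y1) to (x2, y2)."""
    target: str
    query: str
    x1: int
    y1: int
    x2: int
    y2: int


def paf_segments(lines: Iterable[str], min_len: int = 0) -> List[Segment]:
    """Convert PAF records into dot-plot segments (x = target, y = query).

    Uses the 12 mandatory PAF columns: qname, qlen, qstart, qend, strand,
    tname, tlen, tstart, tend, matches, block length, mapq.
    """
    segments: List[Segment] = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        qname, qstart, qend, strand = f[0], int(f[2]), int(f[3]), f[4]
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        if int(f[10]) < min_len:  # column 11: alignment block length
            continue
        if strand == "+":
            segments.append(Segment(tname, qname, tstart, qstart, tend, qend))
        else:
            # Reverse-strand hits run from the query end back to its start,
            # so they appear as anti-diagonal lines.
            segments.append(Segment(tname, qname, tstart, qend, tend, qstart))
    return segments
```

For example, `with open("output.paf") as fh: segs = paf_segments(fh, min_len=10000)` collects the segments, which any plotting library can then draw. The file name and the 10 kbp filter are illustrative values only.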

Benchmarking

When comparing Winnowmap (v1.0) to minimap2 (v2.17-r954), we observed a reduction in the mapping error rate from 0.14% to 0.06% on the recently finished human X chromosome, and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp). Winnowmap improves mapping accuracy within repeats while sampling minimizers more sparsely, which leads to better index compression and competitive runtimes. Because it avoids masking, Winnowmap also maintains a uniform minimizer density.

<p align="center"> <img src="https://i.postimg.cc/MKtqBYPn/readme-winnowmap-density.jpg" width="400px"> <br> Minimizer sampling density using a human X chromosome as the reference, with the centromere positioned between 58 Mbp and 61 Mbp. ‘Standard’ method refers to the classic minimizer sampling algorithm from <a href="http://www.cs.toronto.edu/~wayne/research/papers/minimizers.pdf">Roberts et al.</a>, without any masking or modification. </p>
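The figure's ‘Standard’ baseline is classic minimizer sampling: in every window of w consecutive k-mers, keep the k-mer that is smallest under a pseudo-random order. Density is the fraction of k-mer positions sampled. The sketch below computes this for a toy sequence.

For contrast, the sketch also includes a hypothetical two-tier order that ranks every k-mer from a repetitive list after all other k-mers. This only illustrates the general idea of down-weighting frequent k-mers and is not Winnowmap's actual weighting scheme. Several details are also assumptions rather than Winnowmap behaviour:

- the blake2b-based order stands in for a real mapper's hash;
- ties are broken toward the leftmost position;
- the demo sequence, k, w, and repeat cutoff are arbitrary.

```python
import hashlib
from typing import Callable, Set


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit pseudo-random rank for a k-mer
    (stand-in for the hash a real mapper uses; assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_positions(seq: str, k: int, w: int, key: Callable[[str], tuple]) -> Set[int]:
    """Start positions of the smallest k-mer (under `key`) in every window of
    w consecutive k-mers; ties go to the leftmost position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    keys = [key(km) for km in kmers]
    picked: Set[int] = set()
    for start in range(len(kmers) - w + 1):
        picked.add(min(range(start, start + w), key=lambda i: (keys[i], i)))
    return picked


def density(seq: str, k: int, w: int, key: Callable[[str], tuple]) -> float:
    """Fraction of k-mer positions that are sampled."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return 0.0
    return len(minimizer_positions(seq, k, w, key)) / n_kmers


def standard_key(kmer: str) -> tuple:
    """Classic order: a single pseudo-random rank per k-mer."""
    return (kmer_hash(kmer),)


def weighted_key(repetitive: Set[str]) -> Callable[[str], tuple]:
    """Hypothetical two-tier order: every k-mer outside `repetitive` ranks
    ahead of every k-mer inside it (False sorts before True)."""
    return lambda kmer: (kmer in repetitive, kmer_hash(kmer))


if __name__ == "__main__":
    import random
    from collections import Counter

    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(3000))
    seq += "GATTACAGATTACC" * 100  # hypothetical tandem-repeat block
    k, w = 15, 10
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    repetitive = {km for km, c in counts.items() if c > 10}  # hypothetical cutoff
    print("standard density:", density(seq, k, w, standard_key))
    print("two-tier density:", density(seq, k, w, weighted_key(repetitive)))
```

Every window is guaranteed to contribute a sampled position, which is what lets a mapper find shared minimizers between a read and the reference. The two-tier order changes which k-mer is chosen in windows that mix repetitive and non-repetitive k-mers, without removing any window's sample the way masking would.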

Publications