Home

Awesome

<evanbiederstedt> CRAN status CRAN downloads

edlibR

edlibR: R integration for edlib

The R package edlibR provides bindings to edlib, a performant C/C++ library for calculating the exact pairwise sequence alignment using the edit distance (Levenshtein distance). The package is modeled after the API of the Python package edlib.

The original Python bindings to the C/C++ library using Cython can be found here. Much of the background information below is directly from the original README for edlib.

Main Features

Usage

For details and examples of the edlibR functionality, see the vignettes:

There are three functions within edlibR:

align()

align(query, target, [mode], [task], [k], [cigarFormat], [additionalEqualities])

getNiceAlignment()

getNiceAlignment(alignResult, query, target, [gapSymbol])

nice_print()

nice_print(niceAlignment)

Background

Alignment Methods

Edlib supports three alignment methods:

Algorithmic Design

Edlib is based on Myers's bit-vector algorithm[1] and extends from it.

It calculates a dynamic programming matrix of dimensions Q x T, where Q is the length of the first sequence (query), and T is the length of the second sequence (target). It uses Ukkonen's banded algorithm[2] to reduce the space of search, and there is also parallelization from Myers's algorithm, however time complexity is still quadratic.

Edlib uses Hirschberg's algorithm[3] to find the alignment path, therefore space complexity is linear.

More details can be found within the edlib publication[4].

Installation

To install the stable version from CRAN, use:

install.packages('edlibR')

To install the latest version, use:

install.packages('devtools')
devtools::install_github('evanbiederstedt/edlibR')

References

R package citation

The R package can be cited as follows:

Martin Šošić and Evan Biederstedt (2022). edlibR: R wrapper to edlib,
the C/C++ library for fast exact sequence alignment using edit
(Levenshtein) distance. R package version 1.0.2.
https://github.com/evanbiederstedt/edlibR

edlib publication

Please also cite the original edlib publication, which can be found here and should be cited as follows:

Martin Šošić, Mile Šikić (2017). Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. 
Bioinformatics, Volume 33, Issue 9, 1 May 2017, Pages 1394–1395, 
https://doi.org/10.1093/bioinformatics/btw753

README references

<a id="1">[1]</a> Myers G. (1999) A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM, 46, 395–415. https://doi.org/10.1145/316542.316550

<a id="2">[2]</a> Ukkonen E. (1985) Algorithms for approximate string matching. Inform. Control, 64, 100–118. https://doi.org/10.1016/S0019-9958(85)80046-2

<a id="3">[3]</a> Hirschberg D.S. (1975) A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18, 341–343. https://doi.org/10.1145/360825.360861

<a id="4">[4]</a> Martin Šošić, Mile Šikić. Edlib: a C/C++ library for fast, exact sequence alignment using edit distance, Bioinformatics, Volume 33, Issue 9, 1 May 2017, Pages 1394–1395, https://doi.org/10.1093/bioinformatics/btw753