<div align="center"> <a href="https://irmin.org"> <img src="./logo.svg" alt="Irmin logo"/> </a> <br /> <strong>A Distributed Database Built on the Same Principles as Git</strong> </div> <div align="center"> <br /> </div> <hr /> <div align="center"> <em> Irmin is an OCaml library for building mergeable, branchable distributed data stores. </em> </div> <hr />
Irmin is based on the principles of distributed version-control systems (DVCSs), which are extensively used in software development to track data provenance and show modifications to source code. Irmin applies DVCS principles to large-scale distributed data and includes functions similar to Git's (clone, push, pull, branch, rebase). The Git workflow was initially designed for humans to manage changes within source code. Irmin scales this workflow to automated programs performing a very high number of operations per second, with fully automated conflict handling.
Irmin is highly customisable. Users can define their own types to store application-specific values. They can also define custom storage layers (in memory, on disk, in a remote Redis database, in the browser, etc.). Finally, Irmin provides an event-driven API for defining programmable dynamic behaviours and distributed dataflow pipelines.
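As the usage example later in this README shows (`Irmin_git_unix.FS.KV (Irmin.Contents.String)`), Irmin stores are assembled by applying OCaml functors to modules that describe what is stored. The sketch below illustrates that pattern with the standard library only, extending it to a pluggable backend; the module types `CONTENTS` and `BACKEND`, the functor `Make`, and every other name in it are hypothetical simplifications, not Irmin's real signatures.
```ocaml
(* A toy illustration of functor-based customisation; not Irmin's API. *)

(* Hypothetical signature for user-defined contents. *)
module type CONTENTS = sig
  type t
  val to_string : t -> string
  val of_string : string -> t
end

(* Hypothetical signature for a pluggable storage layer. *)
module type BACKEND = sig
  type t
  val create : unit -> t
  val set : t -> string -> string -> unit
  val get : t -> string -> string option
end

(* An in-memory backend built on a hash table. *)
module Mem : BACKEND = struct
  type t = (string, string) Hashtbl.t
  let create () = Hashtbl.create 16
  let set = Hashtbl.replace
  let get = Hashtbl.find_opt
end

(* A store combining any contents type with any backend. *)
module Make (C : CONTENTS) (B : BACKEND) = struct
  type t = B.t
  let create = B.create

  (* Paths are joined with '/' to form flat backend keys. *)
  let key path = String.concat "/" path
  let set t path v = B.set t (key path) (C.to_string v)
  let get t path = Option.map C.of_string (B.get t (key path))
end

(* Example contents: integers serialised as decimal strings. *)
module Int_contents = struct
  type t = int
  let to_string = string_of_int
  let of_string = int_of_string
end

module Store = Make (Int_contents) (Mem)
```
Swapping `Mem` for another module with the same signature (say, one that writes to files) would change the storage layer without touching the store logic, which is the appeal of a functor-based design.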
Irmin was created at the University of Cambridge in 2013 to be the default storage layer for MirageOS applications (both to store and orchestrate unikernel binaries and to store the data that these unikernels use). As such, Irmin is not, strictly speaking, a complete database engine. Instead, like other MirageOS components, it is a collection of libraries designed to solve different flavours of the challenges raised by the CAP theorem. Each application can select the right combination of libraries to solve its particular distributed problem.
Irmin is built on a core of well-defined, low-level data structures that dictate how data should be persisted and shared across nodes. It defines algorithms for efficient synchronisation of those distributed low-level constructs. It also builds a collection of higher-level data structures that developers can use without knowing precisely how Irmin works underneath. Some of these components even have formal semantics, including Conflict-free Replicated Data Types (CRDTs). Since it is part of MirageOS, Irmin does not make strong assumptions about the OS environment, which makes the system very portable. It works well both for in-memory databases and for slower persistent storage, such as SSDs, hard drives, web browser local storage, or even the Git file format.
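Like Git, Irmin keeps its low-level objects in content-addressable stores: each object is keyed by a hash of its serialised form, so identical values are stored once and can be shared between nodes by hash alone. The sketch below is a toy model of that idea using the standard library's `Digest` (MD5) module; Irmin itself uses different hash functions and object formats, and the names `add`, `find`, and `add_tree` are hypothetical.
```ocaml
(* Toy content-addressable store: keys are hashes of the stored values. *)
type hash = Digest.t

type store = (hash, string) Hashtbl.t

let create () : store = Hashtbl.create 16

(* Adding a value returns its hash; adding the same value twice is a no-op. *)
let add (s : store) (v : string) : hash =
  let h = Digest.string v in
  if not (Hashtbl.mem s h) then Hashtbl.add s h v;
  h

let find (s : store) (h : hash) : string option = Hashtbl.find_opt s h

(* A toy "tree" object refers to its children by hash, so an unchanged
   subtree serialises identically and is shared between versions. *)
let add_tree (s : store) (entries : (string * hash) list) : hash =
  let serialised =
    String.concat "\n"
      (List.map (fun (name, h) -> name ^ " " ^ Digest.to_hex h) entries)
  in
  add s serialised
```
Because a tree's hash depends only on its children's hashes, two versions that differ in one leaf share every untouched subtree, which is what makes snapshotting and synchronisation cheap in this style of design.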
Irmin is primarily developed and maintained by Tarides, with involvement by contributors from various organisations. External maintainers and contributors are welcome.
Features
- Built-In Snapshotting - backup and restore
- Storage Agnostic - use Irmin on top of your own storage layer
- Custom Datatypes - (de)serialisation for custom data types, derivable via `ppx_irmin`
- Highly Portable - runs anywhere from Linux to web browsers and Xen unikernels
- Git Compatibility - `irmin-git` uses an on-disk format that can be inspected and modified using Git
- Dynamic Behavior - allows users to define custom merge functions, use in-memory transactions (to keep track of reads as well as writes), and define event-driven workflows using a notification mechanism
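The custom merge functions listed above are three-way merges: given the common ancestor and two concurrent versions, a merge function either computes a reconciled value or reports a conflict. The following is a minimal sketch under that assumption, using a hypothetical `merge_result` type instead of Irmin's `Irmin.Merge` module: counters merge by applying both sides' deltas to the ancestor (the usual mergeable-counter semantics), while opaque values merge only when at most one side changed them.
```ocaml
(* Toy three-way merge functions; not Irmin's Merge API. *)
type 'a merge_result = Merged of 'a | Conflict of string

(* Counters merge by adding both sides' changes to the common ancestor. *)
let merge_counter ~old a b = Merged (old + (a - old) + (b - old))

(* Opaque values merge only if at most one side modified them. *)
let merge_value ~old a b =
  if a = b then Merged a
  else if a = old then Merged b
  else if b = old then Merged a
  else Conflict "both sides modified the value"
```
For example, if two replicas start from a counter at 10 and concurrently increment it to 13 and 12, `merge_counter ~old:10 13 12` yields `Merged 15`, preserving both increments instead of picking one side. A real merge function must also handle keys present on only one side or deleted; those cases are omitted here.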
Documentation
API documentation can be found online at https://mirage.github.io/irmin
Installation
Prerequisites
Please make sure that the minimum required versions of opam and OCaml are installed. The latest versions and installation instructions can be found on ocaml.org.
To install Irmin with the command-line tool and all Unix backends using opam:
```bash
opam install irmin-cli
```
A minimal installation containing the reference in-memory backend can be installed by running:
<!-- $MDX skip -->
```bash
opam install irmin
```
The following packages are available on opam:
- `irmin` - the base package, plus an in-memory storage implementation
- `irmin-chunk` - chunked storage
- `irmin-cli` - a simple command-line tool
- `irmin-fs` - filesystem-based storage using `bin_prot`
- `irmin-git` - Git compatible storage
- `irmin-graphql` - GraphQL server
- `irmin-mirage` - MirageOS compatibility
- `irmin-mirage-git` - Git compatible storage for MirageOS
- `irmin-mirage-graphql` - MirageOS compatible GraphQL server
- `irmin-pack` - compressed, on-disk, POSIX backend
- `ppx_irmin` - PPX deriver for Irmin content types (see README_PPX.md)
- `irmin-containers` - collection of simple, ready-to-use mergeable data structures
To install a specific package, simply run:
<!-- $MDX skip -->
```bash
opam install <package-name>
```
Development Version
To install the development version of Irmin in your current opam switch, clone this repository and `opam install` the packages inside:
```bash
git clone https://github.com/mirage/irmin
cd irmin/
opam install .
```
Usage
Example
Below is a simple example of setting a key and getting the value out of a Git-based, filesystem-backed store.
<!-- $MDX file=examples/readme.ml -->
```ocaml
open Lwt.Syntax

(* Irmin store with string contents *)
module Store = Irmin_git_unix.FS.KV (Irmin.Contents.String)

(* Database configuration *)
let config = Irmin_git.config ~bare:true "/tmp/irmin/test"

(* Commit author *)
let author = "Example <example@example.com>"

(* Commit information *)
let info fmt = Irmin_git_unix.info ~author fmt

let main =
  (* Open the repo *)
  let* repo = Store.Repo.v config in

  (* Load the main branch *)
  let* t = Store.main repo in

  (* Set key "foo/bar" to "testing 123" *)
  let* () =
    Store.set_exn t ~info:(info "Updating foo/bar") [ "foo"; "bar" ]
      "testing 123"
  in

  (* Get key "foo/bar" and print it to stdout *)
  let+ x = Store.get t [ "foo"; "bar" ] in
  Printf.printf "foo/bar => '%s'\n" x

(* Run the program *)
let () = Lwt_main.run main
```
The example is contained in examples/readme.ml. It can be compiled and executed with Dune:
<!-- $MDX skip -->
```bash
$ dune build examples/readme.exe
$ dune exec examples/readme.exe
foo/bar => 'testing 123'
```
The examples directory also contains more advanced examples, which can be executed in the same way.
Command Line
The same thing can also be accomplished using `irmin`, the command-line application installed with `irmin-cli`, by running:
```bash
$ echo "root: ." > irmin.yml
$ irmin init
$ irmin set foo/bar "testing 123"
$ irmin get foo/bar
testing 123
```
`irmin.yml` allows for `irmin` flags to be set on a per-directory basis. You can also set flags globally using `$HOME/.irmin/config.yml`. Run `irmin help irmin.yml` for further details.
Also see `irmin --help` for a list of all commands and either `irmin <command> --help` or `irmin help <command>` for more help with a specific command.
Context
Irmin's initial design was directly inspired by XenStore, with:
- the need for efficient optimistic concurrency control features to let thousands of virtual machines concurrently access and modify a central configuration database (the Xen stack uses XenStore as an RPC mechanism to set up VM configuration on boot). Very early on, the focus was on specifying and handling potential conflicts in cases where the optimistic assumptions break down.
- the need for a convenient way to debug and audit possible issues that might happen in that system. Our initial experiments showed that it was possible to design a reliable system using Git as a backend to persist configuration data (to safely restart after a crash), while keeping system debugging easy and fast, thanks to an efficient merging strategy.
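The optimistic scheme described above can be sketched as follows: a writer reads the current head, computes an update without holding a lock, and commits only if the head has not moved in the meantime, retrying otherwise. This is a toy, single-process model with a hypothetical `branch` record and `test_and_set` function; it does not use Irmin's API, and a real system would need the test-and-set step itself to be atomic across concurrent writers.
```ocaml
(* Toy model of optimistic concurrency control on a single branch head. *)
type branch = { mutable version : int; mutable value : string list }

(* Replace the head only if it is still at version [expected]. *)
let test_and_set b ~expected ~value =
  if b.version = expected then begin
    b.version <- b.version + 1;
    b.value <- value;
    true
  end
  else false

(* Read, compute an update without locking, and retry if another
   writer moved the head before we could commit. *)
let rec update b f =
  let expected = b.version in
  let value = f b.value in
  if test_and_set b ~expected ~value then () else update b f
```
A stale writer, for instance one calling `test_and_set` with an old `~expected` version, simply gets `false` back and must re-read; no writer ever blocks another, which suits workloads where conflicts are rare.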
In 2014, the first release of Irmin was announced as part of the MirageOS 2.0 release. Since then, several projects have started using and improving Irmin. These can roughly be split into three categories:
- Use Irmin as a portable, structured key-value store (with expressive, mergeable types)
- Use Irmin as a distributed database (with customisable consistency semantics)
- Use Irmin as an event-driven dataflow engine
Irmin as a portable and efficient structured key-value store
- XenStored is an information storage space shared between all the Xen virtual machines running in the same host. Each virtual machine gets its own path in the store. When values are changed, the appropriate drivers are notified. The initial OCaml implementation was later extended to use Irmin. More details here.
- Jitsu is an experimental orchestrator for unikernels. It uses Irmin to store the unikernel configuration (and manage dynamic DNS entries). See more details here.
- Cuekeeper is a web-based GTD (a fancy TODO list) that runs entirely in the browser. It uses Irmin to store data locally with support for structured concurrent editing and snapshot export and import. More details here.
- Canopy and Unipi both use Irmin to serve static websites pulled from Git repositories and deployed as unikernels.
- Caldav uses Irmin to store calendar entries and back them up in a Git repository. More information here.
- Datakit was developed at Docker and provided a 9P interface to the Irmin API. It was used to manage the configuration of Docker for Desktop with merge policies on upgrade, full auditing, and snapshot/rollback capabilities.
- Tezos started using Irmin in 2017 to store the ledger state. The first prototype used `irmin-git` before switching to `irmin-lmdb` and `irmin-leveldb` (and now `irmin-pack`). More details here.
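XenStored-style usage depends on watchers being notified when values under a path change. The sketch below is a toy, standard-library-only model of path-based watches; the `store` record, the `watch` and `set` functions, and the prefix-matching rule are hypothetical and do not reflect Irmin's own notification API.
```ocaml
(* Toy key-value store with prefix-based watches; not Irmin's API. *)
type path = string list

type store = {
  data : (path, string) Hashtbl.t;
  mutable watches : (path * (path -> string -> unit)) list;
}

let create () = { data = Hashtbl.create 16; watches = [] }

(* [is_prefix p q] holds when path [p] is a prefix of path [q]. *)
let rec is_prefix p q =
  match (p, q) with
  | [], _ -> true
  | x :: p', y :: q' -> x = y && is_prefix p' q'
  | _ :: _, [] -> false

(* Register [f] to be called on every update under [prefix]. *)
let watch s prefix f = s.watches <- (prefix, f) :: s.watches

(* Store a value, then notify every watcher whose prefix matches. *)
let set s path v =
  Hashtbl.replace s.data path v;
  List.iter (fun (prefix, f) -> if is_prefix prefix path then f path v) s.watches
```
In this model, a driver for one virtual machine would watch that machine's own subtree (for example `[ "vm1" ]`) and ignore updates elsewhere in the store.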
Irmin as a distributed store
- An IMAP server uses Irmin to store emails. More details here. The goal of that project was both to use Irmin to store emails (i.e. as a local key-value store) and to experiment with replacing the IMAP on-wire protocol with an explicit Git push/pull mechanism.
- `irmin-ARP` uses Irmin to store and audit ARP configuration. It uses Irmin as a local key-value store for very low-level information (which is normally stored deep in the kernel layers), but the main goal was to replace the broadcast-based on-wire protocol with point-to-point pull/push synchronisation primitives, with a full audit log of ARP operations over a network. More details here.
- Banyan uses Irmin to implement a distributed cache over a geo-replicated cluster. It uses Cassandra as a storage backend. More information here.
- `irmin-fdb` implements an Irmin store backed by FoundationDB. More details here.
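Replacing broadcast protocols with point-to-point push/pull works well because commits are immutable and identified by id: a pull only has to copy the commits reachable from the remote head that the local replica lacks. The sketch below is a toy model of such a pull, assuming a hypothetical `commit`/`replica` representation and fast-forward-only updates; real synchronisation (in Irmin or Git) must also handle diverging histories, for example by merging.
```ocaml
(* Toy commit graph: each commit names its parents by id. *)
type commit = { id : string; parents : string list; data : string }

(* A replica maps commit ids to commits and tracks its current head. *)
type replica = { commits : (string, commit) Hashtbl.t; mutable head : string option }

let create () = { commits = Hashtbl.create 16; head = None }

(* Record a new local commit and move the head to it. *)
let add r c =
  Hashtbl.replace r.commits c.id c;
  r.head <- Some c.id

(* Copy the commits reachable from [remote]'s head that [local] lacks and
   return how many were copied. Parents are copied before children, so a
   commit already present locally implies its ancestors are too.
   Fast-forward only: assumes [local]'s head is an ancestor of [remote]'s. *)
let pull ~remote ~local =
  let copied = ref 0 in
  let rec copy id =
    if not (Hashtbl.mem local.commits id) then begin
      let c = Hashtbl.find remote.commits id in
      List.iter copy c.parents;
      Hashtbl.replace local.commits id c;
      incr copied
    end
  in
  Option.iter copy remote.head;
  local.head <- remote.head;
  !copied
```
A second pull after one new remote commit copies just that commit, since the traversal stops as soon as it reaches history the local replica already has.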
Irmin as a dataflow scheduler
- Datakit CI is a continuous integration service that monitors GitHub projects and tests each branch, tag, and pull request. It displays the test results as status indicators in the GitHub UI. It keeps all of its state and logs in DataKit rather than a traditional relational database, allowing review with the usual Git tools. The core of the project is a scheduler that manages dataflow pipelines across Git repositories. For a few years, it was used as the CI system for testing Docker for Desktop on bare-metal and virtual machines, as well as all new opam package submissions to `ocaml/opam-repository`. More details here.
- Causal RPC implements an RPC framework using Irmin as a network substrate. More details here.
- CISO is an experimental (distributed) continuous integration engine for opam. It was designed as a replacement for Datakit CI and eventually evolved into OCurrent.
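The scheduling idea behind these dataflow uses can be illustrated by re-running a pipeline stage only when its input changes, with results cached under the input's hash so that earlier outcomes remain available for review. This is a hypothetical, minimal sketch (the `stage` record and `eval` function are illustrative names) and does not reflect the actual design of DataKit CI, CISO, or OCurrent.
```ocaml
(* Toy incremental pipeline stage, memoised on the hash of its input. *)
type 'a stage = {
  run : string -> 'a;                (* the (possibly expensive) job *)
  cache : (Digest.t, 'a) Hashtbl.t;  (* results keyed by input hash *)
}

let stage run = { run; cache = Hashtbl.create 16 }

(* Re-run the job only when this input's hash has not been seen before. *)
let eval s input =
  let h = Digest.string input in
  match Hashtbl.find_opt s.cache h with
  | Some result -> result
  | None ->
      let result = s.run input in
      Hashtbl.add s.cache h result;
      result
```
Keying on a content hash rather than on a branch name means that re-pushing an already-tested commit reuses its result, while any change to the input triggers a fresh run.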
Issues
Feel free to report any issues using the GitHub bugtracker.
License
See the LICENSE file.
Acknowledgements
Development of Irmin was supported in part by the EU FP7 User-Centric Networking project, Grant No. 611001.