Home

Awesome

DVID Picture

Go Report Card GoDoc

DVID is a Distributed, Versioned, Image-oriented Dataservice written to support neural reconstruction, analysis and visualization efforts at HHMI Janelia Research Center. It provides storage with branched versioning of a variety of data necessary for our research including teravoxel-scale image volumes, JSON descriptions of objects, sparse volumes, point annotations with relationships (like synapses), etc.

Its goal is to provide:

High-level architecture of DVID

How it's different from other forms of versioned data systems:

While much of the effort has been focused on the needs of the Janelia FlyEM Team, DVID can be used as a general-purpose branched versioning file system that handles billions of files and terabytes of data by creating instances of the keyvalue datatype. Our team uses the keyvalue datatype for branched versioning of JSON, configuration, and other files using the simple key-value HTTP API.

DVID aspires to be a "github for large-scale scientific data" because a variety of interrelated data (like image volume, labels, annotations, skeletons, meshes, and JSON data) can be versioned together. DVID currently handles branched versioning of large-scale data and does not provide domain-specific diff tools to compare data from versions, which would be a necessary step for user-friendly pull requests and truly collaborative data editing.

Table of Contents

Installation

Users should install DVID from the releases. The main branch of DVID may include breaking changes required by our research work.

Developers should consult the install README where our conda-based process is described.

DVID has been tested on MacOS X, Linux (Fedora 16, CentOS 6, Ubuntu), and Windows Subsystem for Linux (WSL2). It comes out-of-the-box with several embedded key-value databases (Badger, Basho's leveldb) for storage although you can configure other storage backends.

Before launching DVID, you'll have to create a configuration file describing ports, the types of storage engines, and where the data should be stored. Both simple and complex sample configuration files are provided in the scripts/distro-files directory.

Basic Usage

Some documentation is available on the DVID wiki's User Guide. When using DVID at scale, our team writes scripts using the neuclease python library. There are also other DVID access libraries used by our collaborators.

For simple scenarios like just using DVID for branched versioning of key-value data, reading and writing data can be done with a few simple HTTP requests.

More Information

Both high-level and detailed descriptions of DVID and its ecosystem can found here:

DVID is easily extensible by adding custom data types, each of which fulfill a minimal interface (e.g., HTTP request handling), DVID's initial focus is on efficiently handling data essential for Janelia's connectomics research:

Each of the above is handled by built-in data types via a Level 2 REST HTTP API implemented by Go language packages within the datatype directory. When dealing with novel data, we typically use the generic keyvalue datatype and store JSON-encoded or binary data until we understand the desired access patterns and API. When we outgrow the keyvalue type's GET, POST, and DELETE operations, we create a custom datatype package with a specialized HTTP API.

DVID allows you to assign data instances in a repo to different storage systems, which allows great flexibility in optimizing storage for particular use cases. For example, easily compressed label data can be store in fast, expensive SSDs while larger, immutable grayscale image data can be stored in petabyte-scale read-optimized systems like Google Cloud Storage.

DVID is written in Go and supports pluggable storage backends, a REST HTTP API, and command-line access (likely minimized in near future). Some components written in C, e.g., storage engines like Leveldb and fast codecs like lz4, are embedded or linked as a library.

Command-line and HTTP API documentation can be found in help constants within packages or by visiting the /api/help HTTP endpoint on a running DVID server.

Monitoring

Mutations and activity logging can be sent to a Kafka server. We use kafka activity topics to feed Kibana for analyzing DVID performance.

Snapshot of Kibana web page for DVID metrics

Known Clients with DVID Support

Programmatic clients:

GUI clients:

Screenshot of an early web app prototype pulling neuron data and 2d slices from 3d grayscale data:

Web app for 3d inspection being served from and sending requests to DVID