Differential Datalog (DDlog)
DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. With DDlog, the programmer does not need to write incremental algorithms by hand. Instead, they specify the desired input-output mapping declaratively, using a dialect of Datalog. The DDlog compiler then synthesizes an efficient incremental implementation. DDlog is based on Frank McSherry's excellent differential dataflow library.
DDlog has the following key properties:
- Relational: A DDlog program transforms a set of input relations (or tables) into a set of output relations. It is thus well suited for applications that operate on relational data, ranging from real-time analytics to cloud management systems and static program analysis tools.
- Dataflow-oriented: At runtime, a DDlog program accepts a stream of updates to input relations. Each update inserts, deletes, or modifies a subset of input records. DDlog responds to an input update by outputting an update to its output relations.
- Incremental: DDlog processes input updates by performing the minimum amount of work necessary to compute changes to output relations. This has significant performance benefits for many queries.
- Bottom-up: DDlog starts from a set of input facts and computes all possible derived facts by following user-defined rules, in a bottom-up fashion. In contrast, top-down engines are optimized to answer individual user queries without computing all possible facts ahead of time. For example, given a Datalog program that computes pairs of connected vertices in a graph, a bottom-up engine maintains the set of all such pairs. A top-down engine, on the other hand, is triggered by a user query to determine whether a pair of vertices is connected and handles the query by searching for a derivation chain back to ground facts. The bottom-up approach is preferable in applications where all derived facts must be computed ahead of time and in applications where the cost of initial computation is amortized across a large number of queries.
- In-memory: DDlog stores and processes data in memory. In a typical use case, a DDlog program is used in conjunction with a persistent database, with database records being fed to DDlog as ground facts and the derived facts computed by DDlog being written back to the database.
  At the moment, DDlog can only operate on databases that fit entirely in the memory of a single machine. We are working on a distributed version of DDlog that will be able to partition its state and computation across multiple machines.
- Typed: In its classical textbook form, Datalog is more of a mathematical formalism than a practical tool for programmers. In particular, pure Datalog does not have concepts like types, arithmetic, strings, or functions. To facilitate writing safe, clear, and concise code, DDlog extends pure Datalog with:
  - A powerful type system, including Booleans, unlimited precision integers, bitvectors, floating point numbers, strings, tuples, tagged unions, vectors, sets, and maps. All of these types can be stored in DDlog relations and manipulated by DDlog rules. Thus, with DDlog one can perform relational operations, such as joins, directly over structured data, without having to flatten it first (as is often done in SQL databases).
  - Standard integer, bitvector, and floating point arithmetic.
  - A simple procedural language that allows expressing many computations natively in DDlog without resorting to external functions.
  - String operations, including string concatenation and interpolation.
  - Syntactic sugar for writing imperative-style code using for/let/assignments.
- Integrated: While DDlog programs can be run interactively via a command line interface, the primary use case of DDlog is integration with other applications that require deductive database functionality. A DDlog program is compiled into a Rust library that can be linked against a Rust, C/C++, Java, or Go program (bindings for other languages can be easily added). Compilation enables good performance but somewhat limits flexibility, as changes to the relational schema or rules require recompilation.
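The connected-vertices example mentioned under the bottom-up property can be made concrete with a small program. The following is a minimal sketch, not taken from this README: it assumes string vertex identifiers, and the relation names (Edge, Path) and the file name path.dl are illustrative. Check the exact syntax against the language reference before relying on it.

```shell
# Write a small DDlog program to path.dl (hypothetical file name).
cat > path.dl <<'EOF'
// Ground facts: directed edges supplied by the application.
input relation Edge(s: string, t: string)

// Derived facts: all pairs (s1, s2) such that s2 is reachable from s1.
output relation Path(s1: string, s2: string)

// Base case: every edge is a path.
Path(x, y) :- Edge(x, y).
// Recursive case: extend a known path by one more edge.
Path(x, z) :- Path(x, w), Edge(w, z).
EOF
```

Under the bottom-up model described above, the engine would keep the full Path relation up to date as Edge records are inserted or deleted, rather than searching for a derivation per query.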
Documentation
- Follow the tutorial for a step-by-step introduction to DDlog.
- DDlog language reference.
- DDlog command reference for writing and testing your own Datalog programs.
- How to use DDlog from Java.
- How to use DDlog from C.
- How to use DDlog from Go and Go API documentation.
- How to use DDlog from Rust (by example)
- Tutorial on profiling DDlog programs
- DDlog overview paper, Datalog 2.0 workshop, 2019.
Installation
Installing DDlog from a binary release
To install a precompiled version of DDlog, download the latest binary release, extract it from the archive, add ddlog/bin to your $PATH, and set $DDLOG_HOME to point to the ddlog directory. You will also need to install the Rust toolchain (see instructions below).
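As a minimal sketch of the environment setup, assuming the archive was extracted directly under $HOME (the location is an assumption; substitute wherever you unpacked it):

```shell
# Assumption: the release archive was unpacked under $HOME, producing $HOME/ddlog.
# Adjust DDLOG_HOME if you extracted it elsewhere.
export DDLOG_HOME="$HOME/ddlog"

# Make the DDlog binaries visible to the current shell.
export PATH="$DDLOG_HOME/bin:$PATH"
```

To make the settings persistent, the same two lines can be appended to your shell startup file, for example ~/.profile.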
If you're using macOS, you will need to override the binary's security settings through these instructions. Otherwise, when you first run the DDlog compiler (by calling ddlog), you will get the following warning dialog:
"ddlog" cannot be opened because the developer cannot be verified.
macOS cannot verify that this app is free from malware.
You are now ready to start coding in DDlog.
Compiling DDlog from sources
Installing dependencies manually
- Haskell stack:
wget -qO- https://get.haskellstack.org/ | sh
- Rust toolchain v1.52.1 or later:
curl https://sh.rustup.rs -sSf | sh
. $HOME/.cargo/env
rustup component add rustfmt
rustup component add clippy
Note: The rustup script adds the path to the Rust toolchain binaries (typically, $HOME/.cargo/bin) to ~/.profile, so that it takes effect at the next login. To configure your current shell, run source $HOME/.cargo/env.
- JDK, e.g.:
apt install default-jdk
- Google FlatBuffers library. Download and build FlatBuffers release 1.11.0 from github. Make sure that the flatc tool is in your $PATH. Additionally, make sure that FlatBuffers Java classes are in your $CLASSPATH:
./tools/install-flatbuf.sh
cd flatbuffers
export CLASSPATH=`pwd`"/java":$CLASSPATH
export PATH=`pwd`:$PATH
cd ..
- Static versions of the following libraries: libpthread.a, libc.a, libm.a, librt.a, libutil.a, libdl.a, libgmp.a, and libstdc++.a can be installed from distro-specific packages. On Ubuntu:
apt install libc6-dev libgmp-dev
On Fedora:
dnf install glibc-static gmp-static libstdc++-static
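Before building, it can help to confirm that the tools installed above are visible on $PATH. The following is a sketch only: the check_tool helper is a hypothetical name, and the tool list is an assumption derived from the dependencies listed above (it does not check the static libraries).

```shell
# Report whether a command is visible on $PATH.
# check_tool is an illustrative helper, not part of DDlog.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed list of executables provided by the dependencies above.
missing=0
for tool in stack cargo rustfmt javac flatc; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing prerequisite(s) missing"
```

If rustfmt or cargo is reported missing right after installing Rust, the current shell has probably not picked up $HOME/.cargo/env yet.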
Building
To build the software once you've installed the dependencies using one of the above methods, clone this repository and set the $DDLOG_HOME variable to point to the root of the repository. Run
stack build
anywhere inside the repository to build the DDlog compiler. To install DDlog binaries in Haskell stack's default binary directory:
stack install
To install to a different location:
stack install --local-bin-path <custom_path>
To test basic DDlog functionality:
stack test --ta '-p path'
Note: this takes a few minutes
You are now ready to start coding in DDlog.
Vim syntax highlighting
The easiest way to enable differential datalog syntax highlighting for .dl files in Vim is by creating a symlink from <ddlog-folder>/tools/vim/syntax/dl.vim into ~/.vim/syntax/.
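The symlink can be created along these lines. This is a sketch that assumes $DDLOG_HOME points at your DDlog folder; the fallback path $HOME/ddlog is only a placeholder.

```shell
# Assumption: $DDLOG_HOME points at the DDlog folder set up during installation;
# the fallback path below is only a placeholder.
ddlog_dir="${DDLOG_HOME:-$HOME/ddlog}"

# Create the Vim syntax directory if it does not exist yet.
mkdir -p "$HOME/.vim/syntax"

# Link the syntax file; -f replaces an existing link or file of the same name.
ln -sf "$ddlog_dir/tools/vim/syntax/dl.vim" "$HOME/.vim/syntax/dl.vim"
```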
If you are using a plugin manager, you may be able to consume the file directly from the upstream repository as well. In the case of Vundle, for example, the configuration could look as follows:
call vundle#begin('~/.config/nvim/bundle')
...
Plugin 'vmware/differential-datalog', {'rtp': 'tools/vim'} <---- relevant line
...
call vundle#end()
Debugging with GHCi
To run the test suite with the GHCi debugger:
stack ghci --ghci-options -isrc --ghci-options -itest differential-datalog:differential-datalog-test
and type do main at the command prompt.
Building with profiling info enabled
stack clean
followed by
stack build --profile
or
stack test --profile