delta-kernel-rs

Delta-kernel-rs is an experimental Delta implementation focused on interoperability with a wide range of query engines. It currently supports reads and (experimental) writes. Only blind appends are currently supported in the write path.

The Delta Kernel project is a Rust and C library for building Delta connectors that can read and write Delta tables without needing to understand the Delta protocol details. This is the Rust/C equivalent of Java Delta Kernel.

Crates

Delta-kernel-rs is split into a few different crates:

Building

By default we build only the kernel and acceptance crates, which will also build derive-macros as a dependency.

To get started, install Rust via rustup, clone the repository, and then run:

cargo test --all-features

This will build the kernel, run all unit tests, fetch the Delta Acceptance Tests data and run the acceptance tests against it.

In general, you will want to depend on delta-kernel-rs by adding it as a dependency to your Cargo.toml (that is, for Rust projects using Cargo); for other projects, please see the FFI module. The core kernel includes facilities for reading and writing Delta tables, and lets the consumer provide their own implementation of the Engine trait in order to build engine-specific versions of the various Engine APIs that the kernel relies on (e.g. an engine-specific read_json_files() that uses the engine's native JSON reader). If you do not need a custom Engine, the kernel has a feature flag (default-engine) that enables a default, asynchronous Engine implementation built with Arrow and Tokio.

[dependencies]
# Option 1: fewer dependencies; requires the consumer to implement the Engine trait.
# Allows consumers to implement their own in-memory format.
delta_kernel = "0.6"

# Option 2 (use instead of option 1): turn on the default engine, based on arrow
# delta_kernel = { version = "0.6", features = ["default-engine"] }
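The Engine-trait design described above can be illustrated with a deliberately simplified, self-contained sketch. The names here (`JsonReader`, `MiniEngine`, `InMemoryJson`, `MyEngine`, `count_log_actions`) are hypothetical and are not the real delta_kernel API, whose Engine trait exposes more handlers with different signatures. The sketch only shows the pattern: kernel-side code is written against a trait, and each engine supplies its own implementation, here an in-memory stand-in for a native JSON reader.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an engine-provided JSON handler.
/// NOT the real delta_kernel trait; it only illustrates the pattern.
trait JsonReader {
    /// Return the lines of each requested JSON file, in order.
    fn read_json_files(&self, paths: &[&str]) -> Result<Vec<String>, String>;
}

/// Hypothetical, heavily trimmed-down engine trait.
trait MiniEngine {
    fn json_reader(&self) -> &dyn JsonReader;
}

/// An "engine-specific" reader backed by an in-memory map,
/// standing in for a connector's native JSON reader.
struct InMemoryJson {
    files: HashMap<String, String>,
}

impl JsonReader for InMemoryJson {
    fn read_json_files(&self, paths: &[&str]) -> Result<Vec<String>, String> {
        let mut lines = Vec::new();
        for p in paths {
            let body = self
                .files
                .get(*p)
                .ok_or_else(|| format!("missing file: {p}"))?;
            lines.extend(body.lines().map(str::to_string));
        }
        Ok(lines)
    }
}

/// Hypothetical connector engine that plugs in its own JSON reader.
struct MyEngine {
    json: InMemoryJson,
}

impl MiniEngine for MyEngine {
    fn json_reader(&self) -> &dyn JsonReader {
        &self.json
    }
}

/// "Kernel-side" logic: it only talks to the trait, never to a concrete engine.
/// Counts non-empty lines (one action per line) across the given commit files.
fn count_log_actions(engine: &dyn MiniEngine, commit_files: &[&str]) -> Result<usize, String> {
    let lines = engine.json_reader().read_json_files(commit_files)?;
    Ok(lines.iter().filter(|l| !l.trim().is_empty()).count())
}

fn main() {
    let mut files = HashMap::new();
    files.insert(
        "_delta_log/00000000000000000000.json".to_string(),
        "{\"protocol\":{}}\n{\"metaData\":{}}\n{\"add\":{}}".to_string(),
    );
    let engine = MyEngine { json: InMemoryJson { files } };
    let n = count_log_actions(&engine, &["_delta_log/00000000000000000000.json"]).unwrap();
    println!("{n} actions");
}
```

Because `count_log_actions` only sees `&dyn MiniEngine`, swapping in a different engine requires no change to the kernel-side code.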

Feature flags

There are more feature flags in addition to the default-engine flag shown above. Relevant flags include:

| Feature flag | Description |
| --- | --- |
| default-engine | Turn on the 'default' engine: async, arrow-based Engine implementation |
| sync-engine | Turn on the 'sync' engine: synchronous, arrow-based Engine implementation. Only supports local storage! |
| arrow-conversion | Conversion utilities for arrow/kernel schema interoperation |
| arrow-expression | Expression system implementation for arrow |

Versions and API Stability

We intend to follow Semantic Versioning. However, in the 0.x line the APIs are still unstable. We may therefore break APIs between minor releases (that is, 0.1 -> 0.2), but we will not break APIs in patch releases (0.1.0 -> 0.1.1).
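The stability policy above can be written down as a small rule. The sketch below uses a hypothetical `Version` type and `may_break` helper (not part of delta_kernel) to encode it: within 0.x, a change in the minor number may break APIs while a patch-only bump may not; from 1.0 on, standard SemVer applies and only a major bump may break.

```rust
/// Hypothetical helper type: a parsed `major.minor.patch` version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    /// Parse exactly three dot-separated numeric components, e.g. "0.6.1".
    fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
        let v = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "1.2.3.4".
        if parts.next().is_some() {
            return None;
        }
        Some(v)
    }
}

/// Returns true if moving from `from` to `to` is allowed to contain
/// breaking API changes under the policy described above.
fn may_break(from: Version, to: Version) -> bool {
    if from.major != to.major {
        return true; // major change: always potentially breaking
    }
    if from.major == 0 {
        // 0.x line: a minor change (0.1 -> 0.2) may break; a patch change may not.
        return from.minor != to.minor;
    }
    false // 1.0 and later: minor and patch changes are non-breaking
}

fn main() {
    let pairs = [("0.1.0", "0.1.1"), ("0.1.0", "0.2.0"), ("1.2.0", "1.3.0")];
    for (a, b) in pairs {
        let (from, to) = (Version::parse(a).unwrap(), Version::parse(b).unwrap());
        println!("{a} -> {b}: may break = {}", may_break(from, to));
    }
}
```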

Arrow versioning

If you enable the default-engine or sync-engine features, you get an implementation of the Engine trait that uses Arrow as its data format.

The arrow crate tends to release new major versions rather quickly. To let engines that already integrate arrow also integrate kernel, without forcing them to track the specific arrow version that kernel depends on, we depend on as broad a range of arrow versions as we can.

This means you can force kernel to rely on the specific arrow version that your engine already uses, as long as it falls in that range. You can see the range in the Cargo.toml in the same folder as this README.md.
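To make "falls in that range" concrete, the sketch below checks an engine's arrow version against a supported range of major versions. The bounds (53 inclusive to 55 exclusive) and the helper names `major_of` and `compatible` are placeholders invented for this example, not kernel's actual range or API; the real bounds are whatever kernel's Cargo.toml declares.

```rust
use std::ops::Range;

/// Placeholder range of supported arrow major versions (hypothetical values,
/// standing in for the range declared in kernel's Cargo.toml).
const SUPPORTED_ARROW_MAJORS: Range<u64> = 53..55;

/// Extract the major component of a version string such as "53.2.0".
fn major_of(version: &str) -> Option<u64> {
    version.split('.').next()?.parse().ok()
}

/// True if an engine already using `engine_arrow` could share that
/// arrow version with kernel, given the supported range.
fn compatible(engine_arrow: &str, supported: &Range<u64>) -> bool {
    major_of(engine_arrow).map_or(false, |m| supported.contains(&m))
}

fn main() {
    for v in ["52.2.0", "53.0.0", "54.1.0", "55.0.0"] {
        println!("arrow {v}: {}", compatible(v, &SUPPORTED_ARROW_MAJORS));
    }
}
```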

For example, although arrow 53.1.0 has been released, you can force kernel to compile on 53.0 by putting the following in your project's Cargo.toml:

[patch.crates-io]
arrow = "53.0"
arrow-arith = "53.0"
arrow-array = "53.0"
arrow-buffer = "53.0"
arrow-cast = "53.0"
arrow-data = "53.0"
arrow-ord = "53.0"
arrow-json = "53.0"
arrow-select = "53.0"
arrow-schema = "53.0"
parquet = "53.0"

Note that, unfortunately, patching in Cargo requires that exactly one version matches your specification. If only arrow "53.0.0" had been released, the above would work, but if "53.0.1" were released, the specification would break and you would need to provide a more restrictive specification like "=53.0.0".

Object Store

You may also need to patch the object_store version used if the version of parquet you depend on depends on a different version of object_store. This can be done by including object_store in the patch list with the required version. You can find this out by checking the parquet docs.rs page, switching to the version you want to use, and then checking what version of object_store it depends on.

Documentation

Examples

There are some example programs showing how delta-kernel-rs can be used to interact with delta tables. They live in the kernel/examples directory.

Development

delta-kernel-rs is still under heavy development but follows conventions adopted by most Rust projects.

Concepts

There are a few key concepts that will help in understanding kernel:

  1. The Engine trait encapsulates all the functionality that an engine or connector needs to provide to the Delta Kernel in order to read/write the Delta table.
  2. The DefaultEngine is our default implementation of the above trait. It lives in engine/default, and provides a reference implementation for all Engine functionality. DefaultEngine uses arrow as its in-memory data format.
  3. A Scan is the entrypoint for reading data from a table.
  4. A Transaction is the entrypoint for writing data to a table.
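How these concepts fit together can be sketched in a few lines. Everything below is hypothetical and heavily simplified (`Engine`, `VecEngine`, `Scan`, `Transaction` and their methods are illustrative names, not kernel's real signatures): the engine is the only component that touches storage, a scan reads through it, and a transaction performs a blind append, adding new files without reading or rewriting existing ones, which mirrors the write-path limitation mentioned at the top of this README.

```rust
use std::cell::RefCell;

/// Hypothetical engine trait: the only component that touches storage.
trait Engine {
    fn list_files(&self) -> Vec<String>;
    fn commit_add_files(&self, files: &[String]);
}

/// Hypothetical in-memory engine, standing in for DefaultEngine or a custom one.
#[derive(Default)]
struct VecEngine {
    files: RefCell<Vec<String>>,
}

impl Engine for VecEngine {
    fn list_files(&self) -> Vec<String> {
        self.files.borrow().clone()
    }
    fn commit_add_files(&self, files: &[String]) {
        self.files.borrow_mut().extend_from_slice(files);
    }
}

/// Hypothetical read entrypoint: here it simply returns every file the engine lists.
struct Scan;

impl Scan {
    fn execute(&self, engine: &dyn Engine) -> Vec<String> {
        engine.list_files()
    }
}

/// Hypothetical write entrypoint: collects new files and commits them as a
/// blind append (existing files are never read or rewritten).
#[derive(Default)]
struct Transaction {
    pending: Vec<String>,
}

impl Transaction {
    fn add_file(&mut self, path: &str) {
        self.pending.push(path.to_string());
    }
    fn commit(self, engine: &dyn Engine) {
        engine.commit_add_files(&self.pending);
    }
}

fn main() {
    let engine = VecEngine::default();
    let mut txn = Transaction::default();
    txn.add_file("part-0001.parquet");
    txn.commit(&engine);
    println!("{:?}", Scan.execute(&engine));
}
```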

Design Principles

Some design principles which should be considered:

Tips

{
  "editor.formatOnSave": true,
  "rust-analyzer.cargo.features": ["default-engine", "acceptance"]
}