<p align="center"> <a href="https://delta.io/"> <img src="https://github.com/delta-io/delta-rs/blob/main/docs/delta-rust-no-whitespace.svg?raw=true" alt="delta-rs logo" height="200"> </a> </p> <p align="center"> A native Rust library for Delta Lake, with bindings to Python <br> <a href="https://delta-io.github.io/delta-rs/">Python docs</a> · <a href="https://docs.rs/deltalake/latest/deltalake/">Rust docs</a> · <a href="https://github.com/delta-io/delta-rs/issues/new?template=bug_report.md">Report a bug</a> · <a href="https://github.com/delta-io/delta-rs/issues/new?template=feature_request.md">Request a feature</a> · <a href="https://github.com/delta-io/delta-rs/issues/1128">Roadmap</a> <br> <br> <a href="https://pypi.python.org/pypi/deltalake"> <img alt="Deltalake" src="https://img.shields.io/pypi/l/deltalake.svg?style=flat-square&color=00ADD4&logo=apache"> </a> <a target="_blank" href="https://github.com/delta-io/delta-rs" style="background:none"> <img src="https://img.shields.io/github/stars/delta-io/delta-rs?logo=github&color=F75101"> </a> <a target="_blank" href="https://crates.io/crates/deltalake" style="background:none"> <img alt="Crate" src="https://img.shields.io/crates/v/deltalake.svg?style=flat-square&color=00ADD4&logo=rust" > </a> <a href="https://pypi.python.org/pypi/deltalake"> <img alt="Deltalake" src="https://img.shields.io/pypi/v/deltalake.svg?style=flat-square&color=F75101&logo=pypi" > </a> <a href="https://pypi.python.org/pypi/deltalake"> <img alt="Deltalake" src="https://img.shields.io/pypi/pyversions/deltalake.svg?style=flat-square&color=00ADD4&logo=python"> </a> <a target="_blank" href="https://go.delta.io/slack"> <img alt="#delta-rs in the Delta Lake Slack workspace" src="https://img.shields.io/badge/slack-delta-blue.svg?logo=slack&style=flat-square&color=F75101"> </a> </p>

Delta Lake is an open-source storage format that runs on top of existing data lakes.
Delta Lake is compatible with processing engines like Apache Spark and provides benefits such as ACID transaction guarantees, schema enforcement, and scalable data handling.

The Delta Lake project aims to unlock the power of Delta Lake for as many users and projects as possible by providing native low-level APIs aimed at developers and integrators, as well as a high-level operations API that lets you query, inspect, and operate your Delta Lake with ease.

| Source | Downloads | Installation Command | Docs |
| --- | --- | --- | --- |
| PyPI | Downloads | `pip install deltalake` | Docs |
| Crates.io | Downloads | `cargo add deltalake` | Docs |

Quick Start

The deltalake library aims to adopt patterns from other libraries in data processing, so getting started should look familiar.

```python
from deltalake import DeltaTable, write_deltalake
import pandas as pd

# Write some data into a Delta table
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("./data/delta", df)

# Load data from the Delta table
dt = DeltaTable("./data/delta")
df2 = dt.to_pandas()

assert df.equals(df2)
```

The same table can also be loaded using the core Rust crate:

```rust
use deltalake::{open_table, DeltaTableError};

#[tokio::main]
async fn main() -> Result<(), DeltaTableError> {
    // Open the table written in Python
    let table = open_table("./data/delta").await?;

    // Show all active files in the table
    let files: Vec<_> = table.get_file_uris()?.collect();
    println!("{:?}", files);

    Ok(())
}
```

You can also try the Delta Lake Docker image: DockerHub | Docker Repo

Get Involved

We encourage you to reach out, and we are committed to providing a welcoming community.

Integrations

Libraries and frameworks that interoperate with delta-rs, in alphabetical order.

Features

This section outlines core features, such as the supported storage backends and the operations that can be performed against tables. It also tracks the implementation status of features defined in the Delta protocol.

Cloud Integrations

| Storage | Rust | Python | Comment |
| --- | --- | --- | --- |
| Local | done | done | |
| S3 - AWS | done | done | Requires lock for concurrent writes |
| S3 - MinIO | done | done | No lock required when using `AmazonS3ConfigKey::ConditionalPut` with `storage_options = {"conditional_put":"etag"}` |
| S3 - R2 | done | done | No lock required when using `AmazonS3ConfigKey::ConditionalPut` with `storage_options = {"conditional_put":"etag"}` |
| Azure Blob | done | done | |
| Azure ADLS Gen2 | done | done | |
| Microsoft OneLake | done | done | |
| Google Cloud Storage | done | done | |
| HDFS | done | done | |
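
The MinIO and R2 rows mention passing `conditional_put` through `storage_options`. As a rough sketch of how such options might be assembled in Python: the helper name `s3_compatible_storage_options`, the endpoint URL, and the bucket path below are hypothetical, and the `AWS_ENDPOINT_URL` and `AWS_ALLOW_HTTP` keys are assumed to be accepted by the underlying object store configuration. Only the `conditional_put` entry comes from the table.

```python
from typing import Dict, Optional


def s3_compatible_storage_options(endpoint_url: Optional[str] = None) -> Dict[str, str]:
    """Build storage_options for an S3-compatible store that supports conditional puts.

    Hypothetical helper for illustration; not part of the deltalake API.
    """
    # From the table above: ETag-based conditional puts remove the need for a lock.
    options = {"conditional_put": "etag"}
    if endpoint_url is not None:
        # Assumed keys: custom endpoint for stores such as MinIO or R2.
        options["AWS_ENDPOINT_URL"] = endpoint_url
        if endpoint_url.startswith("http://"):
            # Assumed key: plain HTTP must be allowed explicitly.
            options["AWS_ALLOW_HTTP"] = "true"
    return options


options = s3_compatible_storage_options("http://localhost:9000")
print(options)

# Usage sketch (needs credentials and a reachable bucket, so it is left commented out):
# from deltalake import write_deltalake
# write_deltalake("s3://my-bucket/my-table", df, storage_options=options)
```

Credentials are deliberately left out of the sketch; they would normally come from the environment or from additional `storage_options` entries.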

Supported Operations

| Operation | Rust | Python | Description |
| --- | --- | --- | --- |
| Create | done | done | Create a new table |
| Read | done | done | Read data from a table |
| Vacuum | done | done | Remove unused files and log entries |
| Delete - partitions | | done | Delete a table partition |
| Delete - predicates | done | done | Delete data based on a predicate |
| Optimize - compaction | done | done | Harmonize the size of data files |
| Optimize - Z-order | done | done | Place similar data into the same file |
| Merge | done | done | Merge a target Delta table with source data |
| FS check | done | done | Remove corrupted files from table |

Protocol Support Level

| Writer Version | Requirement | Status |
| --- | --- | --- |
| Version 2 | Append Only Tables | done |
| Version 2 | Column Invariants | done |
| Version 3 | Enforce `delta.checkpoint.writeStatsAsJson` | open |
| Version 3 | Enforce `delta.checkpoint.writeStatsAsStruct` | open |
| Version 3 | CHECK constraints | semi-done |
| Version 4 | Change Data Feed | |
| Version 4 | Generated Columns | |
| Version 5 | Column Mapping | |
| Version 6 | Identity Columns | |
| Version 7 | Table Features | |

| Reader Version | Requirement | Status |
| --- | --- | --- |
| Version 2 | Column Mapping | |
| Version 3 | Table Features (requires reader V7) | |