Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, Ruby, and Python.

Delta Lake integrates with many popular engines and tools; refer to delta.io/integrations for the complete list of integrations.

Latest Binaries

See the online documentation for the latest release.

API Documentation

Compatibility

The Delta Standalone library is a single-node Java library that can be used to read from and write to Delta tables. Specifically, this library provides APIs to interact with a table’s metadata in the transaction log, implementing the Delta Transaction Log Protocol to achieve the transactional guarantees of the Delta Lake format.

API Compatibility

There are two types of APIs provided by the Delta Lake project.

Data Storage Compatibility

Delta Lake guarantees backward compatibility for all Delta Lake tables (i.e., newer versions of Delta Lake will always be able to read tables written by older versions of Delta Lake). However, we reserve the right to break forward compatibility as new features are introduced to the transaction protocol (i.e., an older version of Delta Lake may not be able to read a table produced by a newer version).

Breaking changes in the protocol are indicated by incrementing the minimum reader/writer version in the Protocol action.
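To make the compatibility rule concrete, here is a minimal Python sketch of the version gate a client might apply before touching a table. It assumes a simplified `protocol` action of the form `{"protocol": {"minReaderVersion": ..., "minWriterVersion": ...}}`; the `ClientCapabilities` class and the `can_read`/`can_write` helpers are hypothetical, and the sketch ignores the finer-grained table features used by newer protocol versions.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClientCapabilities:
    """Hypothetical: the highest protocol versions a given client understands."""
    max_reader_version: int
    max_writer_version: int


def parse_protocol(action_line: str) -> Tuple[int, int]:
    """Return (minReaderVersion, minWriterVersion) from one protocol action line."""
    protocol = json.loads(action_line)["protocol"]
    return protocol["minReaderVersion"], protocol["minWriterVersion"]


def can_read(client: ClientCapabilities, action_line: str) -> bool:
    min_reader, _ = parse_protocol(action_line)
    return client.max_reader_version >= min_reader


def can_write(client: ClientCapabilities, action_line: str) -> bool:
    min_reader, min_writer = parse_protocol(action_line)
    # Assumption for this sketch: a writer must also be able to read the table.
    return (client.max_reader_version >= min_reader
            and client.max_writer_version >= min_writer)


# Usage: an older client meets a table whose protocol was upgraded.
old_client = ClientCapabilities(max_reader_version=1, max_writer_version=2)
new_client = ClientCapabilities(max_reader_version=2, max_writer_version=5)
table_protocol = '{"protocol": {"minReaderVersion": 2, "minWriterVersion": 5}}'

print(can_read(old_client, table_protocol))   # False
print(can_write(new_client, table_protocol))  # True
```

A client that only understands reader version 1 refuses the table rather than misreading it, which is how a raised minimum version signals a forward-compatibility break.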

Roadmap

Transaction Protocol

The Delta Transaction Log Protocol document provides a specification of the transaction protocol.
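As a rough illustration of the protocol's core idea, the sketch below reconstructs a table's state by replaying its commit files in version order. It assumes a local table whose `_delta_log` directory holds JSON commit files named by version (zero-padded to 20 digits), keys data files by path alone, and ignores checkpoints, deletion vectors, and most action fields; the helper names and the placeholder actions are hypothetical, and the specification remains the authoritative reference.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Set


def commit_file_name(version: int) -> str:
    # Commit files are named by version, zero-padded to 20 digits.
    return f"{version:020d}.json"


def list_commit_versions(log_dir: str) -> List[int]:
    """Versions of the JSON commit files in log_dir, oldest first."""
    versions = []
    for name in os.listdir(log_dir):
        stem, ext = os.path.splitext(name)
        # Skips checkpoints, _last_checkpoint, .crc files, and other non-commits.
        if ext == ".json" and stem.isdigit():
            versions.append(int(stem))
    return sorted(versions)


def replay_log(table_path: str) -> Dict[str, Any]:
    """Simplified replay: track live files by path and keep the newest metaData."""
    log_dir = os.path.join(table_path, "_delta_log")
    live_files: Set[str] = set()
    metadata = None
    for version in list_commit_versions(log_dir):
        with open(os.path.join(log_dir, commit_file_name(version))) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    live_files.add(action["add"]["path"])
                elif "remove" in action:
                    live_files.discard(action["remove"]["path"])
                elif "metaData" in action:
                    metadata = action["metaData"]
    return {"files": sorted(live_files), "metadata": metadata}


# Usage: build a throwaway two-commit log with placeholder actions, then replay it.
with tempfile.TemporaryDirectory() as table:
    log_dir = os.path.join(table, "_delta_log")
    os.makedirs(log_dir)
    commits = [
        [{"metaData": {"id": "demo", "partitionColumns": []}},
         {"add": {"path": "part-0.parquet"}},
         {"add": {"path": "part-1.parquet"}}],
        [{"remove": {"path": "part-0.parquet"}},
         {"add": {"path": "part-2.parquet"}}],
    ]
    for version, actions in enumerate(commits):
        with open(os.path.join(log_dir, commit_file_name(version)), "w") as f:
            f.write("\n".join(json.dumps(a) for a in actions) + "\n")
    state = replay_log(table)

print(state["files"])  # ['part-1.parquet', 'part-2.parquet']
```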

Requirements for Underlying Storage Systems

Delta Lake's ACID guarantees are predicated on the atomicity and durability guarantees of the storage system. Specifically, we require the storage system to provide the following:

  1. Atomic visibility: There must be a way for a file to be visible in its entirety or not visible at all.
  2. Mutual exclusion: Only one writer must be able to create (or rename) a file at the final destination.
  3. Consistent listing: Once a file has been written in a directory, all future listings for that directory must return that file.
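On a local POSIX filesystem, all three requirements can be approximated with standard file operations. The following Python sketch (hypothetical helper `commit_exclusively`, placeholder payloads) writes a commit to a temporary file and then publishes it with `os.link`, which fails if the target name already exists. It only illustrates the guarantees listed above and is not how Delta Lake's storage layer is implemented; storage systems that lack an atomic create-if-absent operation may need additional configuration.

```python
import os
import tempfile


def commit_exclusively(log_dir: str, version: int, payload: str) -> bool:
    """Publish commit `version` unless another writer already has; hypothetical helper."""
    final_path = os.path.join(log_dir, f"{version:020d}.json")
    # Atomic visibility: write the whole payload under a private temporary name,
    # so readers never see a partially written file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # durability before publishing
        # Mutual exclusion: os.link raises FileExistsError if final_path exists,
        # so at most one writer can claim a version (os.replace would overwrite).
        os.link(tmp_path, final_path)
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp_path)  # the published commit keeps its own link


# Usage: two writers race for version 0 with placeholder payloads.
with tempfile.TemporaryDirectory() as log_dir:
    first = commit_exclusively(log_dir, 0, '{"commitInfo": {"writer": "A"}}\n')
    second = commit_exclusively(log_dir, 0, '{"commitInfo": {"writer": "B"}}\n')
    with open(os.path.join(log_dir, f"{0:020d}.json")) as f:
        winner = f.read()
    # Consistent listing: the published commit is the only file in the directory.
    leftovers = sorted(os.listdir(log_dir))

print(first, second)  # True False
```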

See the online documentation on Storage Configuration for details.

Concurrency Control

Delta Lake ensures serializability for concurrent reads and writes. Please see Delta Lake Concurrency Control for more details.
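Delta Lake's documentation describes its approach as optimistic concurrency control: a writer attempts the next log version, and if another writer got there first, it checks the winning commits for conflicts before retrying. The in-memory Python sketch below shows that loop under a deliberately simplified conflict rule (a transaction conflicts only if a concurrent commit removed a file it read); `InMemoryLog`, `Commit`, and `commit_with_retry` are hypothetical names, and Delta Lake's actual conflict checks are more detailed.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Commit:
    added: FrozenSet[str]
    removed: FrozenSet[str]


class ConcurrentModificationError(Exception):
    """Raised when a concurrent commit invalidates what a transaction read."""


@dataclass
class InMemoryLog:
    commits: Dict[int, Commit] = field(default_factory=dict)

    def latest_version(self) -> int:
        return max(self.commits, default=-1)

    def put_if_absent(self, version: int, commit: Commit) -> bool:
        # Stands in for the storage system's mutual-exclusion guarantee.
        if version in self.commits:
            return False
        self.commits[version] = commit
        return True


def commit_with_retry(log: InMemoryLog, read_version: int,
                      files_read: FrozenSet[str], commit: Commit,
                      max_attempts: int = 10) -> int:
    """Optimistically commit at the next version, re-validating after each lost race."""
    attempt_version = read_version + 1
    checked_up_to = read_version  # newest concurrent commit validated so far
    for _ in range(max_attempts):
        if log.put_if_absent(attempt_version, commit):
            return attempt_version
        latest = log.latest_version()
        for v in range(checked_up_to + 1, latest + 1):
            # Simplified conflict rule: someone removed a file we read.
            if log.commits[v].removed & files_read:
                raise ConcurrentModificationError(
                    f"version {v} removed a file this transaction read")
        checked_up_to = latest
        attempt_version = latest + 1
    raise ConcurrentModificationError("gave up after too many lost races")


# Usage: writers X, Y, and Z all start from a snapshot at version 0.
log = InMemoryLog()
log.put_if_absent(0, Commit(frozenset({"a.parquet", "b.parquet"}), frozenset()))

x_version = commit_with_retry(  # X rewrites a.parquet
    log, 0, frozenset({"a.parquet"}),
    Commit(frozenset({"a2.parquet"}), frozenset({"a.parquet"})))
y_version = commit_with_retry(  # Y read only b.parquet and appends c.parquet
    log, 0, frozenset({"b.parquet"}),
    Commit(frozenset({"c.parquet"}), frozenset()))
try:  # Z also tries to delete a.parquet, which X already removed
    commit_with_retry(log, 0, frozenset({"a.parquet"}),
                      Commit(frozenset(), frozenset({"a.parquet"})))
    z_conflicted = False
except ConcurrentModificationError:
    z_conflicted = True

print(x_version, y_version, z_conflicted)  # 1 2 True
```

Writer Y lands at version 2 because X's commit did not touch anything Y read, while Z, which read the file X removed, is rejected instead of silently overwriting X's change.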

Reporting issues

We use GitHub Issues to track community-reported issues. You can also contact the community to get answers.

Contributing

We welcome contributions to Delta Lake. See our CONTRIBUTING.md for more details.

We also adhere to the Delta Lake Code of Conduct.

Building

Delta Lake is compiled using SBT.

To compile, run

build/sbt compile

To generate artifacts, run

build/sbt package

To execute tests, run

build/sbt test

To execute a single test suite, run

build/sbt spark/'testOnly org.apache.spark.sql.delta.optimize.OptimizeCompactionSQLSuite'

To execute a single test within a single test suite, run

build/sbt spark/'testOnly *.OptimizeCompactionSQLSuite -- -z "optimize command: on partitioned table - all partitions"'

Refer to SBT docs for more commands.

IntelliJ Setup

IntelliJ is the recommended IDE to use when developing Delta Lake. To import Delta Lake as a new project:

  1. Clone Delta Lake into, for example, ~/delta.
  2. In IntelliJ, select File > New Project > Project from Existing Sources... and select ~/delta.
  3. Under Import project from external model select sbt. Click Next.
  4. Under Project JDK specify a valid Java 1.8 JDK and opt to use SBT shell for project reload and builds.
  5. Click Finish.

Setup Verification

After waiting for IntelliJ to index, verify your setup by running a test suite in IntelliJ.

  1. Search for and open DeltaLogSuite.
  2. Next to the class declaration, right-click on the two green arrows and select Run 'DeltaLogSuite'.

Troubleshooting

If you see errors of the form

Error:(46, 28) object DeltaSqlBaseParser is not a member of package io.delta.sql.parser
import io.delta.sql.parser.DeltaSqlBaseParser._
...
Error:(91, 22) not found: type DeltaSqlBaseParser
    val parser = new DeltaSqlBaseParser(tokenStream)

then follow these steps:

  1. Compile using the SBT CLI: build/sbt compile.
  2. Go to File > Project Structure... > Modules > delta-spark.
  3. In the right panel under Source Folders remove any target folders, e.g. target/scala-2.12/src_managed/main [generated]
  4. Click Apply and then re-run your test.

License

Apache License 2.0, see LICENSE.

Community

There are two channels of communication within the Delta Lake community.