delta-ruby
Delta Lake for Ruby
Supports local files and Amazon S3
Installation
Add this line to your application’s Gemfile:
gem "deltalake-rb"
It can take 5-10 minutes to compile the gem.
Getting Started
Write data
df = Polars::DataFrame.new({"id" => [1, 2], "value" => [3.0, 4.0]})
DeltaLake.write("./events", df)
Load a table
dt = DeltaLake::Table.new("./events")
df = dt.to_polars
Get a lazy frame
lf = dt.to_polars(eager: false)
Append rows
DeltaLake.write("./events", df, mode: "append")
Overwrite a table
DeltaLake.write("./events", df, mode: "overwrite")
Add a constraint
dt.alter.add_constraint({"id_gt_0" => "id > 0"})
Drop a constraint
dt.alter.drop_constraint("id_gt_0")
Delete rows
dt.delete("id > 1")
Vacuum
dt.vacuum(dry_run: false)
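Since vacuuming deletes files, it may be worth previewing what would be removed first. The sketch below is a hypothetical helper, not part of the gem. It assumes that `vacuum(dry_run: true)` returns the list of files that would be deleted without touching them, as in the Python API.

```ruby
# Hypothetical helper (not part of the gem): preview a vacuum before running it.
# `table` is any object that responds to `vacuum(dry_run:)`, such as a
# DeltaLake::Table. Assumes the call returns an array of file paths.
def vacuum_with_preview(table, out: $stdout)
  candidates = table.vacuum(dry_run: true) # assumed to list files only
  if candidates.empty?
    out.puts "Nothing to vacuum"
    return []
  end

  out.puts "Removing #{candidates.size} file(s):"
  candidates.each { |path| out.puts "  #{path}" }
  table.vacuum(dry_run: false) # performs the deletion
end

# Usage, assuming the gem is loaded:
#   vacuum_with_preview(DeltaLake::Table.new("./events"))
```

Passing the table in as an argument keeps the helper independent of how the table was opened.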
Perform small file compaction
dt.optimize.compact
Colocate similar data in the same files
dt.optimize.z_order(["category"])
Load a previous version of a table
dt = DeltaLake::Table.new("./events", version: 1)
# or
dt.load_as_version(1)
Get the schema
dt.schema
Get metadata
dt.metadata
Get history
dt.history
API
This library follows the Delta Lake Python API (with a few changes to make it more Ruby-like). You can follow Python tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
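As a rough illustration of that conversion, the snippet below pairs a few Python calls with the Ruby calls shown earlier in this README. The Python side is written from the `deltalake` Python package and may vary between versions. The constant name `PYTHON_TO_RUBY` is only for illustration.

```ruby
# Python call (deltalake package) => Ruby call from this README.
# The strings are for side-by-side reading only; nothing here touches a table.
PYTHON_TO_RUBY = {
  'write_deltalake("./events", df, mode="append")' =>
    'DeltaLake.write("./events", df, mode: "append")',
  'DeltaTable("./events", version=1)' =>
    'DeltaLake::Table.new("./events", version: 1)',
  'dt.alter.add_constraint({"id_gt_0": "id > 0"})' =>
    'dt.alter.add_constraint({"id_gt_0" => "id > 0"})',
  'dt.vacuum(dry_run=False)' =>
    'dt.vacuum(dry_run: false)'
}.freeze

PYTHON_TO_RUBY.each do |python, ruby|
  puts "Python: #{python}"
  puts "Ruby:   #{ruby}"
  puts
end
```

In these examples the changes are mostly syntactic: keyword arguments use `name: value`, booleans are lowercase, hashes use `=>`, and module-level functions and classes live under the `DeltaLake` namespace.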
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/delta-ruby.git
cd delta-ruby
bundle install
bundle exec rake compile
bundle exec rake test