Ruby Polars
:fire: Blazingly fast DataFrames for Ruby, powered by Polars
Installation
Add this line to your application’s Gemfile:
gem "polars-df"
Getting Started
This library follows the Polars Python API.
Polars.scan_csv("iris.csv")
  .filter(Polars.col("sepal_length") > 5)
  .group_by("species")
  .agg(Polars.all.sum)
  .collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
Reference
Examples
Creating DataFrames
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")
From Parquet
Polars.read_parquet("file.parquet")
# or lazily with
Polars.scan_parquet("file.parquet")
From Active Record
Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")
From JSON
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")
# or lazily with
Polars.scan_ndjson("file.ndjson")
From Feather / Arrow IPC
Polars.read_ipc("file.arrow")
# or lazily with
Polars.scan_ipc("file.arrow")
From Avro
Polars.read_avro("file.avro")
From Delta Lake (requires deltalake-rb) [experimental, unreleased]
Polars.read_delta("./table")
# or lazily with
Polars.scan_delta("./table")
From a hash
Polars::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})
From an array of hashes
Polars::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])
From an array of series
Polars::DataFrame.new([
  Polars::Series.new("a", [1, 2, 3]),
  Polars::Series.new("b", ["one", "two", "three"])
])
Attributes
Get number of rows
df.height
Get column names
df.columns
Check if a column exists
df.include?(name)
Selecting Data
Select a column
df["a"]
Select multiple columns
df[["a", "b"]]
Select first rows
df.head
Select last rows
df.tail
Filtering
Filter on a condition
df[Polars.col("a") == 2]
df[Polars.col("a") != 2]
df[Polars.col("a") > 2]
df[Polars.col("a") >= 2]
df[Polars.col("a") < 2]
df[Polars.col("a") <= 2]
And, or, and exclusive or
df[(Polars.col("a") > 1) & (Polars.col("b") == "two")] # and
df[(Polars.col("a") > 1) | (Polars.col("b") == "two")] # or
df[(Polars.col("a") > 1) ^ (Polars.col("b") == "two")] # xor
Operations
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10)
Exponentiation
df["a"].exp
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].asin
df["a"].acos
df["a"].atan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].asinh
df["a"].acosh
df["a"].atanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
Grouping
Group
df.group_by("a").count
Works with all summary statistics
df.group_by("a").max
Multiple groups
df.group_by(["a", "b"]).count
Combining Data Frames
Add rows
df.vstack(other_df)
Add columns
df.hstack(other_df)
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
Encoding
One-hot encoding
df.to_dummies
Conversion
Array of hashes
df.rows(named: true)
Hash of series
df.to_h
CSV
df.to_csv
# or
df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
JSON
df.write_json("file.json")
# or
df.write_ndjson("file.ndjson")
Feather / Arrow IPC
df.write_ipc("file.arrow")
Avro
df.write_avro("file.avro")
Delta Lake [experimental, unreleased]
df.write_delta("./table")
Numo array
df.to_numo
Types
You can specify column types when creating a data frame
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- float - Float64, Float32
- integer - Int64, Int32, Int16, Int8
- unsigned integer - UInt64, UInt32, UInt16, UInt8
- string - String, Binary, Categorical
- temporal - Date, Datetime, Time, Duration
- nested - List, Struct, Array
- other - Object, Null
Get column types
df.schema
For a specific column
df["a"].dtype
Cast a column
df["a"].cast(Polars::Int32)
Visualization
Add Vega to your application’s Gemfile:
gem "vega"
And use:
df.plot("a", "b")
Specify the chart type (line, pie, column, bar, area, or scatter)
df.plot("a", "b", type: "pie")
Group data
df.group_by("c").plot("a", "b")
Stacked columns or bars
df.group_by("c").plot("a", "b", stacked: true)
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/ruby-polars.git
cd ruby-polars
bundle install
bundle exec rake compile
bundle exec rake test
bundle exec rake test:docs