LakeHouse Sharing

Lakehouses are becoming increasingly popular, and many organizations see substantial value in building and maintaining a lakehouse rather than warehouse technologies, for a variety of reasons. Many good articles on the internet cover this topic, or you can ask ChatGPT to learn more about this area.

This repo addresses one part of the lakehouse ecosystem: how to share data securely, both within and outside an organization. It is heavily inspired by the Delta Sharing protocol; in fact, the whole idea comes from it. Delta Sharing solves the data-sharing problem for people using the Delta table format, and Databricks (the company behind Delta Lake) provides excellent self-service tools on top of the open source delta-sharing project.

Motivation of this Repo:

Difference between normal querying and querying via delta-sharing

Installation

Run these commands in the root folder of this project.

help:


Usage:
  make <target>

Targets:
  venv                           create a virtual environment for development
  start_backend_server           starts prefect server
  start_frontend_server          starts prefect agent
  help                           Show help

install requirements

make venv

To share tables in the Iceberg format, install the following extra package and set up a catalog such as AWS Glue or Hive; refer to the PyIceberg documentation.

for iceberg

# install iceberg
pip install pyiceberg

To share tables in the Delta Lake format, install the deltalake package. Delta Lake does not need a catalog; the metadata is read directly from the table's metadata files in cloud storage.

for delta-lake

pip install deltalake
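To illustrate why no catalog is needed, here is a minimal sketch of how a Delta table's state can be recovered from storage alone. It uses only the standard library and is not this repo's implementation (the backend relies on the deltalake package). It assumes the standard `_delta_log` layout of zero-padded JSON commit files with `add` and `remove` actions, and it ignores checkpoint files for brevity. The helper names `active_files` and `write_commit` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data files."""
    log_dir = Path(table_path) / "_delta_log"
    files = set()
    # Commit file names are zero-padded versions, so sorting by name
    # replays them in commit order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


def write_commit(table_path, version, actions):
    """Write one commit file (demo helper, hypothetical)."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    body = "\n".join(json.dumps(a) for a in actions)
    (log_dir / f"{version:020d}.json").write_text(body)


# Build a tiny fake table log in a temporary directory and replay it.
with tempfile.TemporaryDirectory() as demo_table:
    write_commit(demo_table, 0, [
        {"add": {"path": "part-0.parquet"}},
        {"add": {"path": "part-1.parquet"}},
    ])
    write_commit(demo_table, 1, [
        {"remove": {"path": "part-0.parquet"}},
        {"add": {"path": "part-2.parquet"}},
    ])
    demo_files = active_files(demo_table)
    print(demo_files)
```

Because the log lives next to the data files, any reader with access to the storage location can work out which files make up the current table version.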

start backend server

make start_backend_server

start frontend server

In another terminal, start the frontend Streamlit app.

make start_frontend_server

Use docker setup

Use the Docker setup to get the app running quickly.

docker-compose up

Set the required environment variables before running docker-compose up; refer to the .env.example file for the variables to set.
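A quick way to catch a missing variable before starting the containers is to compare the keys listed in `.env.example` against the current environment. The sketch below is not part of this repo; the function names and the sample variable names (`DB_HOST`, `DB_PASSWORD`) are hypothetical, and it assumes the example file uses plain `KEY=value` lines.

```python
import os


def parse_env_keys(text):
    """Return variable names from .env-style text (KEY=value lines)."""
    keys = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_vars(example_text, environ=None):
    """List keys from the example file that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [key for key in parse_env_keys(example_text) if not environ.get(key)]


# Hypothetical file contents; the real names live in .env.example.
sample = "# database\nDB_HOST=localhost\nexport DB_PASSWORD=\n\nnot a variable line\n"
missing = missing_vars(sample, {"DB_HOST": "db"})
print(missing)
```

Against the real file this would be called as `missing_vars(open(".env.example").read())`, which checks the variables of the current shell.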

APP urls:

Once docker-compose is up and running successfully, the following URLs should be available:

Frontend:

Lakehouse-sharing Architecture

Architecture diagram: images/lakehouse-sharing-arch.png

Blog post

Refer to the accompanying blog post for more details: https://guruengineering.substack.com/p/lakehouse-sharing

Video setup instructions

https://youtu.be/6H0qv-thogY

Code structure

.
├── Makefile
├── README.md
├── backend
│   ├── Dockerfile
│   ├── app
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   ├── conf.py
│   │   ├── core
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   ├── base.py
│   │   │   ├── cloud
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __pycache__
│   │   │   │   ├── aws.py
│   │   │   │   ├── azure.py
│   │   │   │   ├── base.py
│   │   │   │   └── gcs.py
│   │   │   ├── delta
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __pycache__
│   │   │   │   ├── models.py
│   │   │   │   ├── share.py
│   │   │   │   └── utils.py
│   │   │   └── iceberg
│   │   │       ├── __init__.py
│   │   │       ├── __pycache__
│   │   │       ├── models.py
│   │   │       └── share.py
│   │   ├── db
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   ├── auth_queries.py
│   │   │   ├── queries.py
│   │   │   └── tables.py
│   │   ├── main.py
│   │   ├── models
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   ├── admin.py
│   │   │   ├── auth.py
│   │   │   ├── common.py
│   │   │   └── response.py
│   │   ├── routers
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   ├── admin.py
│   │   │   ├── auth.py
│   │   │   └── share.py
│   │   ├── securities
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   ├── jwt_utils.py
│   │   │   └── user_auth.py
│   │   ├── serverconf.yaml
│   │   └── utilities
│   │       ├── __init__.py
│   │       ├── __pycache__
│   │       ├── defaults.py
│   │       ├── exceptions.py
│   │       ├── pagination.py
│   │       ├── responses.py
│   │       └── validators.py
│   ├── requirements.txt
│   └── tests
│       ├── __init__.py
│       ├── __pycache__
│       ├── mock_results.py
│       └── test_share_apis.py
├── docker-compose.yaml
├── frontend
│   ├── Dockerfile
│   ├── README.md
│   ├── app
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   ├── core
│   │   │   ├── __init__.py
│   │   │   ├── __pycache__
│   │   │   ├── api
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __pycache__
│   │   │   │   ├── config.py
│   │   │   │   ├── jwt_auth.py
│   │   │   │   └── rest.py
│   │   │   ├── base
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __pycache__
│   │   │   │   ├── auth.py
│   │   │   │   ├── client.py
│   │   │   │   └── layout.py
│   │   │   ├── link.py
│   │   │   ├── login.py
│   │   │   ├── schema.py
│   │   │   ├── share.py
│   │   │   ├── table.py
│   │   │   ├── table_format.py
│   │   │   └── user.py
│   │   └── main.py
│   ├── config.yaml
│   └── requirements.txt
├── images
│   └── lakehouse-sharing-arch.png
├── notebooks
│   ├── client-example.ipynb
│   └── profile.json
└── sqls
    └── prepopulate_data.py
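The `notebooks/profile.json` file in the tree above is presumably a Delta Sharing profile used by the client example notebook. The sketch below assumes it follows the standard Delta Sharing profile format (`shareCredentialsVersion`, `endpoint`, `bearerToken`) and shows how a client addresses a shared table with the `<profile>#<share>.<schema>.<table>` convention. The endpoint, token, and share/schema/table names are placeholders, not values from this repo.

```python
import json


def build_profile(endpoint, bearer_token):
    """Return a profile dict in the Delta Sharing profile file format."""
    return {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }


def table_url(profile_path, share, schema, table):
    """Build the table address understood by Delta Sharing clients."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Placeholder values: replace with your server URL and issued token.
profile = build_profile("http://localhost:8000/delta-sharing", "<token>")
profile_json = json.dumps(profile, indent=2)
url = table_url("notebooks/profile.json", "my_share", "my_schema", "my_table")

print(profile_json)
print(url)
```

With the open source `delta-sharing` Python connector installed, a URL of this shape can be passed to `delta_sharing.load_as_pandas(url)`, assuming the server implements the protocol endpoints that the connector expects.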

Roadmap:

Reference: