Kedro Snowflake Pipelines plugin
<p align="center"> <a href="https://getindata.com/solutions/ml-platform-machine-learning-reliable-explainable-feature-engineering"><img height="150" src="https://getindata.com/img/logo.svg"></a> <h3 align="center">We help companies turn their data into assets</h3> </p>
About
This plugin allows you to run full Kedro pipelines in Snowflake. Currently, it supports:
- a Kedro starter, to get you up to speed fast
- automatically creating Snowflake Stored Procedures from Kedro nodes (using the Snowpark SDK)
- translating a Kedro pipeline into a Snowflake task graph
- running a Kedro pipeline entirely within Snowflake, without any external system
- using Kedro's official SnowparkTableDataSet
- automatically storing intermediate data as Transient Tables (if Snowpark DataFrames are used)
- <span style="color:yellow;float:left;margin: 0px 7px 0px 0px">(New!)</span> MLflow integration with Snowflake, with example usage in the Snowflights Kedro starter
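The "translating a Kedro pipeline into a Snowflake task graph" item above rests on one idea: a node must run after every node that produces one of its inputs, which is exactly what a Snowflake task's AFTER clause expresses. The sketch below illustrates that idea only; it is not the plugin's implementation and does not use Kedro's API. It assumes each node is described by a hypothetical Node tuple (name, input dataset names, output dataset names), and the `<prefix>_<node name>` task naming is an assumption loosely mirroring the starter's "Pipeline Name Used As A Snowflake Task Prefix" prompt.

```python
from collections import namedtuple
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified node description (NOT Kedro's Node class).
Node = namedtuple("Node", ["name", "inputs", "outputs"])


def task_name(node_name, prefix="default"):
    # Assumed naming scheme for illustration: <prefix>_<node name>
    return f"{prefix}_{node_name}"


def task_dependencies(nodes, prefix="default"):
    """Map each task name to the set of task names it must run AFTER.

    A node depends on another node when it consumes a dataset that the
    other node produces. Datasets with no producer (raw inputs) add no edge.
    """
    producer = {}
    for node in nodes:
        for dataset in node.outputs:
            producer[dataset] = node.name

    deps = {}
    for node in nodes:
        deps[task_name(node.name, prefix)] = {
            task_name(producer[ds], prefix) for ds in node.inputs if ds in producer
        }
    return deps


def creation_order(deps):
    """Order tasks so that every predecessor comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    pipeline = [
        Node("preprocess", ["raw_data"], ["clean_data"]),
        Node("train", ["clean_data"], ["model"]),
        Node("evaluate", ["model", "clean_data"], ["metrics"]),
    ]
    deps = task_dependencies(pipeline)
    for task in creation_order(deps):
        print(f"{task}: after {sorted(deps[task])}")
    # default_preprocess: after []
    # default_train: after ['default_preprocess']
    # default_evaluate: after ['default_preprocess', 'default_train']
```

A task with no predecessors (here default_preprocess) is where the graph starts; ordering tasks topologically guarantees each predecessor exists before a dependent task refers to it.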
Documentation
For detailed documentation, refer to https://kedro-snowflake.readthedocs.io/
Usage
With starter
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Create a new project with our Kedro starter ❄️ Snowflights 🚀:
kedro new --starter=snowflights --checkout=master
<details> <summary>And answer the interactive prompts ⬇️ (click to expand) </summary>
Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed.
 [Snowflights]:

Snowflake Account
=================
Please enter the name of your Snowflake account. This is the part of the URL before .snowflakecomputing.com
 []: abc-123

Snowflake User
==============
Please enter the name of your Snowflake user.
 []: user2137

Snowflake Warehouse
===================
Please enter the name of your Snowflake warehouse.
 []: compute-wh

Snowflake Database
==================
Please enter the name of your Snowflake database.
 [DEMO]:

Snowflake Schema
================
Please enter the name of your Snowflake schema.
 [DEMO]:

Snowflake Password Environment Variable
=======================================
Please enter the name of the environment variable that contains your Snowflake password. Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
 [SNOWFLAKE_PASSWORD]:

Pipeline Name Used As A Snowflake Task Prefix
=============================================
 [default]:

Enable Mlflow Integration (See Documentation For The Configuration Instructions)
================================================================================
 [False]:

The project name 'Snowflights' has been applied to:
- The project title in /tmp/snowflights/README.md
- The folder created for your project in /tmp/snowflights
- The project's python package in /tmp/snowflights/src/snowflights
</details>
- Run the project
cd snowflights
kedro snowflake run --wait-for-completion
In an existing Kedro project
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Initialize the plugin
kedro snowflake init <ACCOUNT> <USER> <PASSWORD_FROM_ENV> <DATABASE> <SCHEMA> <WAREHOUSE>
- Run the project
kedro snowflake run --wait-for-completion
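Judging by the starter prompt ("the name of the environment variable that contains your Snowflake password", default SNOWFLAKE_PASSWORD), the <PASSWORD_FROM_ENV> argument is the name of an environment variable, not the password itself. The minimal sketch below shows that lookup so you can check your environment before running kedro snowflake init or kedro snowflake run. The helper resolve_password is hypothetical and not part of the plugin; how the plugin itself reads the variable is not shown here.

```python
import os


def resolve_password(env_var_name: str = "SNOWFLAKE_PASSWORD") -> str:
    """Return the Snowflake password stored in the named environment variable.

    Hypothetical helper: it mirrors the idea that only the *name* of the
    variable is passed around, never the secret itself.
    """
    password = os.environ.get(env_var_name)
    if not password:
        # Fail fast with a readable message instead of a later login error.
        raise RuntimeError(
            f"Environment variable {env_var_name!r} is not set or is empty; "
            "export it before running the kedro snowflake commands."
        )
    return password


if __name__ == "__main__":
    try:
        resolve_password()
        print("SNOWFLAKE_PASSWORD is set")
    except RuntimeError as err:
        print(err)
```

Keeping only the variable name in project configuration means the password never has to be committed alongside the Kedro project.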
Kedro pipeline in Snowflake Tasks
<img src="./docs/images/kedro-snowflake-tasks-graph.png" alt="Kedro Snowflake Plugin" title="Kedro Snowflake Plugin" />
Execution:
<img src="./docs/images/snowflake_running_pipeline.gif" alt="Kedro Snowflake Plugin CLI" title="Kedro Snowflake Plugin CLI" />