Kedro Snowflake Pipelines plugin

<p align="center"> <a href="https://getindata.com/solutions/ml-platform-machine-learning-reliable-explainable-feature-engineering"><img height="150" src="https://getindata.com/img/logo.svg"></a> <h3 align="center">We help companies turn their data into assets</h3> </p>

About

This plugin lets you run full Kedro pipelines in Snowflake. Right now it supports

Documentation

For detailed documentation, see https://kedro-snowflake.readthedocs.io/

Usage

With starter

  1. Install the plugin

    pip install "kedro-snowflake>=0.1.0" 
    
  2. Create a new project with our Kedro starter ❄️ Snowflights 🚀:

    kedro new --starter=snowflights --checkout=master
    
    <details> <summary>And answer the interactive prompts ⬇️ (click to expand) </summary>
    Project Name
    ============
    Please enter a human readable name for your new project.
    Spaces, hyphens, and underscores are allowed.
     [Snowflights]: 
    
    Snowflake Account
    =================
    Please enter the name of your Snowflake account.
    This is the part of the URL before .snowflakecomputing.com
     []: abc-123
    
    Snowflake User
    ==============
    Please enter the name of your Snowflake user.
     []: user2137
    
    Snowflake Warehouse
    ===================
    Please enter the name of your Snowflake warehouse.
     []: compute-wh
    
    Snowflake Database
    ==================
    Please enter the name of your Snowflake database.
     [DEMO]: 
    
    Snowflake Schema
    ================
    Please enter the name of your Snowflake schema.
     [DEMO]: 
    
    Snowflake Password Environment Variable
    =======================================
    Please enter the name of the environment variable that contains your Snowflake password.
    Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
     [SNOWFLAKE_PASSWORD]:       
    
    Pipeline Name Used As A Snowflake Task Prefix
    =============================================
    
     [default]:
    
    Enable Mlflow Integration (See Documentation For The Configuration Instructions)
    ================================================================================
    
     [False]: 
    
    The project name 'Snowflights' has been applied to: 
    - The project title in /tmp/snowflights/README.md
    - The folder created for your project in /tmp/snowflights
    - The project's python package in /tmp/snowflights/src/snowflights
    
    </details>
  3. Run the project

    cd snowflights
    kedro snowflake run --wait-for-completion
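
Two of the prompts above are easy to get wrong: the account name (the part of the URL before `.snowflakecomputing.com`) and the environment variable that holds the password. The following is a minimal sketch of pre-flight checks you could run before `kedro snowflake run`. The helper names `account_from_url` and `password_is_configured` are hypothetical and not part of the plugin; the default variable name `SNOWFLAKE_PASSWORD` is taken from the prompt default.

```python
import os
from urllib.parse import urlparse

SNOWFLAKE_DOMAIN_SUFFIX = ".snowflakecomputing.com"


def account_from_url(url: str) -> str:
    """Return the part of the host name before .snowflakecomputing.com.

    Hypothetical helper, not provided by kedro-snowflake.
    """
    # urlparse only fills in `hostname` when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    if not host.endswith(SNOWFLAKE_DOMAIN_SUFFIX):
        raise ValueError(f"not a Snowflake URL: {url!r}")
    account = host[: -len(SNOWFLAKE_DOMAIN_SUFFIX)]
    if not account:
        raise ValueError(f"no account name in URL: {url!r}")
    return account


def password_is_configured(env_var: str = "SNOWFLAKE_PASSWORD") -> bool:
    """Check that the environment variable named in the prompt is set and non-empty."""
    return bool(os.environ.get(env_var))


print(account_from_url("https://abc-123.snowflakecomputing.com"))  # abc-123
```

If `password_is_configured()` returns `False`, export the variable in your shell before running the project.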
    

In existing Kedro project

  1. Install the plugin

    pip install "kedro-snowflake>=0.1.0"

  2. Initialize the plugin

    kedro snowflake init <ACCOUNT> <USER> <PASSWORD_FROM_ENV> <DATABASE> <SCHEMA> <WAREHOUSE>

  3. Run the project

    kedro snowflake run --wait-for-completion
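
The `init` command takes six positional arguments, so their order matters. As a rough sketch, the command line can be assembled from named values in Python, which makes a swapped database and warehouse harder to miss. The function `build_init_command` is a hypothetical helper, not part of the plugin, and the sample values are the ones from the starter prompts. It assumes that `<PASSWORD_FROM_ENV>` is the name of the environment variable holding the password (as in the starter prompt), not the password itself.

```python
import shlex


def build_init_command(account, user, password_env, database, schema, warehouse):
    """Assemble `kedro snowflake init` arguments in the order shown in this README.

    Hypothetical helper. `password_env` is assumed to be the NAME of the
    environment variable that stores the password.
    """
    return [
        "kedro", "snowflake", "init",
        account, user, password_env, database, schema, warehouse,
    ]


cmd = build_init_command(
    account="abc-123",
    user="user2137",
    password_env="SNOWFLAKE_PASSWORD",
    database="DEMO",
    schema="DEMO",
    warehouse="compute-wh",
)

# shlex.join quotes arguments only where the shell would need it.
print(shlex.join(cmd))
# kedro snowflake init abc-123 user2137 SNOWFLAKE_PASSWORD DEMO DEMO compute-wh
```

To execute the command from Python, pass `cmd` to `subprocess.run(cmd, check=True)` inside the Kedro project directory, with the plugin installed.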
    

Kedro pipeline in Snowflake Tasks

<img src="./docs/images/kedro-snowflake-tasks-graph.png" alt="Kedro Snowflake Plugin" title="Kedro Snowflake Plugin" />

Execution:

<img src="./docs/images/snowflake_running_pipeline.gif" alt="Kedro Snowflake Plugin CLI" title="Kedro Snowflake Plugin CLI" />