kedro-starters-sklearn

This repository provides the following starter templates for Kedro 0.18.14.

<p align="center"> <img src="_doc_images/kedro_viz.png"> Pipeline visualized by Kedro-viz </p>

sklearn-iris template

Iris dataset

The Iris dataset is included and used by default.

How to use

  1. Install dependencies.

    pip install 'kedro==0.18.14' pandas scikit-learn 
    
  2. Generate your Kedro starter project from the sklearn-iris directory.

    kedro new --starter https://github.com/Minyus/kedro-starters-sklearn.git --directory sklearn-iris
    

    As explained in Kedro's documentation, enter project_name, repo_name, and python_package.

    Note: For your Python package name, choose a unique name and avoid generic names such as "test" or "sklearn" that are already used by other packages. You can list the importable packages by running python -c "help('modules')".

  3. Change the current directory to the generated project directory.

    cd /path/to/project/directory
    
  4. Run the project.

    kedro run
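
The note in step 2 about package names can be checked before running kedro new. The following is a minimal sketch, not part of the starter: the helper name package_name_is_free and the candidate my_iris_project are hypothetical, and the check only covers modules importable in the current Python environment.

```python
import importlib.util


def package_name_is_free(name: str) -> bool:
    """Return True if `name` is a valid identifier and no importable module uses it.

    Hypothetical helper for illustration; it only inspects the current environment.
    """
    if not name.isidentifier():
        # Names such as "my-project" cannot be imported as a Python package.
        return False
    # find_spec returns None when no top-level module with this name is found.
    return importlib.util.find_spec(name) is None


# "test" and "sklearn" are the generic names mentioned in the note above;
# "my_iris_project" is a made-up candidate.
for candidate in ("test", "sklearn", "my_iris_project"):
    status = "free" if package_name_is_free(candidate) else "taken or invalid"
    print(f"{candidate}: {status}")
```

A name reported as free here can still clash with a package installed later, so treat the result as a quick sanity check rather than a guarantee.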
    

Option to use Kaggle Titanic dataset

  1. Download the Kaggle Titanic dataset.
  2. Replace train.csv and test.csv in the /path/to/project/directory/data/01_raw directory.
  3. Modify /path/to/project/directory/conf/base/parameters.yml to set parameters appropriate for the dataset (commented out by default).

sklearn-mlflow-iris template

This template integrates MLflow into Kedro using PipelineX, so you do not need to write MLflow code yourself.

In this template, MLflow logging is configured in Python code at src/<python_package>/mlflow/mlflow_config.py

See here for details.

How to use

  1. Install dependencies.

    pip install 'kedro==0.18.14' pandas scikit-learn mlflow 'pipelinex>=0.7.7' plotly
    
  2. Generate your Kedro starter project from the sklearn-mlflow-iris directory.

    kedro new --starter https://github.com/Minyus/kedro-starters-sklearn.git --directory sklearn-mlflow-iris
    
  3. Follow the same remaining steps as for the sklearn-iris template.

Access MLflow web UI

To access the MLflow web UI, launch the MLflow server.

    mlflow server --host 127.0.0.1 --port 8080 --backend-store-uri sqlite:///mlruns/sqlite.db --default-artifact-root ./mlruns

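
Once the server is started with the command above, a quick way to confirm it is reachable before opening the browser is to probe it from Python. This sketch uses only the standard library and assumes the installed MLflow version serves a /health endpoint on the tracking server; the helper name mlflow_server_is_up is hypothetical, and the default URL mirrors the host and port used above.

```python
import urllib.error
import urllib.request


def mlflow_server_is_up(base_url: str = "http://127.0.0.1:8080",
                        timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: the running MLflow server exposes /health.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not reachable.
        return False
```

Calling mlflow_server_is_up() with no arguments checks the address used in the command above; if it returns True, the web UI should be available at http://127.0.0.1:8080 in a browser.
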
<p align="center"> <img src="_doc_images/mlflow_ui_metrics.png"> Logged metrics shown in MLflow's UI </p> <p align="center"> <img src="_doc_images/mlflow_ui_gantt.png"> Gantt chart for execution time, generated using Plotly, shown in MLflow's UI </p>