kedro-starters-sklearn

This repository provides the following starter templates for Kedro 0.18.14.

sklearn-iris trains a Logistic Regression model using Scikit-learn.
sklearn-mlflow-iris adds experiment tracking with MLflow.

<img src="_doc_images/kedro_viz.png"> Pipeline visualized by Kedro-viz

`sklearn-iris` template

Iris dataset

The Iris dataset is included and used by default.

Modification: setosa is encoded as 0, versicolor is encoded as 1, and virginica samples are removed.
Split: for each species, the first 25 samples are in train.csv and the last 25 samples are in test.csv.
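
The modification and split described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the script used to build the bundled files: the function name `prepare_split` and the stand-in rows are hypothetical, and the rows are assumed to arrive grouped in their original order.

```python
from collections import defaultdict

# Label encoding described above; virginica is deliberately absent.
LABELS = {"setosa": 0, "versicolor": 1}


def prepare_split(rows):
    """Return (train, test) lists of (features, label) pairs.

    rows: iterable of (features, species_name) in their original order.
    """
    by_species = defaultdict(list)
    for features, species in rows:
        if species in LABELS:  # drops virginica (and any unknown species)
            by_species[species].append((features, LABELS[species]))

    train, test = [], []
    for species in LABELS:  # keep a stable species order
        samples = by_species[species]
        train.extend(samples[:25])   # first 25 samples of this species
        test.extend(samples[-25:])   # last 25 samples of this species
    return train, test


# Hypothetical stand-in data: 50 rows per species, the feature is just an index.
rows = [((i,), name) for name in ("setosa", "versicolor", "virginica") for i in range(50)]
train, test = prepare_split(rows)
print(len(train), len(test))  # prints "50 50"
```

With 50 samples per species, each file ends up with 25 setosa and 25 versicolor rows, so both classes are balanced in train and test.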

How to use

Install dependencies.

pip install 'kedro==0.18.14' pandas scikit-learn

Generate your Kedro starter project from the sklearn-iris directory.
```
kedro new --starter https://github.com/Minyus/kedro-starters-sklearn.git --directory sklearn-iris
```
As explained in Kedro's documentation, enter project_name, repo_name, and python_package when prompted.

Note: For your Python package name, choose a unique name and avoid generic names such as "test" or "sklearn" that are already used by other packages. You can list the importable packages by running python -c "help('modules')".
Change the current directory to the generated project directory.
```
cd /path/to/project/directory
```
Run the project.
```
kedro run
```
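
The note above about choosing a unique package name can also be checked programmatically before running kedro new. The following is a minimal sketch using only the standard library; the helper name `is_safe_package_name` and the candidate names are hypothetical, and the check only covers modules importable in the current environment.

```python
import importlib.util
import keyword


def is_safe_package_name(name):
    """Return True if `name` is a valid identifier that shadows no importable module."""
    if not name.isidentifier() or keyword.iskeyword(name):
        return False
    try:
        # find_spec returns None when no top-level module of that name is importable.
        return importlib.util.find_spec(name) is None
    except ValueError:
        # Raised for some already-loaded modules without a spec; treat the name as taken.
        return False


for candidate in ("json", "my_iris_project_x9"):  # hypothetical candidate names
    print(candidate, is_safe_package_name(candidate))
```

A name that returns False here (for example one that matches a standard-library or installed package) is likely to cause import conflicts in the generated project.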

Option to use Kaggle Titanic dataset

Download the Kaggle Titanic dataset.
Replace train.csv and test.csv in the /path/to/project/directory/data/01_raw directory.
Modify /path/to/project/directory/base/parameters.yml to set parameters appropriate for the dataset (they are commented out by default).

`sklearn-mlflow-iris` template

This template integrates MLflow with Kedro using PipelineX. Even without writing MLflow code, you can:

configure MLflow Tracking
log inputs and outputs of Python functions set up as Kedro nodes as parameters (e.g. features used to train the model) and metrics (e.g. F1 score)
log execution time for each Kedro node and DataSet loading/saving as metrics
log artifacts (e.g. models, execution time Gantt chart visualized by Plotly, parameters.yml file)
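
To give a feel for what logging execution time as a metric involves, here is a minimal standard-library sketch. It is not how PipelineX implements the feature: the `log_time` decorator, the `metrics` dictionary standing in for MLflow, the metric key format, and the `train_model` function are all hypothetical.

```python
import functools
import time

# Stand-in for a metrics backend; the template sends such values to MLflow instead.
metrics = {}


def log_time(func):
    """Record the wall-clock duration of each call under '<function name>_time'."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Stored even if the function raises, so failed runs are timed too.
            metrics[f"{func.__name__}_time"] = time.perf_counter() - start
    return wrapper


@log_time
def train_model(n):  # hypothetical node function
    return sum(i * i for i in range(n))


result = train_model(1000)
print(result, sorted(metrics))
```

The template removes the need to write wrappers like this by hand, since the timing and logging happen around each Kedro node automatically.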

In this template, MLflow logging is configured in Python code at src/<python_package>/mlflow/mlflow_config.py.

See here for details.

How to use

Install dependencies.

pip install 'kedro==0.18.14' pandas scikit-learn mlflow 'pipelinex>=0.7.7' plotly

Generate your Kedro starter project from the sklearn-mlflow-iris directory.

kedro new --starter https://github.com/Minyus/kedro-starters-sklearn.git --directory sklearn-mlflow-iris

Then follow the same steps as for the sklearn-iris template.

Access MLflow web UI

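To access the MLflow web UI, launch the MLflow server with the command below, then open http://127.0.0.1:8080 (the host and port passed to the command) in a browser.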

mlflow server --host 127.0.0.1 --port 8080 --backend-store-uri sqlite:///mlruns/sqlite.db --default-artifact-root ./mlruns

<img src="_doc_images/mlflow_ui_metrics.png"> Logged metrics shown in MLflow's UI

<img src="_doc_images/mlflow_ui_gantt.png"> Gantt chart for execution time, generated using Plotly, shown in MLflow's UI
