Awesome Kedro
<div align="center"><img width="500" src="kedro_banner.png" alt="kedro logo"></div>

An opinionated Python framework for creating reproducible, maintainable and modular data science code.
This is an open-source repository to collect together anything related to Kedro such as blog posts, example projects, plugins, videos, and more.
Got something to include? Add your own work to the relevant section with a PR.
Awards and highlights
- Kedro won Best Technical Tool or Framework for AI in the 2019 Awards AI competition.
- Kedro documentation won a merit award in the UK Technical Communication Awards 2020, and was overall winner in the UK Technical Communication Awards 2023.
- The Kedro framework is listed on the 2020 ThoughtWorks Technology Radar and the 2020 Data & AI Landscape.
- Kedro has received an honorable mention in the User Experience category in Fast Company’s 2022 Innovation by Design Awards.
Blog posts
In no particular order:
- Official Kedro blog
- Building and Managing Data Science Pipelines with Kedro
- Deploying Kedro Pipelines to Apache Airflow
- Writing your first kedro Nodes
- Setting Parameters in kedro
- Add New Dependencies to Your Kedro Project
- Running your Kedro Pipeline from the command line
- Kedro Virtual Environment
- Kedro Pipeline Create
- Kedro Install
- Kedro Git Init
- Kedro New
- What is Kedro
- How I Kedro
- Incremental Versioned Datasets in Kedro
- Productionizing ML Pipelines with Airflow, Kedro, and Great Expectations
- Change Data Capture With Kedro and Dolt
- Applying data engineering to applications with Kedro
- Running Machine Learning Pipelines with Kedro, Kubeflow and Airflow
- Introducing Kedro: Yetunde Dada, Principal Product Manager at QuantumBlack
- Standardization of End-to-End Data Pipeline for AI Project Using Kedro
- Using Kedro pipelines to train Amazon SageMaker models
- Kedro 6 Months In
- Jungle Scout case study: Kedro, Airflow, and MLFlow use on production code
- Building a Production-Level Data Pipeline Using Kedro
- Designing a "Router" for kedro
- Power is nothing without control
- Start small and grow big MLOps2020
- Get Started with Machine Learning Pipelines at Kedro
- Mid Meet Py - Ep.14 - Interview with Waylon Walker
- How to find datasets in your kedro catalog
- How Kedro handles your inputs
- Post mortem debugging sessions with Kedro hooks
- Create Configurable Kedro Hooks
- What's an example use case of Kedro?
- Make Notebook Pipeline with Kedro+Papermill
- 25 Hot New Data Tools and What They DON’T do
- Kedro Hooks Intro - creating the kedro-preflight hook
- Next Generation Data Science and Data Engineering Frameworks
- Understanding best-practice Python tooling by comparing popular project templates
- A story using the Kedro pipeline library
- Transparent data flow with Kedro
- Comparison of Python pipeline packages: Airflow, Luigi, Metaflow, Kedro & PipelineX
- Kedro in Jupyter Notebooks On Google GCP Dataproc
- Building a Pipeline with Kedro for an ML Competition
- Using Kedro and MLflow Deploying and versioning data pipelines at scale
- Ship Faster With An Opinionated Data Pipeline Framework (Episode 100)
- Some cool open-source Python packages for Machine Learning
- Kedro: A New Tool For Data Science
- The latest and greatest in Kedro — We’re growing our community
- Kedro-Airflow 0.4.0 — Orchestrating Kedro Pipelines with Airflow
- Beyond the Notebook and into the Data Science Framework Revolution
- Element AI uses Kedro to apply research and develop enterprise AI models
- Introducing Kedro Hooks
- Getting Started with Kedro
- Introducing Kedro
- Deploying and Versioning Data Pipelines at Scale
- Kedro hands-on Build your own demographics atlas. Pt. 2: building footprints classification
- Kedro (Python template for production-quality ML data pipelines)
- Enhance your kedro experiences with these tips
- Kedro: The Best Python Framework for Data Science!!
- kedro-in-6-months
- Deploying a Recommendation System the Kedro Way
- Efficient Data Sharing in Data Science Pipelines on Kubernetes
- Deep Learning with Azure: PyTorch distributed training done right in Kedro
- Running Kedro… everywhere? Machine Learning Pipelines on Kubeflow, Vertex AI, Azure and Airflow
For more:
- #kedro tag on dev.to
Companies using Kedro
There are Kedro users across the world, who work at start-ups, major enterprises and academic institutions like Absa, Acensi, Advanced Programming Solutions SL, AI Singapore, AMAI GmbH, Anacision GmbH, Augment Partners, AXA UK, Belfius, Beamery, Caterpillar, CRIM, Dendra Systems, Element AI, GetInData, GMO, Indicium, Imperial College London, ING, Jungle Scout, Helvetas, Leapfrog, McKinsey & Company, Mercado Libre Argentina, Modec, Mosaic Data Science, NaranjaX, NASA, NHS AI Lab, Open Data Science LatAm, Prediqt, Prospect, QuantumBlack, ReSpo.Vision, Retrieva, Roche, Sber, Société Générale, Telkomsel, Universidad Rey Juan Carlos, UrbanLogiq, Wildlife Studios, WovenLight and XP.
Example projects
- Churn Prediction with Kedro by Laíza Parizotto, a project that tackles a data science challenge of predicting customer churn for a fictional financial institution, using Kedro to build an effective pipeline for a production-ready machine learning model.
- Response Recommendation System for BarefootLaw by Kasun Amarasinghe, Carlos Caro, Nupoor Gandhi and Raphaelle Roffo, an extensive Data Science for Social Good (DSSG) project at Imperial College London that recommends responses to law-related queries.
- Augury by Craig Franklin, machine-learning functionality for predicting AFL match results in the Tipresias app
- CausalLift by Yusuke Minami, a Python package for Uplift Modeling in real-world business
- PipelineX by Yusuke Minami, a Python package to develop pipelines for rapid Machine/Deep Learning experimentation using Kedro and MLflow. Example projects using PyTorch, Pandas, and OpenCV are available.
- kedro-mlflow-example by Tom Goldenberg, a project that demonstrates how to integrate MLflow with a Kedro codebase
- kedro-wdbc-tf by Abhinav Prakash, a project that uses a Kedro template to create a deep learning workflow. The model was trained with TensorFlow on the wdbc (Breast Cancer) dataset.
- twitter-sentiment-analysis by Avi Agarwal, a project that demonstrates how to use Kedro to train and evaluate an NLP-based machine learning model.
- Anomaly Detection Pipeline with Kedro by Kenneth Leung, a project that demonstrates how to use Kedro for fraud detection on credit card transaction data using an Isolation Forest machine learning model.
- FontR by Maciej M, a project implementing Adobe's DeepFont research in Kedro to perform font recognition.
- Spaceship-Titanic using Kedro and MLflow by Mauricio Araujo, Spaceship-Titanic Kaggle competition with a fully automated machine learning lifecycle using Kedro and MLflow.
- pipelinex_image_processing by Minyus, a Kedro pipeline for image processing using OpenCV, Scikit-image, TensorFlow/Keras, and MLflow.
- Price Prediction Pipeline by Pedro Alves, a data processing and data science pipeline for a fictitious diamond enterprise using Scikit-Learn, Docker and Pandas.
- Healthcare Data Analysis with Kedro by Pablo Villar, an end-to-end project built with Kedro pipelines (preprocessing, processing and data_science), with a Streamlit app where you can interact with the data.
- Spaceflights price prediction as a service and Monte Carlo simulations by Takieddine Kadiri, a Kedro project that uses Kedro Boot to serve the spaceflights price prediction model through a REST API (FastAPI) and a data app (Streamlit). It also provides an example of a Monte Carlo simulation for estimating Pi using Kedro Boot.
Kedro plugins
- find-kedro - Automatically construct pipelines using pytest style pattern matching.
- kedro-accelerator - Speeds up pipelines by parallelizing I/O in the background.
- kedro-airflow - Makes it easy to deploy Kedro projects to Airflow.
- kedro-airflow-k8s - Enables running a Kedro pipeline with Airflow on a Kubernetes cluster.
- kedro-argo - Converts Kedro pipelines to Argo pipelines.
- kedro-auto-catalog - A configurable replacement for kedro catalog create that allows you to create default dataset types other than `MemoryDataset`.
- kedro-azureml - Enables running a Kedro pipeline with Azure ML Pipelines service.
- kedro-dataframe-dropin - Lets you swap out pandas datasets for modin or RAPIDS equivalents to speed up specialised workflows (e.g. on GPUs).
- kedro-datasets - A collection of Kedro data connectors.
- kedro-docker - Makes it easy to package Kedro projects with Docker.
- kedro-dolt - Allows you to expand the data versioning abilities of data scientists and engineers.
- kedro-fast-api - A Kedro plugin that makes it easy to create a FastAPI service for a Kedro project in order to deploy models.
- kedro-great - The easiest way to integrate Kedro and Great Expectations.
- kedro-grpc-server - Creates a gRPC server for your kedro pipelines.
- kedro-kubeflow - Lets you run and schedule pipelines on Kubernetes clusters using Kubeflow Pipelines.
- kedro-mlflow - Allows usage of MLflow in Kedro projects.
- kedro-neptune - Integration of Kedro with Neptune.ai.
- kedro-pandas-profiling - "Profiles" data in the catalog. (⚠️public archive)
- kedro-pandera - Integration of Kedro with Pandera to provide catalog-level data validation.
- kedro-partitioned - Extends the functionality on processing partitioned data.
- kedro-sagemaker - Enables running a Kedro pipeline with Amazon SageMaker service.
- kedro-snowflake - Enables running full Kedro pipelines in Snowflake.
- kedro-softfail-runner - A custom Kedro runner that enables soft-failing pipelines.
- kedro-static-viz - Generates a static Kedro-Viz site (HTML, CSS, JS).
- kedro-viz - Helps visualise Kedro data and analytics pipelines.
- kedro-vertexai - Enables running a Kedro pipeline with Vertex AI Pipelines service.
- kedro-wings - Automatically creates catalog entries to simplify Kedro pipeline writing.
- more-kedro - (Hook) library for on-the-fly typing and validation of parameter dictionaries and default value backed data loading.
- steel-toes - Prevent changing downstream catalog data on your teammates while developing in parallel.
- vineyard-kedro - Custom `DataSet` and `Runner` which enable sharing intermediate data between tasks in Kedro pipelines using Vineyard, a cloud-native in-memory object manager.
- kedro-tf-image - Kedro pipelines for preprocessing images using TensorFlow.
- kedro-graphql - A Kedro plugin for serving Kedro projects as GraphQL APIs.
- kedro-boot - Integrates your Kedro project with any application.
- kedro-popmon - A Kedro plugin for integration of popmon capabilities.
- kedro-expectations - Adds data validation to Kedro pipelines using an up-to-date version of Great Expectations.
For more:
- kedro-plugin topic on GitHub
Videos
Intros
- What is Kedro? Why is it useful? A Non-Technical Intro to Kedro - An intro for management people.
- PyCon US 2021 - Reproducible and maintainable data science code with Kedro
- Principled Data Science Workflows
- Production-level data pipelines that make everyone happy using Kedro
- Kedro - Nubank ML Meetup (Portuguese)
- Data Science Best Practices con Kedro (Spanish)
Howtos
- @kedro-python on YouTube
- Refactor your Jupyter notebooks using Kedro
- Introduction to Kedro training with Joel Schwarzmann
- Creating Shared Catalogs for your Kedro Projects on GitHub
- Deployable REST Enabled Data Pipelines with Flask, Docker, Kedro
- How to begin writing tests for your Pipelines
- How To Customize Your Kedro CLI Options
- How to Get/Write Data from/to a SQL Database - Use `pandas.SQLTableDataSet` or `pandas.SQLQueryDataSet`.
- How to Lazily Evaluate Chunks of a Big Pandas DataFrame
- How to Setup PySpark for your Kedro Pipeline
- Kedro Great: Use Great Expectations with Ease! - Shows how to use kedro-great to validate, for example, data container metadata (columns, etc.).
- Run machine learning pipelines on ❄ Snowflake using Kedro 🔧 MLOPS TUTORIAL
- Kedro + PyTorch. MLOps TUTORIAL by Marcin Zabłocki
- Kedro + AWS SageMaker TUTORIAL
- How to run Kedro pipelines on Azure ML Pipelines service? - MLOPS TUTORIAL - Marcin Zabłocki
- Kedro Community Update - April 2023 - Kedro 0.18.7, new `OmegaConfigLoader`, experiment tracking in Kedro Viz, improvements in Databricks workflow, and more.
- Let's look at Kedro 0.17.0!
- Kedro 0.16.0 was just Released! - Release notes (features) of Kedro 0.16.0 explained.
- Why transition from vanilla Jupyter notebooks to Kedro?