clean-code-ml

Now available as a free tutorial series: https://bit.ly/2yGDyqT 😎

Introduction

Clean code practices (from Clean Code and Refactoring) adapted for machine learning / data science workflows in Python. This is not a style guide. It's a guide to producing readable, reusable, and refactorable software.

If you’ve tried your hand at machine learning or data science, you know that code can get messy quickly.

Unclean code adds complexity by making code difficult to read and modify. As a consequence, changing code to respond to business needs becomes increasingly difficult, and sometimes even impossible. This problem has been written about extensively for several languages, including Python (e.g. Clean Code, Refactoring, clean-code-python). In this repo, we have adapted these principles for data science / machine learning codebases.

Targets Python 3.7+

Inspired by clean-code-javascript and forked from clean-code-python.

The 5 S's of Clean Code

By James O. Coplien (Source: Foreword of Clean Code (Robert C. Martin))

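In about 1951, a quality approach called Total Productive Maintenance (TPM) came on the Japanese scene. Its focus is on maintenance rather than on production. One of the major pillars of TPM is the set of so-called 5S principles: seiri (organization, or "sort"), seiton (tidiness, or "systematize"), seiso (cleaning, or "shine"), seiketsu (standardization), and shitsuke (self-discipline).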

Hands-on Exercise

If you'd like to try out these practices, we've created a refactoring exercise that you can follow along with. Starting with a Jupyter notebook that has many code smells, you can apply these clean code principles and refactor it to be readable and maintainable. The sample final solution can be found in src/train.py.
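
To give a rough feel for the kind of refactoring the exercise involves, here is a minimal sketch. It is hypothetical and not taken from the exercise notebook or from src/train.py: the data, the `Passenger` alias, and the `impute_missing_with_median` function are all assumptions made up for illustration. It shows a duplicated, magic-number notebook snippet (in comments) replaced by one small, named, side-effect-free function.

```python
from statistics import median
from typing import Dict, List, Optional

# Hypothetical "before" (typical notebook smell: cryptic names, magic numbers,
# copy-pasted logic, in-place mutation):
#
#   for r in d:
#       if r["age"] is None:
#           r["age"] = 30
#   for r in d:
#       if r["fare"] is None:
#           r["fare"] = 39.265

# Illustrative type alias: one record mapping column names to optional values.
Passenger = Dict[str, Optional[float]]


def impute_missing_with_median(rows: List[Passenger], column: str) -> List[Passenger]:
    """Return copies of `rows` with missing `column` values set to the column median."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    # Build new dicts instead of mutating the input, so the raw data stays intact.
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up sample data for the sketch.
passengers = [
    {"age": 22.0, "fare": 7.25},
    {"age": None, "fare": 71.28},
    {"age": 38.0, "fare": None},
]

cleaned = impute_missing_with_median(passengers, "age")
cleaned = impute_missing_with_median(cleaned, "fare")
```

The function name states the intent, the fill value is computed rather than hard-coded, and the same logic is reused for both columns, which makes the behavior easier to test and change.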