Template for Data Science Project

This repo aims to provide a robust starting point for any Data Science project.

It contains a ready-made tool setup, so you can start adding dependencies and writing code right away.

To familiarize yourself with the tools used here, watch my talk on Data Science project setup (in Russian).

If you use this repo as a template, please leave a star, because template usages don't count as forks.

Workflow

Experiments and technology discovery are usually performed in Jupyter Notebooks. The notebooks directory is reserved for them. More information on working with notebooks can be found in notebooks/README.md.

The more mature parts of the pipeline (functions, classes, etc.) are stored in .py files in the main package directory (ds_project by default).
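As a minimal sketch of this workflow, a helper that started life in a notebook cell could be moved into the package like this. The file name ds_project/preprocessing.py and the function standardize are hypothetical and serve only as an illustration; they are not part of the template.

```python
# Hypothetical file: ds_project/preprocessing.py
# The module and function names are illustrative, not part of the template.
import statistics
from typing import Iterable, List


def standardize(values: Iterable[float]) -> List[float]:
    """Scale values to zero mean and unit (population) standard deviation."""
    data = [float(v) for v in values]
    if not data:
        raise ValueError("values must contain at least one number")
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        # All values are identical, so every centered value is zero.
        return [0.0 for _ in data]
    return [(x - mean) / std for x in data]
```

A notebook could then import it with `from ds_project.preprocessing import standardize`, assuming the package is importable in the active environment.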

What to change?

How to setup an environment?

This template uses poetry to manage the dependencies of your project.

First you need to install poetry.

Then, if you use conda (recommended) to manage environments (to use a regular virtualenv, just skip this step):

Now you are ready to add dependencies to your project. For this, use the add command:

poetry add scikit-learn torch <any_package_you_need>

Next, run poetry install to check that the final state of your environment matches the configs.

After that, add the changes to git and commit them: git add pyproject.toml poetry.lock

Finally, add pre-commit hooks to git: pre-commit install

At this point you are ready to write clean, reproducible code!

More tools