ML Project Template

This repository contains an opinionated template project that can be adapted to a wide range of Machine Learning tasks. Typically, such a project involves two main phases: research and production. The template is intended to guide practitioners toward best practices with regard to:


Repository Structure

Naming Conventions

Code Artifacts

Files

<dataset-desc>_<preprocessing-desc>_<training-desc>.<filetype>

Examples:

Name Identifier Descriptions:

<table>
  <tr> <th>Name</th> <th>Description</th> </tr>
  <tr> <td colspan="2"><b>Dataset Identifiers:</b></td> </tr>
  <tr> <td>categories2blogs</td> <td>Dataset containing blogs with the text content, blogs item URI, and connected primary tags.</td> </tr>
  <tr> <td>blogs-metadata</td> <td>Dataset containing all blogs and related metadata (properties).</td> </tr>
  <tr> <td colspan="2"><b>Preprocessing Identifiers:</b></td> </tr>
  <tr> <td>cl</td> <td>Default text cleaning (lowercasing, regex cleaning).</td> </tr>
  <tr> <td>rs</td> <td>Remove Stopwords.</td> </tr>
  <tr> <td>lm</td> <td>Text lemmatization.</td> </tr>
  <tr> <td colspan="2"><b>Training Identifiers:</b></td> </tr>
  <tr> <td>ft-vec</td> <td>Text vectorizer using Fasttext.</td> </tr>
  <tr> <td>tfidf</td> <td>Text vectorizer using TFIDF.</td> </tr>
  <tr> <td>lsvm</td> <td>Classifier using linear SVM.</td> </tr>
  <tr> <td colspan="2"><b>Filetype Identifiers:</b></td> </tr>
  <tr> <td>.model</td> <td>Model file.</td> </tr>
  <tr> <td>.vectors</td> <td>Binary vectors file.</td> </tr>
</table>
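
The file naming pattern and the identifiers in the table can be combined programmatically. The following is a minimal sketch, not part of the template itself: the helper `build_artifact_name` is hypothetical, and joining several identifiers within one slot with a hyphen (for example `cl-rs`) is an assumption, since the pattern only specifies the underscores between slots.

```python
from typing import Sequence

# Identifiers copied from the table above.
DATASETS = {"categories2blogs", "blogs-metadata"}
PREPROCESSING = {"cl", "rs", "lm"}
TRAINING = {"ft-vec", "tfidf", "lsvm"}
FILETYPES = {"model", "vectors"}


def _join(ids: Sequence[str], allowed: set, slot: str) -> str:
    """Validate the identifiers of one slot and join them.

    ASSUMPTION: multiple identifiers in a slot are joined with '-';
    the naming pattern itself does not state a separator.
    """
    if not ids:
        raise ValueError(f"{slot}: at least one identifier is required")
    unknown = [i for i in ids if i not in allowed]
    if unknown:
        raise ValueError(f"{slot}: unknown identifier(s) {unknown}")
    return "-".join(ids)


def build_artifact_name(
    dataset: str,
    preprocessing: Sequence[str],
    training: Sequence[str],
    filetype: str,
) -> str:
    """Build <dataset-desc>_<preprocessing-desc>_<training-desc>.<filetype>."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset: unknown identifier {dataset!r}")
    filetype = filetype.lstrip(".")  # accept both 'model' and '.model'
    if filetype not in FILETYPES:
        raise ValueError(f"filetype: unknown identifier {filetype!r}")
    pre = _join(preprocessing, PREPROCESSING, "preprocessing")
    train = _join(training, TRAINING, "training")
    return f"{dataset}_{pre}_{train}.{filetype}"


if __name__ == "__main__":
    # Illustrative combination only; not a file shipped with the template.
    print(build_artifact_name("categories2blogs", ["cl", "rs"], ["tfidf", "lsvm"], ".model"))
```

Validating against the known identifier sets keeps artifact names consistent with the table; new identifiers would need to be added to both the table and the sets.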