Machine Learning
This repo contains a compilation of machine learning projects in the form of Jupyter notebooks. Some notebooks need additional data, such as bounding box annotation files; these files can be found in the data folder. PyTorch is used as the underlying library for projects involving deep learning.
mltools Library
This is a Python library that contains useful classes and functions for machine learning and data science tasks, such as feature exploration, object detection, classification, and semantic segmentation using PyTorch.
How to open notebooks using Docker
Requirements: Docker, docker-compose
The repo provides a Dockerfile and docker-compose.yml to create a Docker container that starts a Jupyter Notebook server (using docker-stacks) and allows you to open the notebooks without having to install the requirements on your system. The steps to do this are:
- Clone the repo:
  git clone https://github.com/mfl28/MachineLearning.git
  cd MachineLearning
- Build the image and start the container using docker-compose up.
- Copy the URL shown in the terminal to your browser's address bar and replace the internal port (8888) with the mapped host port 10000.
- When you are done, you can shut down the server from the terminal using CTRL-C and remove the created Docker container using docker-compose down.
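The port replacement in the third step can also be scripted. The following is a minimal sketch, assuming the URL printed by the server has the usual `http://host:8888/?token=...` shape; the helper name `to_host_url` and the example token are hypothetical and not part of this repo.

```python
from urllib.parse import urlsplit, urlunsplit

INTERNAL_PORT = 8888  # port the Jupyter server uses inside the container
HOST_PORT = 10000     # host port it is mapped to, as stated in the steps above


def to_host_url(container_url: str) -> str:
    """Return the URL with the internal port swapped for the mapped host port."""
    parts = urlsplit(container_url)
    if parts.port != INTERNAL_PORT:
        # Leave URLs that do not use the internal port untouched.
        return container_url
    # Rebuild the network location with the same host and the host port.
    # Assumption: a plain hostname or IPv4 address without user info.
    netloc = f"{parts.hostname}:{HOST_PORT}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical URL; the token is a placeholder, not a real one.
print(to_host_url("http://127.0.0.1:8888/?token=abc123"))
```

Path and query string (including the access token) are passed through unchanged, so the result can be pasted directly into the browser.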
Notebooks
Semantic Segmentation
Kaggle Competition: Dstl Satellite Imagery Feature Detection (notebook)
<p align=left> <img src="demo-media/satellite_demo1.png" height= "150" /> <img src="demo-media/satellite_demo2.png" height= "150" /> </p>
A notebook showing how to perform semantic segmentation using a fully convolutional neural network. Our aim is to locate buildings in satellite images from the Kaggle Dstl Satellite Imagery Feature Detection Challenge.
Object Detection
Humpback Whale Fluke Detection (notebook)
<p align=left> <img src="demo-media/whale_demo.png" height= "200" /> </p>
A notebook showing how to perform object detection with a custom dataset using a pre-trained and subsequently fine-tuned neural network. Specifically, the aim is to detect and locate humpback whale flukes in images from the Kaggle Humpback Whale Identification Challenge. The ground truth bounding box labels for a selection of 800 images from the training dataset provided by the challenge were created using Bounding Box Editor.
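When working with ground truth bounding boxes like these, a common way to compare a predicted box against a label is intersection over union (IoU). The sketch below is a generic illustration, not code taken from the notebook or from mltools; the `Box` type and the function names are assumptions made for this example, and boxes are assumed to be given as corner coordinates.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box given by its corner coordinates (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(box: Box) -> float:
    # Degenerate or inverted boxes count as empty.
    return max(0.0, box.xmax - box.xmin) * max(0.0, box.ymax - box.ymin)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    # The overlap region is bounded by the innermost edges of both boxes.
    overlap = Box(
        max(a.xmin, b.xmin),
        max(a.ymin, b.ymin),
        min(a.xmax, b.xmax),
        min(a.ymax, b.ymax),
    )
    intersection = area(overlap)
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner region.
print(iou(Box(0, 0, 2, 2), Box(1, 1, 3, 3)))
```

A value of 1.0 means the boxes coincide and 0.0 means they do not overlap at all.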
VOCXMLDataset Demo (notebook)
<p align=left> <img src="demo-media/voc_demo.png" height= "250" /> </p>
A notebook showcasing the use of the VOCXMLDataset class from mltools.detection.datasets using images and annotations from the VOC2012 dataset for demonstrations.
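For readers unfamiliar with the annotation format, the following sketch shows how a Pascal VOC style XML file can be read with the Python standard library. It does not reproduce how VOCXMLDataset is implemented; the function name `parse_voc_annotation` and the sample annotation are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the Pascal VOC annotation layout (not a real dataset file).
SAMPLE_ANNOTATION = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""

BOX_TAGS = ("xmin", "ymin", "xmax", "ymax")


def parse_voc_annotation(xml_text: str) -> dict:
    """Extract the file name plus a label and box for every annotated object."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Go through float first so coordinates written as "48.0" also parse.
        box = tuple(int(float(bndbox.findtext(tag))) for tag in BOX_TAGS)
        objects.append({"label": obj.findtext("name"), "box": box})
    return {"filename": root.findtext("filename"), "objects": objects}


annotation = parse_voc_annotation(SAMPLE_ANNOTATION)
for item in annotation["objects"]:
    print(item["label"], item["box"])
```

Each object ends up as a class label with an `(xmin, ymin, xmax, ymax)` tuple in pixel coordinates, which is the kind of target a detection dataset typically hands to a model.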
Classification
Kaggle Competition: Humpback Whale Identification (notebook)
In this notebook we'll train a classifier to identify humpback whales in images according to the Kaggle Humpback Whale Identification Challenge. We'll use the fast.ai deep learning library to perform this task.
Kaggle Competition: MNIST Digit Recognizer (notebook)
<p align=left> <img src="demo-media/mnist_demo.png" height= "200" /> </p>
A notebook showing how to train a convolutional neural network classifier on the MNIST dataset from the Kaggle MNIST Digit Recognizer competition. The aim is to classify the hand-drawn digits in the images as accurately as possible.
Kaggle Competition: Titanic - Machine Learning from Disaster (notebook)
<p align=left> <img src="demo-media/titanic_demo.jpg" height= "200" /> </p>
The aim of this notebook is to build a model that can predict the survival of passengers of the Titanic. The problem and data come from the Kaggle Titanic: Machine Learning from Disaster competition. We start with an exploration and visualization of the provided features, then proceed to build a feature engineering Pipeline using scikit-learn. Finally, we'll experiment with several machine learning approaches to solve the prediction problem.