Deep Retina 3rd place solution to Kaggle's 2018 Data Science Bowl.

This solution is based on Matterport's Mask_RCNN implementation for Keras/TensorFlow. Please look at the original repository (https://github.com/matterport/Mask_RCNN) for specific details. This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone. I used the pretrained COCO weights as the starting point for my training on the nuclei segmentation dataset.

Some of the Mask_RCNN files have been modified, in particular model.py and utils.py.

Training

Data

I did not use any additional external data. Training was done with a modified version of the dataset, described in the discussion forum (https://www.kaggle.com/c/data-science-bowl-2018/discussion/50518) and mentioned several times in the Official External Data thread as a link to https://github.com/lopuhin/kaggle-dsbowl-2018-dataset-fixes. For stage 2 I did not retrain my model with the released stage1_test images. The file my_bowl_dataset.py defines the bowldataset class, which loads the images and masks in the correct format.

Augmentation

The modifications to model.py and utils.py are needed to scale the training set images before random 512x512 crops are taken for training. This scaling takes place in the "resize_image" function in utils.py and also changes the aspect ratio of the images. In addition, the code inside the "augment" section of the function "load_image_gt" in model.py applies further augmentation to the 512x512 crops.

Similar augmentations can be achieved with the imgaug library (see the code for an example), but I found that they slightly underperform my implementation. Also note that my winning implementation contained a bug that I have removed from this source code, so the results of running the training schedule below differ slightly from my final submission. The bug was in the cropping function in utils.py: after the random rescaling (which worked correctly), the code always took the same part of the image. Surprisingly, this bug did not hurt my score much, although I would have gone up one or maybe two positions if I hadn't made this stupid error.
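The scale-then-crop idea described above can be sketched as follows. This is a minimal illustration and not the code in utils.py: the function name, the scale range, the nearest-neighbour resampling and the rule that forces the rescaled image to be at least as large as the crop are all assumptions made for the example.

```python
import numpy as np


def random_rescale_and_crop(image, mask, crop_size=512, scale_range=(0.5, 2.0), rng=None):
    """Randomly rescale height and width independently, then take a random crop.

    Hypothetical helper: scale_range and the minimum-size rule are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]

    # Independent factors per axis, so the aspect ratio changes as well.
    scale_y, scale_x = rng.uniform(scale_range[0], scale_range[1], size=2)
    new_h = max(crop_size, int(round(height * scale_y)))
    new_w = max(crop_size, int(round(width * scale_x)))

    # Nearest-neighbour resampling by index lookup keeps mask values binary.
    rows = np.minimum((np.arange(new_h) * height / new_h).astype(int), height - 1)
    cols = np.minimum((np.arange(new_w) * width / new_w).astype(int), width - 1)
    image = image[rows][:, cols]
    mask = mask[rows][:, cols]

    # Draw a fresh crop offset on every call, so different regions are sampled.
    top = int(rng.integers(0, new_h - crop_size + 1))
    left = int(rng.integers(0, new_w - crop_size + 1))
    return (image[top:top + crop_size, left:left + crop_size],
            mask[top:top + crop_size, left:left + crop_size])
```

Applying the same row and column indices to the image and to the mask keeps the two aligned, which is the property any variant of this augmentation has to preserve.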

Training schedule

Training takes place in three consecutive steps.

Step 1

For the first step we initialize the network with the COCO weights pretrained by Matterport (mentioned in the External Data thread), available at: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 The file "my_train_1.py" contains the code for the first training step. This is the most important part of the training; the other two steps do not improve the results much. There is nothing fancy here: the network trains for 75 epochs with the default SGD optimizer, and the learning rate is reduced at epochs 30 and 50.
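As a rough illustration of such a step schedule, the sketch below returns the learning rate for a given epoch. The base learning rate, the reduction factor and the 0-based epoch convention are assumptions for the example; the actual values are in "my_train_1.py".

```python
def step_learning_rate(epoch, base_lr=0.001, drop_epochs=(30, 50), factor=0.1):
    """Learning rate for a 0-based epoch under a step schedule.

    base_lr and factor are hypothetical values, not taken from the training script.
    """
    # Count how many drop boundaries have already been reached.
    drops = sum(1 for boundary in drop_epochs if epoch >= boundary)
    return base_lr * factor ** drops


# Print the learning rate around each boundary of a 75-epoch run.
for epoch in (0, 29, 30, 49, 50, 74):
    print(epoch, step_learning_rate(epoch))
```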

Step 2

The file "my_train_2.py" contains the second step. In this step we initialize the network with the final epoch weights from Step 1 and train for 40 additional epochs. The difference from Step 1 is how images are taken from the training set. In Step 1 each image is used once per epoch, as in most training procedures. However, the types of images and the number of images of each type differ considerably, which makes the training set quite unbalanced with respect to image type. Because the training set contains groups of images of different sizes, and the smaller images are usually more abundant, I thought it would be good (given all the cropping described above) to use the same image several times per epoch, in particular the larger images.

In summary, the training set as provided has 9 image sizes, each with a different number of images:

(256, 320, 4): 111
(1040, 1388, 4): 1
(360, 360, 4): 90
(260, 347, 4): 3
(512, 640, 4): 13
(603, 1272, 4): 6
(256, 256, 4): 334
(1024, 1024, 4): 16
(520, 696, 4): 90

So I add 334 copies of the (1040, 1388) image to the training set, 1 copy of each image in the (256, 256) category, 3 copies of each image in the (256, 320) category, and so on, generating a training set that has roughly the same number of images of each size. This makes my training set very large.
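The replication scheme can be sketched as below. This is an illustration under assumptions, not the code in "my_train_2.py": the function name is hypothetical, and the number of copies is taken as the rounded ratio between the largest group and each group, which reproduces the 334, 1 and 3 copies quoted above but may differ from the author's rule for the other sizes.

```python
from collections import defaultdict


def balance_by_size(image_shapes):
    """Return a list of image ids in which every size group has similar weight.

    image_shapes maps an image id to its shape tuple. The rounding rule is an
    assumption made for this sketch.
    """
    groups = defaultdict(list)
    for image_id, shape in image_shapes.items():
        groups[shape].append(image_id)

    largest = max(len(ids) for ids in groups.values())
    balanced = []
    for ids in groups.values():
        # Repeat each image so its group approaches the size of the largest group.
        copies = max(1, round(largest / len(ids)))
        balanced.extend(ids * copies)
    return balanced
```

With the counts listed above, the single (1040, 1388) image is repeated 334 times, each (256, 320) image 3 times, and each (256, 256) image once.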

Step 3

The file "my_train_3.py" contains the third step. It takes the weights from the last epoch of Step 2 and repeats the same procedure for a further 5 epochs with slightly larger augmentation parameters. The resulting weights are the ones used for my winning solution. It should be noted, however, that Steps 2 and 3 do not significantly improve on the Step 1 predictions (it is easy to say now that we can see the scores... ;) ). The weights for my solution are too large for GitHub; you can download them at: https://drive.google.com/file/d/19kVton20JL9u0CpwGssD7EbBvsWcq1ty/view?usp=sharing

Inference

Inference uses the "pad64" option of Matterport's Mask_RCNN implementation.
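As far as I understand that option, it pads the image with zeros so that height and width become multiples of 64, which the FPN needs in order to downscale and upscale feature maps cleanly. The sketch below illustrates the idea with NumPy; it is not the library's code, and the function name and the choice to centre the image inside the padding are assumptions.

```python
import numpy as np


def pad_to_multiple(image, multiple=64):
    """Zero-pad height and width up to the next multiple of `multiple`.

    Returns the padded image and the (y1, x1, y2, x2) window that holds the
    original pixels, so predictions can be cropped back afterwards.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    top, left = pad_h // 2, pad_w // 2

    # Pad only the two spatial axes; leave any channel axis untouched.
    padding = [(top, pad_h - top), (left, pad_w - left)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, padding, mode="constant", constant_values=0)
    window = (top, left, top + height, left + width)
    return padded, window
```

For a 520x696 image this gives a 576x704 input, and the returned window is what would be used to cut the predicted masks back to the original size.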

Mask R-CNN for Object Detection and Segmentation

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.

Instance Segmentation Sample

The repository includes:

The code is documented and designed to be easy to extend. If you use it in your research, please consider referencing this repository. If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.

Getting Started

Step by Step Detection

To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:

1. Anchor sorting and filtering

Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.

2. Bounding Box Refinement

This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.

3. Mask Generation

Examples of generated masks. These then get scaled and placed on the image in the right location.

4. Layer activations

Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).

5. Weight Histograms

Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.

6. Logging to TensorBoard

TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.

7. Composing the different pieces into a final result

Training on MS COCO

We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in Jupyter notebook (see the provided notebooks for examples) or you can run it directly from the command line as such:

    # Train a new model starting from pre-trained COCO weights
    python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco

    # Train a new model starting from ImageNet weights
    python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet

    # Continue training a model that you had trained earlier
    python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5

    # Continue training the last model you trained. This will find
    # the last trained weights in the model directory.
    python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last

You can also run the COCO evaluation code with:

    # Run COCO evaluation on the last trained model
    python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last

The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.

Training on Your Own Dataset

Start by reading this blog post about the balloon color splash sample. It covers the process starting from annotating images to training to using the results in a sample application.

In summary, to train the model on your own dataset you'll need to extend two classes:

Config This class contains the default configuration. Subclass it and modify the attributes you need to change.

Dataset This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.

See examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.

Differences from the Official Paper

This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.

Contributing

Contributions to this repository are welcome. Examples of things you can contribute:

You can also join our team and help us build even more projects like this one.

Requirements

Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt.

MS COCO Requirements:

To train or test on MS COCO, you'll also need:

If you use Docker, the code has been verified to work on this Docker container.

Installation

  1. Install dependencies

    pip3 install -r requirements.txt
    
  2. Clone this repository

  3. Run setup from the repository root directory

    python3 setup.py install
    
  4. Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.

  5. (Optional) To train or test on MS COCO, install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python 3 and Windows (the official repo doesn't seem to be active anymore).

Projects Using this Model

If you extend this model to other datasets or build projects that use it, we'd love to hear from you.

4K Video Demo by Karol Majek.

Mask RCNN on 4K Video

Images to OSM: Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.

Identify sport fields in satellite images

Splash of Color. A blog post explaining how to train this model from scratch and use it to implement a color splash effect.

Balloon Color Splash

Segmenting Nuclei in Microscopy Images. Built for the 2018 Data Science Bowl.

Code is in the samples/nucleus directory.

Nucleus Segmentation

Mapping Challenge: Convert satellite imagery to maps for use by humanitarian organisations.

Mapping Challenge