FCNN Example

This was an exercise for a job application (I will not disclose the company name).

The goal was to overfit to a single given image, a large aerial image along with its ground truth, so that houses can be detected.

The network architecture was given, so no flexibility there.

This repo contains PyTorch code and other material that describe my approach to the given problem.

Note that the custom cross-entropy function is in fact unnecessary and will be removed in the next version.

Repo Structure

Performance Metrics

Initially, only "classification accuracy" (or "accuracy" for short) was used, since the problem was implemented as a binary classification task. Shortly after, I realized that the data is highly unbalanced (~91% background vs. ~9% houses); for example, a result with no detections at all would still score 91% accuracy! (I have kept this metric nonetheless, to compare with earlier attempts.)

Therefore, if we frame the problem as a "house detection" problem, we can use metrics like precision, recall and F1 score. These enable us to properly evaluate model performance and to compare hyperparameters.
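These metrics boil down to pixel-wise counts of true positives, false positives and false negatives. A minimal sketch of computing them from binary masks (the function name and edge-case handling are illustrative, not necessarily identical to the repo's code):

```python
import numpy as np

def detection_scores(pred, gt):
    """Precision, recall and F1 score for binary masks (1 = house, 0 = background)."""
    tp = np.logical_and(pred == 1, gt == 1).sum()  # house pixels correctly detected
    fp = np.logical_and(pred == 1, gt == 0).sum()  # background mistaken for house
    fn = np.logical_and(pred == 0, gt == 1).sum()  # house pixels that were missed

    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1
```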

I also considered the IoU metric, but decided it would be counterintuitive for semantic segmentation and too much of a hassle for a one-week exercise.

Unbalanced data --> Class Weights in Loss Function

As mentioned earlier, the sample data is highly unbalanced: 91% of the pixels are background, whereas only 9% are houses. This becomes a huge burden during training, because the loss from house pixels becomes insignificant compared to the loss from background pixels, which lets the optimizer be content with situations such as "very few detections".

To address this issue (given that we can't get more data), providing class weights to the loss function is a good option. This in fact forces the optimizer to find weights that yield a low loss on background pixels and a low (despite being amplified!) loss on house pixels.

I expected weights of 1 vs. 10 to work best theoretically (matching the inverse of the ~9% vs. ~91% class ratio), but 1 vs. 6 and 1 vs. 8 turned out to be better empirically.
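A minimal sketch of passing such weights to PyTorch's built-in loss (as noted above, the custom cross-entropy is unnecessary; the 1 vs. 8 weights below are one of the empirically better settings):

```python
import torch
import torch.nn as nn

# Index 0 = background, index 1 = house; the house loss is amplified 8x.
class_weights = torch.tensor([1.0, 8.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# logits: (N, 2, H, W) raw network outputs; target: (N, H, W) with labels {0, 1}
logits = torch.randn(4, 2, 256, 256)
target = torch.randint(0, 2, (4, 256, 256))
loss = criterion(logits, target)
```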

Training & Test

This section provides a brief description of the training and test procedures.

Image patches

The requirements of the exercise clearly state that the image patches fed into the network must not be larger than 256x256 pixels. Moreover, I wanted to experiment with other image sizes as well.

To this end, the data loader functions in util.py take W as an argument to determine how to divide the large sample image into WxW patches (sketched below).

**Assumption:** For convenience, image patches are always square, hence WxW and not WxH. Extending the existing code to handle rectangular inputs would be trivial, but I believe rectangular patches are very unusual in the literature.
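For illustration, splitting the large image into non-overlapping WxW patches could look like the following (a sketch; the actual functions in util.py may handle borders and the train/test differences mentioned below differently):

```python
import numpy as np

def split_into_patches(image, W):
    """Split an (H, W_img, C) image into non-overlapping WxW patches.

    Border regions that don't fill a complete patch are dropped here
    for simplicity.
    """
    h, w_img = image.shape[:2]
    return [image[y:y + W, x:x + W]
            for y in range(0, h - W + 1, W)
            for x in range(0, w_img - W + 1, W)]
```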

Image patches are created differently for the training and test stages:

Please note that both training and test patches were normalized with respect to a mean image computed from the training set (see images/mean.npy).
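Assuming mean.npy stores a patch-sized mean image, the normalization amounts to a simple subtraction (a sketch, not the repo's exact code):

```python
import numpy as np

mean_image = np.load("images/mean.npy")  # mean image computed from the training set

def normalize(patch):
    # Subtract the training-set mean, in float to avoid uint8 wrap-around.
    return patch.astype(np.float32) - mean_image.astype(np.float32)
```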

Hyperparameters and other choices

While searching for hyperparameters, the random seed was fixed. After long hours of training, I found that the following hyperparameters work best for the given task.

More experiments (although not presented nicely) can be found here.
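Fixing the random seed, as mentioned above, usually means seeding every RNG involved; a sketch (the seed value 42 is a placeholder, not necessarily the one used here):

```python
import random
import numpy as np
import torch

def fix_seed(seed=42):
    # Seed all RNGs so different hyperparameter runs are comparable.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```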

Results

Results for model #236, which was trained for 200 epochs with the hyperparameters indicated above.

| Precision | Recall | F1 Score | Accuracy |
|:---------:|:------:|:--------:|:--------:|
|   76.80   | 93.56  |  84.35   |  97.55   |

Training Loss over iterations:

<div style="text-align:center"><img src="https://github.com/emredog/FCNN-example/raw/master/plots/234_losses.png" /></div>

Scores over epochs:

<div style="text-align:center"><img src="https://github.com/emredog/FCNN-example/raw/master/plots/234_acc.png" /></div>

Qualitative result: Predictions
