TI-pooling

This repository contains TensorFlow and Torch7 implementations of TI-pooling (transformation-invariant pooling) from the following paper:

@inproceedings{laptev2016ti,
  title={TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks},
  author={Laptev, Dmitry and Savinov, Nikolay and Buhmann, Joachim M and Pollefeys, Marc},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={289--297},
  year={2016}
}

Update February 2017. The TensorFlow implementation is ready! You can use the Torch7 implementation, the TensorFlow implementation, or both independently: the two codebases are structured very similarly. Scroll to "Instructions for Linux" for the details.

The original paper provides experimental evaluation on three datasets. This repository contains the source code for one of these experiments: the mnist-rot dataset, consisting of 12k training images of randomly rotated digits.

What is TI-pooling?

TI-pooling is a simple technique that makes a Convolutional Neural Network (CNN) transformation-invariant. That is, given a set of nuisance transformations (such as rotations, scaling, shifts, illumination changes, etc.), TI-pooling guarantees that the output of the network does not depend on whether the input image was transformed or not.
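A one-line way to state this guarantee (the notation here is ours, not from the repository): let Φ be the set of considered transformations and let g(·) be the feature vector computed by the CNN. TI-pooling outputs

f(x) = max_{φ ∈ Φ} g(φ(x))    (element-wise maximum over Φ)

If Φ is closed under composition (e.g. rotations by multiples of 360°/N), then replacing x with a transformed copy φ'(x) only reorders the terms inside the maximum, so f(φ'(x)) = f(x) exactly.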

Why TI-pooling and not data augmentation?

Compared to the very commonly used data augmentation, TI-pooling finds a canonical orientation of the input samples and learns mostly from these canonical samples. This means the network does not have to learn separate paths (features) for different transformations of the same object, so its parameters are used more efficiently: the same accuracy can be reached with a smaller model.

How does TI-pooling work?

[Figure: the TI-pooling pipeline]
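In short: every input image is transformed with each transformation from the considered set, all transformed instances are passed through the same CNN (the weights are shared), and an element-wise maximum over the resulting feature vectors is taken before the fully-connected classification layers. Below is a minimal sketch of this idea in plain Python, not the repository's code: feature_fn stands for any shared-weight CNN mapping an image to a feature vector, and the rotation set and the use of scipy are our assumptions.

import numpy as np
from scipy.ndimage import rotate

def ti_pooled_features(image, feature_fn, num_angles=24):
    """Apply every rotation from the set, run the SAME network on each
    rotated copy, and max-pool element-wise over the results."""
    angles = [360.0 * k / num_angles for k in range(num_angles)]
    # One feature vector per transformed instance; the weights are shared
    # because feature_fn is literally the same function every time.
    feats = np.stack([feature_fn(rotate(image, a, reshape=False))
                      for a in angles])
    # TI-pooling: element-wise maximum over the transformation dimension.
    return feats.max(axis=0)

Rotating the input merely permutes the rows of the stacked feature matrix, so the element-wise maximum (and hence everything the classifier on top sees) stays the same: this is exactly the invariance guarantee from the previous section.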

Any caveats?

One needs to be really sure before introducing transformation-invariance: in some real-world problems, a transformation can look like a nuisance factor but in fact be informative. E.g. rotation-invariance does not work well for natural objects, because most natural objects have a "default" orientation that helps us recognize them (an upside-down animal is usually harder to recognize, not only for a CNN, but also for a human being). The same rotation-invariance, however, proved very useful for cell recognition, where the orientation is essentially random.

Also, while training time is comparable to (and usually shorter than) training with data augmentation, testing time increases linearly with the number of transformations, because every transformed instance requires its own forward pass.

Instructions for Linux

First run ./setup.sh to download the dataset; it will be stored in the root directory. Then, depending on the framework you want to use, navigate to the corresponding directory and start training by running the rot_mnist12K file.
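A typical session might look as follows (the directory name below is a placeholder for whichever framework folder you picked, and the .lua/.py extensions are our assumption about how the rot_mnist12K file is named in each implementation):

./setup.sh                    # download mnist-rot into the repository root
cd <framework_directory>      # the Torch7 or the TensorFlow subdirectory
th rot_mnist12K.lua           # Torch7, or:
python rot_mnist12K.py        # TensorFlow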

The code was tested with the following configuration: