On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location

In CVPR, 2020. Osman Semih Kayhan and Jan van Gemert.

You can find the core of the paper and the detailed explanation of the experiments in our blog.

This repository contains the experiments of the paper.<br>

Table of contents<br>

  1. How far from the image boundary can absolute location be exploited?<br>
  2. Border handling variants, with Red and Green experiments<br>
  3. Sensitivity to image shifts for image classification<br>
  4. Data efficiency with Imagenet Experiment and Patch Matching Experiment<br>
  5. Small datasets with Action Recognition <br>

Getting started

For clarity, we put each experiment under a specific folder.<br>

PyTorch

We used PyTorch for all experiments in the paper.

1. How far from the image boundary can absolute location be exploited?

We demonstrate this in several ways:

Can a fully convolutional network (FCN) predict location?

We place a simple patch in either the top-left or the bottom-right corner of the image. We train a single 5x5 filter with zero-padded same convolution, followed by ReLU, global max pooling and a softmax classifier, using the SGD optimizer.

<img src="images/chess.png" align="center" width="200" title="Rook">

The answer is YES! The FCN can predict the location. You can try it yourself using the notebook.
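The network described above can be sketched in a few lines of PyTorch (channel counts and image size are illustrative assumptions, not the notebook's exact values):

```python
import torch
import torch.nn as nn

# Minimal location-predicting FCN (a sketch): one zero-padded "same" 5x5
# convolution, ReLU, global max pooling, and a linear layer trained with
# softmax (cross-entropy) loss via SGD.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2),  # zero-padded same convolution
    nn.ReLU(),
    nn.AdaptiveMaxPool2d(1),                    # global max pooling
    nn.Flatten(),
    nn.Linear(8, 2),                            # 2 classes: top-left vs. bottom-right
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 1, 32, 32)  # dummy batch of single-channel images
logits = model(x)              # shape: (4, 2)
```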

Can an Imagenet-pretrained FCN predict location?

We demonstrate this by using Imagenet-pretrained weights, training from scratch with randomly initialized weights, and using randomly initialized frozen weights, on the BagNet-33, ResNet-18 and DenseNet-121 architectures. You can see the results below:<br>

<img src="images/4QI_results.png" align="center" width="600" title="4QI Results">

The experiment folder is here.

2. Border handling variants

We evaluate convolution types in terms of border handling. We generate the Red-Green two-class classification dataset (image below) to evaluate exploitation of absolute position. The upper row of images is class 1: Red-to-the-left-of-Green. The lower row of images is class 2: Green-to-the-left-of-Red. The Similar Testset matches the Train-Val set in absolute location: class 1 at the top and class 2 at the bottom. The Dissimilar Testset is an exact copy of the Similar Testset where absolute location is swapped between the classes: class 1 at the bottom, class 2 at the top. If absolute location plays no role, then accuracy on the Similar Testset would equal accuracy on the Dissimilar Testset. For each convolution type, the network has 4 convolution layers followed by global max pooling and a softmax classifier.

<img src="images/redgreen.png" align="center" width="500" title="Red and Green">
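A toy generator for such images might look like this (a sketch; the image size, patch sizes and offsets are assumptions for illustration, not the paper's exact dataset code):

```python
import numpy as np

def make_sample(cls, size=32, top=True, rng=None):
    """Toy Red-Green image: class 1 = Red left of Green, class 2 = Green
    left of Red.  `top` controls the absolute vertical position, which is
    what distinguishes the Similar and Dissimilar testsets."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.zeros((size, size, 3), dtype=np.float32)
    row = 4 if top else size - 8
    col = int(rng.integers(2, size - 10))          # random horizontal placement
    left, right = (0, 1) if cls == 1 else (1, 0)   # channel indices: 0=R, 1=G
    img[row:row + 4, col:col + 4, left] = 1.0      # left patch
    img[row:row + 4, col + 6:col + 10, right] = 1.0  # right patch
    return img
```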

We show below the results of the different convolution types on the Red and Green dataset:<br>

| Type   | Pad  | Similar Test | Dissimilar Test |
|--------|------|--------------|-----------------|
| V-Conv | -    | 100.0±0.0    | 0.2±0.1         |
| S-Conv | Zero | 99.8±0.1     | 8.4±0.7         |
| S-Conv | Circ | 73.7±1.0     | 73.7±1.0        |
| F-Conv | Zero | 89.7±0.5     | 89.7±0.5        |

Valid and same-zero exploit location and do poorly on the Dissimilar test set. Same-circ is translation invariant yet invents disturbing new content. Full-zero is translation invariant, doing well on both test sets.
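The four border-handling variants in the table can be reproduced with plain PyTorch ops (a sketch; "full" convolution is the same as zero-padding with kernel_size - 1):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)   # input feature map
w = torch.randn(4, 3, 3, 3)   # four 3x3 kernels

v = F.conv2d(x, w)                      # V-Conv (valid): output shrinks to 6x6
s_zero = F.conv2d(x, w, padding=1)      # S-Conv, zero padding: stays 8x8
s_circ = F.conv2d(                      # S-Conv, circular padding: stays 8x8
    F.pad(x, (1, 1, 1, 1), mode="circular"), w)
f_zero = F.conv2d(x, w, padding=2)      # F-Conv (full), zero padding: grows to 10x10
```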

The experiment folder is here.

3. Sensitivity to image shifts

Using the Imagenet-A dataset, we compare 4 different methods in terms of diagonal-shift accuracy and prediction consistency on 4 ResNet architectures.<br>

| Diagonal Shift | S-Conv | F-Conv | S+BlurPool | F+BlurPool |
|----------------|--------|--------|------------|------------|
| RN18           | 79.43  | 82.74  | 81.96      | 83.95      |
| RN34           | 82.06  | 85.66  | 83.73      | 86.91      |
| RN50           | 86.36  | 87.92  | 87.50      | 88.93      |
| RN101          | 86.95  | 87.78  | 88.22      | 88.73      |

| Consistency | S-Conv | F-Conv | S+BlurPool | F+BlurPool |
|-------------|--------|--------|------------|------------|
| RN18        | 86.43  | 88.38  | 88.32      | 90.03      |
| RN34        | 87.62  | 90.12  | 89.21      | 91.53      |
| RN50        | 90.21  | 91.36  | 91.68      | 92.75      |
| RN101       | 90.76  | 91.71  | 92.36      | 92.86      |
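Consistency here measures how often the predicted class survives a shift of the input. A sketch of such a metric (using a circular `torch.roll` shift as a stand-in for the paper's exact shifting protocol):

```python
import torch
import torch.nn as nn

def consistency(model, images, shift=1):
    """Fraction of images whose predicted class is unchanged when the
    input is diagonally shifted by `shift` pixels (a sketch; torch.roll
    wraps around, which may differ from the paper's cropping protocol)."""
    model.eval()
    with torch.no_grad():
        pred = model(images).argmax(dim=1)
        shifted = torch.roll(images, shifts=(shift, shift), dims=(2, 3))
        pred_shifted = model(shifted).argmax(dim=1)
    return (pred == pred_shifted).float().mean().item()

# A shift-invariant toy "model" (global average pooling) is fully consistent:
toy = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
score = consistency(toy, torch.randn(8, 3, 16, 16))
```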

The experiment folder is here.

4. Data efficiency

<img src="images/data_eff.png" align="center" width="600" title="Data efficiency">

Image classification

The experiment is done on the Imagenet-2012 dataset with the ResNet-50 architecture.

The experiment folder is here.

Patch matching

The experiment is done on the Brown dataset using HardNet.

The experiment folder is here.

5. Small datasets

Action recognition

| UCF101 | S-Conv | F-Conv | F-Conv - Temporal |
|--------|--------|--------|-------------------|
| RN-18  | 38.6   | 40.6   | 42.2              |
| RN-34  | 37.0   | 46.9   | -                 |
| RN-50  | 36.2   | 44.1   | -                 |

| HMDB51 | S-Conv | F-Conv |
|--------|--------|--------|
| RN-18  | 16.1   | 19.3   |
| RN-34  | 15.2   | 18.3   |
| RN-50  | 14.3   | 19.0   |

<img src="images/action_training_curves.png" align="center" width="700" title="Training curves">

The experiment folder is here.