Awesome

Pairwise Confusion for Fine-Grained Visual Classification

This repository implements the PyTorch and Caffe code for the ECCV 2018 Paper "Pairwise Confusion for Fine-Grained Visual Classification". For more details on how to use pairwise confusion, please see the paper here. If you use our code for your research, please use the following BibTeX entry:

@InProceedings{Dubey_2018_ECCV,
author = {Dubey, Abhimanyu and Gupta, Otkrist and Guo, Pei and Raskar, Ramesh and Farrell, Ryan and Naik, Nikhil},
title = {Pairwise Confusion for Fine-Grained Visual Classification},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}

PyTorch

Simply include confusion_pytorch using import confusion_pytorch, and add the requisite loss. Check source at confusion_pytorch/__init__.py for details.

Caffe

Prerequisites

The following dependencies must be satisfied for proper functionality:

Caffe at CAFFE_ROOT. Note: for feature extraction, you must use the version of Caffe available here (this version is exactly updated with the BVLC Caffe, with a couple of edits to HDF5 Output Layers, and is straightforward to compile).
Python dependencies include matplotlib, h5py, lmdb, tqdm. Install these by running:

for req in $(cat requirements.txt); do pip install $req; done

Environment variable CAFFE_ROOT must be set to the path where Caffe resides. This can be done by -

export CAFFE_ROOT=/path/to/your/caffe/root

Please note that these interfaces support only LMDB and ImageData layers at the time. To view help on a specific task, choose the related interface:

Training
Feature Extraction
t-SNE Visualization

Training Networks

The script train.py provides a very easy and straightforward interface to train neural networks using Caffe.

Architecture Requirements in Prototxt Files

Note that, before you train the network, the following layers constraints are satisfied in model and solver prototxts:

Data layer called data.
Label layer called label.
There are no spaces in the model name, it is one word.
Training performance metric (any monotonically increasing metric with performance would be fine, most commonly it's top-1 accuracy) called accuracy_train in the TRAIN phase, and duplicate layer accuracy_test for the TEST phase.
Training optimization metric (any monotonically decreasing metric with performance would be fine, most commonly it's Cross-Entropy) called loss_train in the TRAIN phase, and duplicate layer loss_test for the TEST phase.
Similar metric as Accuracy/Top-5 called top5_train for TRAIN phase, and top5_test for TEST phase.
batch_size in the TRAIN phase should be even (for confusion loss).

Please take a look at the models present in caffe/models/ for some example prototxts.

Basic Usage

For a fixed solver/model combination, the simplest command is:

./train.py --model /path/to/model \
--solver /path/to/solver \
--log_directory /path/to/log/directory \
--snapshot /path/to/snapshot \
--gpu 0

This command will run Caffe training while storing snapshots in the specified directory, with the specified model (overriding the settings in the prototxt file), on GPU 0. It is preferred that absolute paths be used to avoid bugs.

Logging

In the logging directory, you will find the following files:

log_directory /
		model.prototxt
		solver.prototxt
		train_perf.csv
		test_perf.csv
		plot.pdf
		log.txt

The CSV files report the variation of Top-1, Top-5 and Cross-Entropy for both training and test sets. The log.txt is the complete output of [INFO, ERROR] from Caffe. The plot.pdf is a collection of two plots - variation of train/test performance v/s iterations, and loss function v/s iterations. These plots are updated as the training progresses.

The model.prototxt and solver.prototxt files are generated by the code and are the final prototxts that are used during runtime.

Other Usage

The command ./train.py --help will provide additional help. Some examples are:

To Override Supplied Parameters in the Solver

You can replace parameters supplied in the solver during runtime. To do that, pass the new parameters as an string dictionary to the --params argument. For example, if we wished to change the base_lr to 0.001 and momentum to 0.85, we can specify the command as:

./train.py --model /path/to/model \
--solver /path/to/solver \
--log_directory /path/to/log/directory \
--snapshot /path/to/snapshot \
--gpu 0 \
--params '{"base_lr" : 0.001, "momentum": 0.85}'

To Specify the Data Location

As above, the location for training data, validation data and the mean file can also be specified during runtime. Example:

./train.py --model /path/to/model \
--solver /path/to/solver \
--log_directory /path/to/log/directory \
--snapshot /path/to/snapshot \
--gpu 0 \
--data '{"train" : "/path/to/training/data", "test": "/path/to/testing/data", "mean_file": "/path/to/mean_file"}'

Any combination of these can be used, however all supplied paths must exist and be absolute.

To Specify Confusion Loss

Confusion Loss can be added on execution in a manner similar to supplying parameters, by feeding in a string dictionary. For example, to apply confusion loss to layer "l1" with weight 0.001 and "l2" with weight 0.05, the command:

./train.py --model /path/to/model \
--solver /path/to/solver \
--log_directory /path/to/log/directory \
--snapshot /path/to/snapshot \
--gpu 0 \
--confusion '{"l1" : 1, "l2": 0.005}'

Please make sure to adjust the loss weights accordingly. The optimal range of operation for pairwise loss is 0.1N to 0.2N (where N is the number of classes).

Feature Extraction in Caffe

Caffe's Python interface is pretty slow at computing features, and the interface feature_extractor.py uses the fast 'test' phase of the Caffe executable along with HDF5 layers to extract features quickly (multiple layers are produced in the same forward pass), producing CSV files for each layer. Note, however, that this supports only ImageData and LMDB Data layers, with data layer named data and label layer named label.

Basic Usage

The basic input requires the model definition, weights, required layers (as a string list), input file (or directory in case of LMDB) and output prefix. Example:

./feature_extractor.py --model /path/to/model/prototxt \
--weights /path/to/caffemodel \
--output /path/to/output/prefix \
--layers '["layer1", "layer2"]' \
--input /input/imagelist/or/lmdb/file

Here, the input has to be of the same type as the model definition - a path to LMDB database if Data layer with backend LMDB is used, and ImageData list if ImageData layer is used. If you have a folder of images you want to extract features from, you can run the following command inside the target folder to generate the list:

ls | xargs -I @ sh -c 'echo $(pwd)/@ 0' > /path/to/imagedata/list

This command will generate the required imagedata list (with dummy 0 labels) at /path/to/imagedata/list. Make sure the data layer is ImageData for such an approach.

Advanced Options

As with the Training interface, you can select GPUs, custom batch_sizes (to override model definition batch_size) and the option to write headers in the CSV file (off by default).

t-SNE Visualization

The script at tsne.py provides an easy interface to generate t-SNE plots.