Deep Descriptors
This repository contains the code release for our 2015 ICCV paper. If you do use it, please cite:
Discriminative Learning of Deep Convolutional Feature Point Descriptors
Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, and Francesc Moreno-Noguer
International Conference on Computer Vision (ICCV), 2015
The code is based on the Torch7 framework.
Overview
We learn compact discriminative feature point descriptors using a convolutional neural network. We directly optimize for L2 distance by training with a Siamese architecture, so that pairs of corresponding and non-corresponding patches map to small and large distances, respectively. We deal with the large number of potential pairs by combining stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. The resulting 128-dimensional descriptor can be used as a drop-in replacement for any task involving SIFT. We show that this descriptor generalizes well to various datasets.
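As a rough illustration of this training objective, the following is a minimal Lua sketch of a hinge-style pairwise loss over the L2 descriptor distance; the margin value C and the function name are illustrative assumptions, not taken from the released code:
require 'torch'
-- Sketch of a Siamese pairwise loss over L2 descriptor distance.
-- y = 1 for corresponding patches, y = 0 for non-corresponding ones.
local C = 4.0  -- hypothetical margin for non-corresponding pairs
local function pairLoss( d1, d2, y )
   local dist = torch.norm( d1 - d2 )   -- L2 distance between the two descriptors
   if y == 1 then
      return dist                        -- corresponding pairs: pull together
   else
      return math.max( 0, C - dist )     -- non-corresponding pairs: push apart up to the margin
   end
end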
See the website for more detailed information.
License
Copyright (C) <2016> <Edgar Simo-Serra, Eduard Trulls>
This work is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy
of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or
send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Edgar Simo-Serra, Waseda University, February 2016.
esimo@aoni.waseda.jp, http://hi.cs.waseda.ac.jp/~esimo/
Eduard Trulls, EPFL, February 2016.
eduard.trulls@epfl.ch, http://cvlabwww.epfl.ch/~trulls/
Models
Four different models are made available. The best iteration is chosen with a validation subset. The model and training procedure are the same for all models; only the training data varies. If you are not sure which model to use, use models/CNN3_p8_n8_split4_073000.t7.
- models/CNN3_p8_n8_split1_072000.t7: Trained on Liberty and Yosemite.
- models/CNN3_p8_n8_split2_104000.t7: Trained on Liberty and Notre Dame.
- models/CNN3_p8_n8_split3_067000.t7: Trained on Yosemite and Notre Dame.
- models/CNN3_p8_n8_split4_073000.t7: Trained on a subset of Liberty, Yosemite, and Notre Dame.
Usage
Torch
See example.lua for the full example file.
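The snippets below assume the relevant Torch packages have already been loaded (a standard Torch7 requirement, not something specific to this code):
-- load Torch and the neural network package before deserializing the model
require 'torch'
require 'nn'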
Load a model:
model = torch.load( 'models/CNN3_p8_n8_split4_073000.t7' )
Normalize the patches, which should be an Nx1x64x64 4D float tensor with values in the 0-255 range:
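-- model.mean and model.std are stored with the model; use them to normalize each patch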
for i=1,patches:size(1) do
patches[i] = patches[i]:add( -model.mean ):cdiv( model.std )
end
Compute the 128-dimensional float descriptors for all N patches:
descriptors = model.desc:forward( patches )
Note that the output will be an Nx128 2D float tensor where each row is a descriptor.
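For instance (an illustrative use, not part of example.lua), two patches can be compared by the L2 distance between their descriptors, with smaller distances meaning more similar patches:
-- L2 distance between the descriptors of the first two patches
local d = torch.norm( descriptors[1] - descriptors[2] )
print( d )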
Matlab
It is possible to use Matlab by calling Torch. This also requires the mattorch package. Please look at the files in matlab/. In particular, by calling matlab/desc.lua from Matlab, batches of descriptors can be processed, as done in matlab/example.m:
patches = randn( 64, 64, 1, 2 );
save( 'patches.mat', 'patches' );
system( 'th desc.lua' );
desc = load( 'desc.mat' );
desc.x
As the Matlab matrix ordering is the opposite of Torch's, please use 64x64x1xN inputs with values in the 0-255 range. Please note that this creates the temporary files patches.mat and desc.mat each time it is called.
You can also specify which model to use with:
system( 'th desc.lua --model ../models/CNN3_p8_n8_split4_073000.t7' )
As this has a fair amount of overhead, use large batches to get the best performance.
Citing
If you use this code please cite:
@InProceedings{SimoSerraICCV2015,
author = {Edgar Simo-Serra and Eduard Trulls and Luis Ferraz and Iasonas Kokkinos and Pascal Fua and Francesc Moreno-Noguer},
title = {{Discriminative Learning of Deep Convolutional Feature Point Descriptors}},
booktitle = "Proceedings of the International Conference on Computer Vision (ICCV)",
year = 2015,
}
Notes
The released models are trained from scratch and are not the models used in the paper, as there was an incompatibility with newer Torch versions. Results should be comparable in all cases.