Bidirectional Retrieval Made Simple

Code for our CVPR'18 paper Bidirectional Retrieval Made Simple. Since the original code from our work cannot be publicly shared, we adapted the VSE++ codebase to provide a public version.

Overview:

  1. Summary
  2. Results
  3. Getting started
  4. Train new models
  5. Evaluate models
  6. Citation
  7. License

<a name="summary"></a>Summary

Code for training and evaluating our novel CHAIN-VSE models for efficient multimodal retrieval (image annotation and caption retrieval). In summary, CHAIN-VSE applies convolutional layers directly to character-level inputs, fully replacing RNNs and word embeddings. Despite being lighter and conceptually much simpler, these models achieve state-of-the-art results on MS COCO and on some text classification datasets.
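To make the idea concrete, below is a minimal, hypothetical sketch of a character-level convolutional text encoder in PyTorch. The layer sizes, kernel widths, and alphabet size are placeholders, not the exact CHAIN-VSE configuration; see the paper and `train.py` for the real architecture.

```python
import torch
import torch.nn as nn

class CharConvEncoder(nn.Module):
    """Illustrative character-level convolutional encoder (NOT the exact
    CHAIN-VSE architecture). Characters are one-hot encoded and passed
    through stacked 1-D convolutions, then globally max-pooled over the
    sequence into a fixed-size embedding -- no word embeddings, no RNN."""

    def __init__(self, alphabet_size=128, embed_size=2048):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(alphabet_size, 512, kernel_size=7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv1d(512, embed_size, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
        )

    def forward(self, one_hot_chars):      # (batch, alphabet, seq_len)
        feats = self.convs(one_hot_chars)  # (batch, embed_size, seq_len)
        return feats.max(dim=2).values     # global max-pool over time

# toy usage: a batch of 2 sentences, 64 characters each
x = torch.zeros(2, 128, 64)
print(CharConvEncoder()(x).shape)          # torch.Size([2, 2048])
```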

<img src="https://raw.githubusercontent.com/jwehrmann/chain-vse/master/figures/chain.png" alt="chain" width="250px"/> <img src="https://raw.githubusercontent.com/jwehrmann/chain-vse/master/figures/inputnoise.jpeg" alt="noise" width="300px"/><img src="https://raw.githubusercontent.com/jwehrmann/chain-vse/master/figures/params.jpeg" alt="param" width="300px"/>

Highlights

<a name="results"></a> Bidirectional Retrieval Results

Results achieved using this repository on the COCO-1k test set with pre-computed features (note that we do not fine-tune the network in this experiment):

| Method | Features | Image-to-Text R@1 | Image-to-Text R@10 | Text-to-Image R@1 | Text-to-Image R@10 |
|---|---|---|---|---|---|
| RFF-net [baseline, ICCV'17] | ResNet152 | 56.40 | 91.50 | 43.90 | 88.60 |
| chain-v1 (p=1, d=1024) | resnet152_precomp | 57.80 | 95.60 | 44.18 | 90.66 |
| chain-v1 (p=1, d=2048) | resnet152_precomp | 59.90 | 94.80 | 45.08 | 90.54 |
| chain-v1 (p=1, d=8192) | resnet152_precomp | 61.20 | 95.80 | 46.60 | 90.92 |
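For reference, R@K is the fraction of queries for which a correct match appears among the top-K ranked candidates. Below is a minimal sketch of the metric; it is illustrative, not the repository's evaluation code, and it assumes a single ground-truth match per query on the diagonal (in COCO each image actually has five captions):

```python
import numpy as np

def recall_at_k(sim, k):
    """Fraction of queries whose ground-truth item (assumed to be on the
    diagonal of the query-by-candidate similarity matrix) ranks in the top k."""
    ranks = (-sim).argsort(axis=1)             # candidate indices, best first
    gt = np.arange(sim.shape[0])[:, None]      # ground-truth index per query
    return float((ranks[:, :k] == gt).any(axis=1).mean())

sim = np.random.rand(1000, 1000)               # toy query x candidate scores
print(recall_at_k(sim, 1), recall_at_k(sim, 10))
```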

<a name="start"></a> Getting Started

To get started, you will need to set up your environment and download the required data.

<a name="depend"></a> Dependencies

We recommend using Anaconda to manage the required packages. In addition, NLTK's `punkt` tokenizer data must be downloaded:

```python
import nltk
nltk.download('punkt')
```

<a name="data"></a> Download data

Pre-computed features:

```bash
wget http://lsa.pucrs.br/jonatas/seam-data/irv2_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/resnet152_precomp.tar.gz
wget http://lsa.pucrs.br/jonatas/seam-data/vocab.tar.gz
```
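The archives can then be extracted into a directory of your choice; the `$DATA_PATH` variable used below is a placeholder for that location, and the exact layout expected by the data loaders is an assumption here, so adjust as needed:

```bash
# extract everything into ./data and point DATA_PATH at it (layout assumed)
mkdir -p data
tar -xzf irv2_precomp.tar.gz -C data
tar -xzf resnet152_precomp.tar.gz -C data
tar -xzf vocab.tar.gz -C data
export DATA_PATH=$PWD/data
```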

<a name="train"></a> Training new models

To train CHAIN-VSE (p=1, d=2048) using `resnet152_precomp` features, run `train.py`:

```bash
python train.py \
  --data_path "$DATA_PATH" \
  --data_name resnet152_precomp \
  --logger_name runs/chain-v1/resnet152_precomp/ \
  --text_encoder chain-v1 \
  --embed_size 2048 \
  --vocab_path char
```
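Since this codebase is adapted from VSE++, training progress is presumably written under the directory given by `--logger_name`; if TensorBoard logs are produced there (an assumption, check the repository), you can monitor training with:

```bash
# assumes TensorBoard-format logs under the --logger_name directory
tensorboard --logdir runs/chain-v1/resnet152_precomp/ --port 6006
```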

<a name="evaluate"></a> Evaluate pre-trained models

```python
from vocab import Vocabulary
import evaluation
evaluation.evalrank("$RUN_PATH/model_best.pth.tar", data_path="$DATA_PATH", split="test")
```

To evaluate with 5-fold cross-validation on the COCO-1k test set, pass `fold5=True` with a model trained using `--data_name coco`.
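For example, a cross-validated run would look like this (the same call as above, with the extra flag mentioned in the text; `$RUN_PATH` and `$DATA_PATH` remain placeholders):

```python
from vocab import Vocabulary
import evaluation

# 5-fold evaluation on the 1K test splits; requires a model trained with --data_name coco
evaluation.evalrank("$RUN_PATH/model_best.pth.tar", data_path="$DATA_PATH",
                    split="test", fold5=True)
```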

<a name="citation"></a>Citation

If you found this code/paper useful, please cite the following papers:

```bibtex
@InProceedings{wehrmann2018cvpr,
  author    = {Wehrmann, Jônatas and Barros, Rodrigo C.},
  title     = {Bidirectional Retrieval Made Simple},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2018}
}

@article{faghri2017vse++,
  title   = {VSE++: Improving Visual-Semantic Embeddings with Hard Negatives},
  author  = {Faghri, Fartash and Fleet, David J and Kiros, Jamie Ryan and Fidler, Sanja},
  journal = {arXiv preprint arXiv:1707.05612},
  year    = {2017}
}
```

<a name="license"></a> License

Apache License 2.0