Visiolinguistic-Attention-Learning
TensorFlow code for the VAL model
Chen et al. Image Search with Text Feedback by Visiolinguistic Attention Learning. CVPR2020
Getting Started
Prerequisites:
- Datasets: Fashion200k [1], FashionIQ [2], Shoes [3,4].
- Python 3.6.8
- TensorFlow 1.10.0
Preparation:
(1) Download the ImageNet pretrained models (mobilenet and resnet) and put them under the directory `pretrain_model`.
(2) Follow the steps in `scripts/prepare_data.sh` to prepare the datasets. Note: `fashion200k` and `shoes` can be downloaded manually. The relevant `.py` files for data preparation are described below.
- `download_fashion_iq.py`: crawl the image data from Amazon websites. Note that some URL links might be broken.
- `generate_groundtruth.py`: generate the `.npy` files that characterize the ground-truth annotations at test time.
- `read_glove.py`: prepare the pre-trained `glove` word embeddings used to initialize the text model (i.e. the LSTM).
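As an illustration of the embedding preparation step, the following is a minimal sketch of turning GloVe-format text lines into an embedding matrix aligned with a vocabulary. It is not the code in `read_glove.py`: the function name `load_glove`, the vocabulary, the random initialization range for missing words, and the tiny in-memory sample are all assumptions made for the example.

```python
import io
import numpy as np

def load_glove(lines, vocab, dim, seed=0):
    """Build a (len(vocab), dim) matrix from GloVe-format lines (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # Assumption: words absent from the GloVe file keep a small random vector.
    matrix = rng.uniform(-0.1, 0.1, size=(len(vocab), dim)).astype(np.float32)
    index = {word: i for i, word in enumerate(vocab)}
    found = 0
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words outside the vocabulary and malformed rows.
        if word in index and len(values) == dim:
            matrix[index[word]] = [float(v) for v in values]
            found += 1
    return matrix, found

# Tiny in-memory stand-in for a GloVe text file ("word v1 v2 ... vd" per line).
sample = io.StringIO("red 0.1 0.2 0.3\ndress 0.4 0.5 0.6\n")
vocab = ["<unk>", "red", "dress"]
embeddings, n_found = load_glove(sample, vocab, dim=3)
```

With a real GloVe file, `lines` would be the opened text file, and the resulting matrix could be saved with `np.save` and loaded as the initial value of the embedding layer.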
Running Experiments
Training & Testing:
Train and test the VAL model on different datasets in one script file as follows.
<!-- On `fashion200k`, run -->
bash scripts/run_fashion200k.sh
<!-- On `fashion_iq`, run -->
bash scripts/run_fashion_iq.sh
<!-- On `shoes`, run -->
bash scripts/run_shoes.sh
The final test results are reported in `results/results_fashion_iq.log`.
Our implementation includes the following `.py` files. Note that `fashion200k` is formatted differently from `fashion_iq` and `shoes`: a triplet of source image, text and target image is not given in advance, but is instead sampled randomly during training. Therefore, there are two implementations that build and run the training graph.
- `train_val.py`: build and run the training graph on dataset `fashion_iq` or `shoes`.
- `train_val_fashion200k.py`: build and run the training graph on dataset `fashion200k`.
- `model.py`: define the model and losses.
- `config.py`: define image preprocessing and other configurations.
- `extract_features_val.py`: extract features from the model.
- `test_val.py`: compute distance, perform retrieval, and report results in the `log` file.
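To make the retrieval step concrete, here is a minimal sketch of ranking gallery images for each query and computing recall at K from extracted features. It is not the code in `test_val.py`: the function name `recall_at_k`, the use of cosine similarity as the distance, and the toy features are assumptions made for the example.

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, target_idx, ks=(1, 10, 50)):
    """Fraction of queries whose target is among the top-k gallery items (hypothetical helper)."""
    # Assumption: L2-normalize so that the dot product is cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T
    # Sort gallery indices from most to least similar for every query.
    ranking = np.argsort(-sims, axis=1)
    target_idx = np.asarray(target_idx)
    results = {}
    for k in ks:
        hits = (ranking[:, :k] == target_idx[:, None]).any(axis=1)
        results[k] = float(hits.mean())
    return results

# Toy features: three gallery images and two composed queries.
gallery = np.eye(3, dtype=np.float32)
queries = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.1]], dtype=np.float32)
scores = recall_at_k(queries, gallery, target_idx=[0, 2], ks=(1, 2))
```

In this toy case the first query ranks its target first, while the second query ranks its target second, so recall at 1 is 0.5 and recall at 2 is 1.0.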
Bibtex:
@inproceedings{chen2020image,
title={Image Search with Text Feedback by Visiolinguistic Attention Learning},
author={Chen, Yanbei and Gong, Shaogang and Bazzani, Loris},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3001--3011},
year={2020}
}
License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
References
[1] Automatic Spatially-aware Fashion Concept Discovery, ICCV 2017 <br />
[2] The Fashion IQ Dataset: Retrieving Images by Combining Side Information and Relative Natural Language Feedback, CVPRW 2019 <br />
[3] Dialog-based Interactive Image Retrieval, NeurIPS 2018 <br />
[4] Automatic Attribute Discovery and Characterization from Noisy Web Data, ECCV 2010 <br />