Zero-shot GCN

This code is a re-implementation of the zero-shot classification on ImageNet from the paper Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs. It is developed on the TensorFlow framework and builds on the Graph Convolutional Network (GCN) repo.

Our pipeline consists of two modules: a CNN and a Graph Convolutional Network (GCN). The GCN takes the word embedding of each object node as input and outputs a visual classifier for that node. The CNN is an off-the-shelf network (specifically, ImageNet-1k pre-trained); it extracts image features, and its final FC classifiers serve as ground truth for the GCN outputs during training. After training against the visual classifiers of the 1000 seen classes, we can generate classifiers for all unseen classes, which can be applied directly to the extracted image features.
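To make that last step concrete, here is a minimal NumPy sketch of applying GCN-predicted classifiers to CNN features. The array names, shapes, and random data are placeholders for illustration, not the repo's actual variables:

```python
import numpy as np

# Illustrative shapes: 2048-d ResNet-50 features, 10 unseen classes.
# In practice `features` comes from the pre-trained CNN and
# `classifiers` from the GCN output; random data stands in for both here.
num_images, feat_dim, num_classes = 4, 2048, 10
features = np.random.randn(num_images, feat_dim).astype(np.float32)
classifiers = np.random.randn(num_classes, feat_dim).astype(np.float32)

# L2-normalize each classifier so it acts as a direction in feature space.
classifiers /= np.linalg.norm(classifiers, axis=1, keepdims=True)

# Zero-shot scores: one inner product per (image, class) pair.
scores = features @ classifiers.T    # (num_images, num_classes)
predictions = scores.argmax(axis=1)  # top-1 predicted class per image
```

No retraining of the CNN is needed at this point; swapping in new GCN-generated classifiers is enough to score unseen classes.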

Citation

If you use our code in your research or wish to refer to the benchmark results, please use the following BibTeX entry.

```
@inproceedings{wang2018zero,
  title={Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs},
  author={Wang, Xiaolong and Ye, Yufei and Gupta, Abhinav},
  booktitle={CVPR},
  year={2018}
}
```

Using Our Code

```bash
git clone git@github.com:JudyYe/zero-shot-gcn.git
cd zero-shot-gcn/src
```

Unless otherwise specified, we use zero-shot-gcn/src as the root directory throughout.

Dataset Preparation

Please read DATASET.md for downloading images and extracting image features.

Testing Demo

With the extracted features and semantic embeddings in place, we can now perform zero-shot classification with the model we provide.

```bash
wget -O ../data/wordnet_resnet_glove_feat_2048_1024_512_300 https://www.dropbox.com/s/e7jg00nx0h2gbte/wordnet_resnet_glove_feat_2048_1024_512_300?dl=0
python test_imagenet.py --model ../data/wordnet_resnet_glove_feat_2048_1024_512_300
```

The above command defaults to the res50 + 2-hops combination and tests under two settings: unseen classes only, and unseen classes together with seen classes (see the paper for further explanation).
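The top-k numbers reported below are hit@k accuracies: a test image counts as correct if its true label appears among the k highest-scoring classes. The following is a minimal sketch of the metric and of how the two settings differ in label space; the function and variable names are illustrative, not the repo's actual API:

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label ranks among the k highest scores."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([label in row for label, row in zip(labels, topk)]))

# Dummy data: 100 test images, 20 unseen classes, 1000 seen classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 20, size=100)         # ground truth is always unseen
scores_unseen = rng.standard_normal((100, 20))   # setting 1: unseen classes only
scores_all = rng.standard_normal((100, 1020))    # setting 2: seen classes compete too

print(topk_accuracy(scores_unseen, labels, k=5))
print(topk_accuracy(scores_all, labels, k=5))
```

The second setting is strictly harder because the 1000 seen classes also compete for the top-k slots, which is reflected in the lower numbers of the generalized tables below.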

We also provide other configurations. Please refer to the code for details.

Main Results

We report the results with the above testing demo code (using ResNet-50 visual features and GloVe word embeddings). All experiments are conducted with the ImageNet dataset.

We first report results when testing with only unseen classes; all numbers are top-k accuracies in %. We compare our method with the state-of-the-art method SYNC on this benchmark.

| ImageNet Subset | Method | top 1 | top 2 | top 5 | top 10 | top 20 |
|---|---|---|---|---|---|---|
| 2-hops | SYNC | 10.5 | 17.7 | 28.6 | 40.1 | 52.0 |
| 2-hops | GCNZ (Ours) | 21.0 | 33.7 | 52.7 | 64.8 | 74.3 |
| 3-hops | SYNC | 2.9 | 4.9 | 9.2 | 14.2 | 20.9 |
| 3-hops | GCNZ (Ours) | 4.3 | 7.7 | 14.2 | 20.4 | 27.6 |
| All | SYNC | 1.4 | 2.4 | 4.5 | 7.1 | 10.9 |
| All | GCNZ (Ours) | 1.9 | 3.4 | 6.4 | 9.3 | 12.7 |

We then report the results under the generalized zero-shot setting, i.e. testing with both unseen and seen classes. We compare our method with the state-of-the-art method ConSE in this benchmark.

| ImageNet Subset | Method | top 1 | top 2 | top 5 | top 10 | top 20 |
|---|---|---|---|---|---|---|
| 2-hops (+1K) | ConSE | 0.1 | 11.2 | 24.3 | 29.1 | 32.7 |
| 2-hops (+1K) | GCNZ (Ours) | 10.2 | 21.2 | 42.1 | 56.2 | 67.5 |
| 3-hops (+1K) | ConSE | 0.2 | 3.2 | 7.3 | 10.0 | 12.2 |
| 3-hops (+1K) | GCNZ (Ours) | 2.4 | 5.3 | 12.0 | 18.2 | 25.4 |
| All (+1K) | ConSE | 0.1 | 1.5 | 3.5 | 4.9 | 6.2 |
| All (+1K) | GCNZ (Ours) | 1.1 | 2.4 | 5.4 | 8.3 | 11.7 |

We also visualize t-SNE plots of the GCN inputs (word embeddings) and outputs (visual classifiers) for two subtrees of WordNet, rooted at the synsets "instrumentality, instrumentation" and "animal, animate being, beast, brute, creature, fauna".
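To produce similar plots for your own embeddings, here is a minimal sketch with scikit-learn and matplotlib (both assumed as extra dependencies; the two arrays below are random placeholders for the GloVe inputs and the GCN-predicted classifiers of one subtree):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder data for ~200 nodes of one WordNet subtree.
rng = np.random.default_rng(0)
word_embeddings = rng.standard_normal((200, 300))      # GCN inputs
output_classifiers = rng.standard_normal((200, 2048))  # GCN outputs

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, data, title in zip(
        axes,
        (word_embeddings, output_classifiers),
        ("input word embeddings", "output visual classifiers")):
    # Project each set of vectors to 2-D and scatter-plot the result.
    xy = TSNE(n_components=2, init="pca", random_state=0).fit_transform(data)
    ax.scatter(xy[:, 0], xy[:, 1], s=5)
    ax.set_title(f"t-SNE of {title}")
plt.show()
```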

Training

As DATASET.md illustrates, convert_to_gcn_data.py prepares the data for training the GCN. It supports two CNN backbones (fc = res50 or inception) and three semantic embeddings (wv = glove, google, or fasttext). The output is saved to ../data/<wv>_<fc>/ (e.g., ../data/glove_res50/).

```bash
python convert_to_gcn_data.py --fc res50 --wv glove
```

After preparing the data, we can start training by using:

```bash
python gcn/train_gcn.py --gpu $GPU_ID --dataset ../data/glove_res50/ --save_path $SAVE_PATH
```
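For intuition about what train_gcn.py learns, each graph-convolution layer follows the standard propagation rule H' = σ(Â H W), where Â is the normalized adjacency matrix of the WordNet graph. Below is a minimal NumPy sketch of one such layer; the toy graph, dimensions, and activation slope are illustrative, and the repo's actual implementation is in TensorFlow:

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def gcn_layer(A_hat, H, W, alpha=0.2):
    """One layer: propagate node features over the graph, then apply a
    shared linear map and a LeakyReLU nonlinearity."""
    Z = A_hat @ H @ W
    return np.where(Z > 0, Z, alpha * Z)

# Toy 4-node chain graph; 300-d word embeddings in, 2048-d classifiers out.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = normalize_adjacency(A)
H = rng.standard_normal((4, 300))            # input word embeddings
W = rng.standard_normal((300, 2048)) * 0.01  # weights (learned in training)
classifiers = gcn_layer(A_hat, H, W)
print(classifiers.shape)  # (4, 2048): one visual classifier per node
```

During training, only the outputs for the 1000 seen classes are supervised, regressed onto the CNN's pre-trained FC weights (a mean-squared-error loss in the paper); the trained GCN then produces classifiers for every node, including the unseen ones.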