SketchyDatabase

This project is an implementation of *The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies*.

See the homepage of the original project.

Get the dataset via Google Drive.

Dataset

Sketchy Database

Test Set

As I didn't notice that the Sketchy Database already contains a list of the testing photos, I randomly chose the testing photos and their related sketches myself. The test set is listed in TEST_IMG and TEST_SKETCH.

| category | photo | sketch |
| --- | --- | --- |
| airplane | 10 | 75 |
| alarm_clock | 10 | 52 |
| ant | 10 | 53 |
| … | … | … |
| window | 10 | 54 |
| wine_bottle | 10 | 52 |
| zebra | 10 | 66 |
| **Total** | 1250 | 7875 |

The Dataset Structure in My Project

```
Dataset
  ├── photo-train               # the training set of photos
  ├── sketch-triplet-train      # the training set of sketches
  ├── photo-test                # the testing set of photos
  └── sketch-triplet-test      # the testing set of sketches
```
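Given this layout, evaluation needs to know which photo each test sketch was drawn from. A minimal sketch of that pairing step, assuming Sketchy's usual file naming where a sketch `<photo_id>-<k>.png` belongs to the photo `<photo_id>.jpg` (the function name and the flat per-category directories are illustrative, not the repo's API):

```python
import os

def pair_sketches(photo_dir, sketch_dir):
    """Map every sketch file to the photo it was drawn from.

    Assumes Sketchy's naming scheme: a sketch named '<photo_id>-<k>.png'
    corresponds to the photo '<photo_id>.jpg' in the photo directory.
    """
    photo_ids = {os.path.splitext(f)[0] for f in os.listdir(photo_dir)}
    pairs = []
    for f in os.listdir(sketch_dir):
        # strip the trailing '-<k>' sketch index to recover the photo id
        photo_id = os.path.splitext(f)[0].rsplit("-", 1)[0]
        if photo_id in photo_ids:
            pairs.append((os.path.join(sketch_dir, f),
                          os.path.join(photo_dir, photo_id + ".jpg")))
    return pairs
```

In practice this would be called once per category subdirectory of `photo-test` and `sketch-triplet-test`.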

Test

Use `feature_extract.py` to get the extracted feature files (`*.pkl`).

Use `retrieval_test.py` to get the testing results.
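The results are reported as recall@K: the fraction of query sketches whose matching photo appears among the K nearest photo features. A minimal NumPy sketch of such a metric, covering both distance metrics used in the experiments (function and variable names are illustrative, not the repo's API):

```python
import numpy as np

def recall_at_k(sketch_feats, photo_feats, labels, k=5, metric="euclidean"):
    """Fraction of sketches whose true photo ranks among the k nearest photos.

    sketch_feats: (n_sketches, d); photo_feats: (n_photos, d)
    labels: for each sketch, the index of its true photo in photo_feats.
    """
    if metric == "cosine":
        # L2-normalize, then cosine distance = 1 - cosine similarity
        s = sketch_feats / np.linalg.norm(sketch_feats, axis=1, keepdims=True)
        p = photo_feats / np.linalg.norm(photo_feats, axis=1, keepdims=True)
        dist = 1.0 - s @ p.T
    else:
        # pairwise Euclidean distances via broadcasting
        dist = np.linalg.norm(sketch_feats[:, None] - photo_feats[None], axis=2)
    topk = np.argsort(dist, axis=1)[:, :k]            # k nearest photos per sketch
    hits = (topk == np.asarray(labels)[:, None]).any(axis=1)
    return hits.mean()
```

Feeding it the features loaded from the `*.pkl` files would reproduce the kind of recall@1/recall@5 numbers tabulated below.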

Testing Result

GoogLeNet, which achieved the best results in the original paper, had no PyTorch implementation available, so I used vgg16 instead.

| model | epoch | recall@1 | recall@5 |
| --- | --- | --- | --- |
| resnet34 (pretrained; mixed training set; metric='cosine') | 90 | 8.51% | 18.68% |
| | 150 | 9.31% | 20.44% |
| resnet34 (pretrained; mixed training set; metric='euclidean') | 90 | 6.45% | 14.79% |
| | 150 | 6.96% | 16.46% |
| resnet34 (150 epoch; triplet loss m=0.02; metric='euclidean'; lr=1e-5; batch_size=16) | 85 | 9.87% | 22.37% |
| vgg16 (pretrained; triplet loss m=0.3; metric='euclidean'; lr=1e-5; batch_size=16) | 0 | 0.17% | 0.72% |
| | 5 | 17.59% | 45.51% |
| | 190 | 31.03% | 67.86% |
| | 275 | 32.22% | 68.48% |
| | 975 | 35.24% | 71.53% |
| vgg16 (fine-tune(275 epoch); m=0.15; metric='euclidean'; lr=1e-7; batch_size=16) | 55 | 33.22% | 70.04% |
| | 625 | 35.78% | 72.44% |
| | 995 | 36.09% | 73.02% |
| resnet50 (pretrained; triplet loss m=0.15; metric='euclidean'; lr=1e-7; batch_size=16) | 0 | 0.71% | 11.48% |
| | 55 | 10.18% | 29.94% |
| | 940 | 15.17% | 47.61% |
| resnet50 (pretrained; triplet loss m=0.1; metric='euclidean'; lr=1e-6; batch_size=32) | 315 | 19.58% | 57.19% |
| resnet50 (pretrained; triplet loss m=0.3; metric='euclidean'; lr=1e-5; batch_size=48) | 20 | 21.56% | 57.50% |
| | 95 | 30.32% | 71.73% |
| | <span id="resnet"></span>265 | 40.08% | 78.83% |
| | 930 | 46.04% | 83.30% |

I have no idea why resnet34 performed so badly, while vgg16 and resnet50 worked fairly well.
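Each `m` in the table above is the margin of the triplet loss being minimized. A minimal NumPy sketch of that loss, equivalent in spirit to PyTorch's `nn.TripletMarginLoss` (the function name and shapes are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Mean of max(0, d(a,p) - d(a,n) + margin) over a batch.

    anchor: sketch features; positive: features of the matching photo;
    negative: features of a non-matching photo. All shaped (batch, dim).
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # distance to true photo
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # distance to wrong photo
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

A larger margin pushes negatives further away before the loss saturates at zero, which may partly explain the spread between the m=0.02 and m=0.3 runs above.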

Retrieval Result

I randomly chose 20 sketches as query sketches; here are the retrieval results. The model used is resnet50 (pretrained; triplet loss m=0.3; metric='euclidean'; lr=1e-5; batch_size=48) after 265 training epochs.

retrieval_result

Feature Visualization via t-SNE

All the visualized categories are the first ten categories in alphabetical order.

The boxes represent photos, while the points represent sketches.
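A minimal sketch of producing such a plot with scikit-learn (assumed here; variable and function names are illustrative): embed photo and sketch features jointly so both land in the same 2-D t-SNE space, then plot photos with square markers and sketches with point markers.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(photo_feats, sketch_feats, seed=0):
    """Project photo and sketch features into a shared 2-D t-SNE space."""
    feats = np.vstack([photo_feats, sketch_feats])   # joint embedding
    emb = TSNE(n_components=2, random_state=seed,
               perplexity=min(30, len(feats) - 1)).fit_transform(feats)
    return emb[:len(photo_feats)], emb[len(photo_feats):]
```

Plotting then comes down to something like `plt.scatter(p2[:, 0], p2[:, 1], marker='s')` for the photo "boxes" and `marker='.'` for the sketch points, colored by category.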

- resnet34 pretrained on ImageNet
- pretrained; sketch branch & photo branch are trained separately
  - resnet34
- pretrained; mixed training set
  - resnet34 after 90 training epochs
  - resnet34 after 150 training epochs
- pretrained; triplet loss m=0.3, lr=1e-5
  - vgg16 after 0 training epochs
  - vgg16 after 5 training epochs
  - vgg16 after 190 training epochs
- fine tune; triplet loss m=0.15, lr=1e-7
  - vgg16 (fine tune) after 995 training epochs
- pretrained; triplet loss m=0.15, lr=1e-7
  - resnet50 after 0 training epochs
  - resnet50 after 940 training epochs
- pretrained; triplet loss m=0.1, lr=1e-6
  - resnet50 after 315 training epochs
- pretrained; triplet loss m=0.3, lr=1e-5, batch_size=48
  - resnet50 after 265 training epochs