pytorch-clickhere-cnn

Introduction

This is a PyTorch implementation of Clickhere CNN and Render for CNN.

We currently provide the model, converted weights, dataset classes, and training/evaluation scripts. This implementation also includes the Geometric Structure Aware loss function first introduced in Render For CNN.
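As a point of reference, below is a minimal sketch of a geometric-structure-aware viewpoint loss in this spirit; the bin count, bandwidth, and wrap-around handling are illustrative assumptions and may differ from the implementation in this repository.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GeometricAwareViewpointLoss(nn.Module):
        """Illustrative sketch: cross-entropy against a soft target that spreads
        probability mass over angle bins near the ground truth with exponential
        decay, so small angular errors are penalized less than large ones."""

        def __init__(self, num_bins=360, sigma=3.0):
            super().__init__()
            self.num_bins = num_bins
            self.sigma = sigma

        def forward(self, logits, target_bins):
            # logits: (batch, num_bins) raw scores; target_bins: (batch,) ground-truth bin indices
            bins = torch.arange(self.num_bins, device=logits.device, dtype=torch.float32)
            diff = (bins.unsqueeze(0) - target_bins.unsqueeze(1).float()).abs()
            dist = torch.min(diff, self.num_bins - diff)   # circular (wrap-around) bin distance
            weights = torch.exp(-dist / self.sigma)        # exponentially decaying soft target
            log_probs = F.log_softmax(logits, dim=1)
            return -(weights * log_probs).sum(dim=1).mean()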

If you have any questions, please email me at mbanani@umich.edu.

Getting Started

Add the repository root to your PYTHONPATH:

    export PYTHONPATH=$PYTHONPATH:$(pwd)

Please be aware that this implementation depends on a number of external packages, including PyTorch.

Generating the data

Download the Pascal 3D+ dataset (Release 1.1). Set the path to the Pascal 3D+ directory in util/Paths.py (a sketch of this edit appears at the end of this section). Finally, run the following commands from the repository's root directory.

    cd data/
    python generate_pascal3d_csv.py

Please note that this will generate the CSV files for 3 variants of the dataset: Pascal 3D+ (full), Pascal 3D+ (easy), and Pascal 3D+ Vehicles (with keypoints). These variants are needed to reproduce the different sets of results reported below. Alternatively, you can download the CSV files directly by running data/get_csv.sh.
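For reference, the edit to util/Paths.py might look roughly like the following; the variable name shown here is an assumption, so check the file for the actual name it expects.

    # util/Paths.py -- illustrative sketch; the actual variable name may differ
    pascal3d_root = '/path/to/PASCAL3D+_release1.1'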

Pre-trained Model Weights

We have converted the Render For CNN and Clickhere CNN model weights from the respective Caffe models. The converted models are available for download by running the script model_weights/get_weights.sh. The converted Render For CNN model achieves performance comparable to the original Caffe model; however, a larger error is observed for the converted Clickhere CNN model. Updated results are coming soon.
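For example, assuming a standard bash environment, the weights can be downloaded from the repository root with:

    bash model_weights/get_weights.sh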

Running Inference

After downloading Pascal 3D+ and the pretrained weights, generating the CSV files, and setting the appropriate paths as mentioned above, you can run inference on the Pascal 3D+ dataset by running one of the following commands (depending on the desired model and dataset variant):

    python train.py  --model chcnn --dataset pascalVehKP  
    python train.py  --model r4cnn --dataset pascalEasy
    python train.py  --model r4cnn --dataset pascal

Results

To be updated soon!

The original Render For CNN paper reported results on the 'easy' subset of Pascal 3D+, which excludes truncated and occluded instances. Click-Here CNN, on the other hand, reports results on an augmented version of the dataset in which each image-keypoint pair is treated as a separate instance, so multiple instances may correspond to the same object in an image. Below are the results obtained from each of the runs above.
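The tables below report Accuracy and Median Error. Assuming these follow the standard viewpoint-estimation metrics from Render For CNN (accuracy at a geodesic-distance threshold of pi/6 and the median geodesic angular error in degrees), a minimal sketch of how such numbers could be computed is:

    import numpy as np

    def geodesic_distance_deg(R_pred, R_gt):
        # geodesic distance (in degrees) between two 3x3 rotation matrices
        R = R_pred @ R_gt.T
        cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
        return np.degrees(np.arccos(cos))

    def summarize(errors_deg, threshold_deg=30.0):
        # accuracy at the 30-degree (pi/6) threshold and median angular error
        errors_deg = np.asarray(errors_deg)
        return np.mean(errors_deg < threshold_deg), np.median(errors_deg)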

Render For CNN results

We evaluate the converted model on the Pascal 3D+ (easy) subset, as reported in the original Render For CNN paper, as well as on the full Pascal 3D+ dataset. It is worth noting that the converted model actually exceeds the performance reported in Render For CNN.

Accuracy

| dataset | plane | bike | boat | bottle | bus | car | chair | d.table | mbike | sofa | train | tv | mean |
|---------|-------|------|------|--------|-----|-----|-------|---------|-------|------|-------|-----|------|
| Full | 76.26 | 69.58 | 59.03 | 87.74 | 84.32 | 69.97 | 74.2 | 66.79 | 77.29 | 82.37 | 75.48 | 81.93 | 75.41 |
| Easy | 80.37 | 85.59 | 62.93 | 95.60 | 94.14 | 84.08 | 82.76 | 80.95 | 85.30 | 84.61 | 84.08 | 93.26 | 84.47 |
| Reported | 74 | 83 | 52 | 91 | 91 | 88 | 86 | 73 | 78 | 90 | 86 | 92 | 82 |

Median Error

| dataset | plane | bike | boat | bottle | bus | car | chair | d.table | mbike | sofa | train | tv | mean |
|---------|-------|------|------|--------|-----|-----|-------|---------|-------|------|-------|-----|------|
| Full | 11.52 | 15.33 | 19.33 | 8.51 | 5.54 | 9.39 | 13.83 | 12.87 | 14.90 | 13.03 | 8.96 | 13.72 | 12.24 |
| Easy | 10.32 | 11.66 | 17.74 | 6.66 | 4.52 | 6.65 | 11.21 | 9.75 | 13.11 | 9.76 | 5.52 | 11.93 | 9.90 |
| Reported | 15.4 | 14.8 | 25.6 | 9.3 | 3.6 | 6.0 | 9.7 | 10.8 | 16.7 | 9.5 | 6.1 | 12.6 | 11.7 |

Pascal3D - Vehicles with Keypoints

We evaluated the converted Render For CNN and Click-Here CNN models on the Pascal 3D+ Vehicles-with-Keypoints subset. It should be noted that the results for Click-Here CNN are lower than those achieved by running the author-provided Caffe code. It seems that something is incorrect in the current reimplementation and/or the weight conversion; we are working on fixing this problem.

Accuracy

| model | bus | car | m.bike | mean |
|-------|-----|-----|--------|------|
| Render For CNN | 89.26 | 74.36 | 81.93 | 81.85 |
| Click-Here CNN | 86.91 | 83.25 | 73.83 | 81.33 |
| Click-Here CNN (reported) | 96.8 | 90.2 | 85.2 | 90.7 |

Median Error

| model | bus | car | m.bike | mean |
|-------|-----|-----|--------|------|
| Render For CNN | 5.16 | 8.53 | 13.46 | 9.05 |
| Click-Here CNN | 4.01 | 8.18 | 19.71 | 10.63 |
| Click-Here CNN (reported) | 2.63 | 4.98 | 11.4 | 6.35 |

Pascal3D - Vehicles with Keypoints (Fine-tuned Models)

We fine-tuned both models on the Pascal 3D+ (Vehicles with Keypoints) dataset. Since we suspect that the problem with the replication of the Click-Here CNN model lies in the attention section, we conducted an experiment in which we fine-tuned only those weights. As reported below, fine-tuning just the attention weights achieves the best performance.

Accuracy

| model | bus | car | m.bike | mean |
|-------|-----|-----|--------|------|
| Render For CNN FT | 93.55 | 83.98 | 87.30 | 88.28 |
| Click-Here CNN FT | 92.97 | 89.84 | 81.25 | 88.02 |
| Click-Here CNN FT-Attention | 94.48 | 90.77 | 84.91 | 90.05 |
| Click-Here CNN (reported) | 96.8 | 90.2 | 85.2 | 90.7 |

Median Error

| model | bus | car | m.bike | mean |
|-------|-----|-----|--------|------|
| Render For CNN FT | 3.04 | 5.83 | 11.95 | 6.94 |
| Click-Here CNN FT | 2.93 | 5.14 | 13.42 | 7.16 |
| Click-Here CNN FT-Attention | 2.88 | 5.24 | 12.10 | 6.74 |
| Click-Here CNN (reported) | 2.63 | 4.98 | 11.4 | 6.35 |

Training the model

To train a model, simply run python train.py with the parameter flags defined in train.py.
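For example, using the same flags shown in the inference section (any additional training options are defined in train.py's argument parser and are not listed here):

    python train.py --model chcnn --dataset pascalVehKP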

Citation

This is an implementation of Clickhere CNN and Render For CNN, so please cite the respective papers if you use this code in any published work.

Acknowledgements

We would like to thank Ryan Szeto, Hao Su, and Charles R. Qi for providing their code and for their assistance with questions regarding the reimplementation of their work. We would also like to acknowledge Kenta Iwasaki for his advice on the loss function implementation and Qi Fan for releasing caffe_to_torch_to_pytorch.

This work has been partially supported by DARPA W32P4Q-15-C-0070 (subcontract from SoarTech) and funds from the University of Michigan Mobility Transformation Center.