# Camera Pose Auto-Encoders (PAEs)
## Overview
This repository provides an official implementation of:
- Camera Pose Auto-Encoders (PAEs), accepted to ECCV 2022
- Iterative Absolute Pose Regression (iAPR), a new class of APRs that extends our ECCV 2022 work by combining absolute pose regression and relative pose regression, without extra image or pose storage
## Introduction
Camera Pose Auto-Encoders (PAEs) are multi-layer perceptrons (MLPs), trained via a teacher-student approach to encode camera poses, using Absolute Pose Regressors (APRs) as their teachers (Fig. 1). Once trained, PAEs can closely reproduce their teachers' performance across outdoor and indoor environments, and when learning from multi- and single-scene APR teachers with different architectures.
<p align="center"> <img src="figs/training_paes.png" width="400" height="200"> </p> <p align = "center"> Fig. 1: Training PAEs </p>Below we provide instructions for running our code in order to train teacher APRs and student PAEs and for evaluating them. We also provide pre-trained models.
Once a PAE is trained, we can use it as a means of extending pose regression with visual and spatial information at a minimal cost.
Iterative Absolute Pose Regression (iAPR) is a new class of APRs, which combines absolute pose regression and relative pose regression, without additional image or pose storage. Specifically, it applies a PAE-based RPR on the initial APR estimate for one or more iterations (Fig. 2). iAPR achieves a new state-of-the-art (SOTA) localization accuracy for APRs on the 7Scenes dataset, even when trained with only 30% of the data.
<p align="center"> <img src="figs/iapr.png" width="400" height="100"> </p> <p align = "center"> Fig. 2: Our proposed iAPR method, combining absolute pose regression with PAE-based relative pose regression. </p>As for PAEs, we provide instructions for training and testing our iAPR model.
## Prerequisites
In order to run this repository you will need:
- Python3 (tested with Python 3.7.7)
- PyTorch deep learning framework (tested with version 1.0.0)
- Download the Cambridge Landmarks dataset and the 7Scenes dataset
- You can also download pre-trained models to reproduce reported results (see below)
- For a quick setup you can run: `pip install -r requirments.txt`

Note: All experiments reported in our paper were performed with an NVIDIA GeForce GTX 1080 GPU (8GB).
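As a quick sanity check of your setup (our suggestion, not part of the repository), the following snippet prints the relevant versions and confirms that a CUDA-capable GPU is visible:

```python
# Minimal environment check: Python/PyTorch versions and GPU visibility.
import sys
import torch

print("Python:", sys.version.split()[0])           # tested with 3.7.7
print("PyTorch:", torch.__version__)                # tested with 1.0.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))    # e.g. GeForce GTX 1080
```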
## PAEs
### Training and Testing Teacher APRs
Our code allows training and testing of single-scene and multi-scene APR teachers. Specifically, we use PoseNet with different CNN backbones as our single-scene APRs and MS-Transformer as our multi-scene APR.
For example, in order to train PoseNet with EfficientNet-B0 on the KingsCollege scene, run:
```
python main_train_test_apr.py posenet train models/backbones/efficient-net-b0.pth <path to the CambridgeLandmarks dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_train.csv CambridgeLandmarks_config.json
```
In order to train with a different backbone, change the path to the backbone (third argument) and the value of 'backbone_type' under the 'posenet' dictionary in the json configuration file. We support MobileNet and ResNet50.
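If you prefer to script the configuration change, a minimal sketch along these lines should work. The 'posenet' and 'backbone_type' keys follow the description above, while the exact value string for each backbone (e.g. "mobilenet") is an assumption, so check the shipped configuration file for the accepted names.

```python
# Sketch: update the backbone type in the Cambridge Landmarks configuration file.
import json

with open("CambridgeLandmarks_config.json", "r") as f:
    config = json.load(f)

config["posenet"]["backbone_type"] = "mobilenet"   # assumed value name, verify against the config

with open("CambridgeLandmarks_config.json", "w") as f:
    json.dump(config, f, indent=2)
```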
After training, you can test your trained model by running:
```
python main_train_test_apr.py posenet test models/backbones/efficient-net-b0.pth <path to the CambridgeLandmarks dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_test.csv CambridgeLandmarks_config.json --checkpoint_path posenet_effnet_apr_kings_college.pth
```
In order to train and test MS-Transformer, please follow the instructions at our MS-Transformer repository.
### Training and Testing Student PAEs
#### Single-scene PAEs
To train a single-scene PAE with a PoseNet Teacher (with an EfficientNet-B0 backbone), run the following command, using the same configuration used for the teacher APR:
```
python main_learn_pose_encoding.py posenet train models/backbones/efficient-net-b0.pth <path to dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_train.csv CambridgeLandmarks_config.json posenet_effnet_apr_kings_college.pth
```
You can then evaluate it and compare it to its teacher, by running:
```
python main_learn_pose_encoding.py posenet test models/backbones/efficient-net-b0.pth <path to dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_test.csv CambridgeLandmarks_config.json posenet_effnet_apr_kings_college.pth --encoder_checkpoint_path posenet_effnet_apr_kings_college.pth
```
#### Multi-scene PAEs
Similarly, you can train a multi-scene PAE with an MS-Transformer teacher. For example, to train on the 7Scenes dataset, run:
```
python main_learn_multiscene_pose_encoding.py ems-transposenet train models/backbones/efficient-net-b0.pth <path to dataset> datasets/7Scenes/7scenes_all_scenes.csv 7scenes_config.json ems_transposenet_7scenes_pretrained.pth
```
and then evaluate it by running:
```
python main_learn_multiscene_pose_encoding.py ems-transposenet test models/backbones/efficient-net-b0.pth <path to dataset> datasets/7Scenes/abs_7scenes_pose.csv_fire_test.csv 7scenes_config.json ems_transposenet_7scenes_pretrained.pth --encoder_checkpoint_path mstransformer_7scenes_pose_encoder.pth
```
### Pre-trained Models
| Model (linked) | Description |
| --- | --- |
| APR models | |
| PoseNet+MobileNet | Single-scene APR, KingsCollege scene |
| PoseNet+ResNet50 | Single-scene APR, KingsCollege scene |
| PoseNet+EfficientB0 | Single-scene APR, KingsCollege scene |
| MS-Transformer | Multi-scene APR, CambridgeLandmarks dataset |
| MS-Transformer | Multi-scene APR, 7Scenes dataset |
| Camera Pose Auto-Encoders | |
| Auto-Encoder for PoseNet+MobileNet | Auto-Encoder for a single-scene APR, KingsCollege scene |
| Auto-Encoder for PoseNet+ResNet50 | Auto-Encoder for a single-scene APR, KingsCollege scene |
| Auto-Encoder for PoseNet+EfficientB0 | Auto-Encoder for a single-scene APR, KingsCollege scene |
| Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, CambridgeLandmarks dataset |
| Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, 7Scenes dataset |
## iAPR
### Training and Testing
We propose a PAE-based RPR model (Fig. 3) to estimate the relative motion between an encoded pose and a query image.
<p align="center"> <img src="figs/pae_rpr.png" width="250" height="400"> </p> <p align = "center"> Fig. 3: Our proposed PAE-based RPR architecture, for implementing iAPR. </p>In order to train our model, run:
```
python main_iapr.py train <path to dataset> 7scenes_training_pairs.csv 7scenes_iapr_config.json pretrained_models/ems_transposenet_7scenes_pretrained.pth models/backbones/efficient-net-b0.pth pretrained_models/mstransformer_7scenes_pose_encoder.pth
```
Note: links to data are available below.
In order to test iAPR with our model (for example, on the chess scene), run:
```
python main_iapr.py test <path to dataset> datasets/7Scenes/abs_7scenes_pose.csv_chess_test.csv 7scenes_iapr_config.json pretrained_models/ems_transposenet_7scenes_pretrained.pth models/backbones/efficient-net-b0.pth pretrained_models/mstransformer_7scenes_pose_encoder.pth --checkpoint_path <path to iapr model>
```
You can change the number of iterations in the configuration file. Pretrained models are available below.
Our iAPR achieves SOTA performance on the 7Scenes dataset and outperforms MS-Transformer even when trained on a much smaller subset of the training data.
The following table shows the pose error (in meters/degrees) of MS-Transformer and iAPR on the 7Scenes dataset, when training with 100%, 70%, 50% and 30% of the training set:
| % of training data | MS-Transformer | iAPR |
| --- | --- | --- |
| 100% | 0.18m / 7.28deg | 0.17m / 6.69deg |
| 70% | 0.19m / 7.41deg | 0.18m / 7.10deg |
| 50% | 0.19m / 7.73deg | 0.18m / 6.89deg |
| 30% | 0.20m / 8.19deg | 0.19m / 7.12deg |
### Data
| Data (linked) | Description |
| --- | --- |
| 7Scenes training 100p | 100% of the training images |
| 7Scenes training 70p | 70% of the training images |
| 7Scenes training 50p | 50% of the training images |
| 7Scenes training 30p | 30% of the training images |
| 7Scenes training pairs-100p | Training pairs generated from 100% of the training images |
| 7Scenes training pairs-70p | Training pairs generated from 70% of the training images |
| 7Scenes training pairs-50p | Training pairs generated from 50% of the training images |
| 7Scenes training pairs-30p | Training pairs generated from 30% of the training images |
### Pre-trained Models
| Model (linked) | Description |
| --- | --- |
| iAPR-100p | iAPR model trained with 100% of the 7Scenes dataset |
| PAE-100p | PAE model trained with 100% of the 7Scenes dataset (original MS-PAE model, available above) |
| MS-100p | MS-Transformer model trained with 100% of the 7Scenes dataset (original MS-Transformer model) |
| iAPR-70p | iAPR model trained with 70% of the 7Scenes dataset |
| PAE-70p | PAE model trained with 70% of the 7Scenes dataset |
| MS-70p | MS-Transformer model trained with 70% of the 7Scenes dataset |
| iAPR-50p | iAPR model trained with 50% of the 7Scenes dataset |
| PAE-50p | PAE model trained with 50% of the 7Scenes dataset |
| MS-50p | MS-Transformer model trained with 50% of the 7Scenes dataset |
| iAPR-30p | iAPR model trained with 30% of the 7Scenes dataset |
| PAE-30p | PAE model trained with 30% of the 7Scenes dataset |
| MS-30p | MS-Transformer model trained with 30% of the 7Scenes dataset |
## Other Applications of PAEs
### Decoding Images from Encoded Train Poses
To train an image decoder for a PAE on the ShopFacade scene, run:
```
python main_reconstruct_img.py train <path to cambridge dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_ShopFacade_train.csv reconstruct_config.json pretrained_models/mstransformer_cambridge_pose_encoder.pth
```
To test our decoder (decoding the training images from their PAE-encoded poses), run:
```
python main_reconstruct_img.py demo <path to cambridge dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_ShopFacade_train.csv reconstruct_config.json pretrained_models/mstransformer_cambridge_pose_encoder.pth --decoder_checkpoint_path pretrained_models/img_decoder_shop_facade.pth
```
You can download the pre-trained ShopFacade decoder from here.
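For intuition on what the decoder does, here is a minimal, self-contained sketch of the idea (not the repository's decoder): a small network maps a PAE pose encoding back to an image tensor. The latent dimension, layer sizes and output resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseEncodingImageDecoder(nn.Module):
    """Toy decoder mapping a PAE pose encoding to a low-resolution image.
    Layer sizes and the 32x32 output resolution are illustrative assumptions."""
    def __init__(self, latent_dim=256, out_channels=3, out_size=32):
        super().__init__()
        self.out_channels = out_channels
        self.out_size = out_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, out_channels * out_size * out_size), nn.Sigmoid(),
        )

    def forward(self, encoded_pose):
        x = self.net(encoded_pose)
        return x.view(-1, self.out_channels, self.out_size, self.out_size)

# Example: decode a random pose encoding into an image tensor
decoder = PoseEncodingImageDecoder()
fake_encoding = torch.randn(1, 256)   # would come from a trained PAE
image = decoder(fake_encoding)        # shape: (1, 3, 32, 32)
```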