

Camera Pose Auto-Encoders (PAEs)


This repository provides an official implementation for:


Camera Pose Auto-Encoders (PAEs) are multi-layer perceptrons (MLPs), trained via a Teacher-Student approach to encode camera poses, using Absolute Pose Regressors (APRs) as their teachers (Fig. 1). Once trained, PAEs can closely reproduce their teachers' performance across outdoor and indoor environments, and when learning from multi-scene and single-scene APR teachers with different architectures.

<p align="center"> <img src="figs/training_paes.png" width="400" height="200"> </p> <p align = "center"> Fig. 1: Training PAEs </p>

Below we provide instructions for running our code in order to train teacher APRs and student PAEs and for evaluating them. We also provide pre-trained models.

Once a PAE is trained, it can be used to extend pose regression with visual and spatial information at minimal cost.

Iterative Absolute Pose Regression (iAPR) is a new class of APRs, which combines absolute pose regression and relative pose regression, without additional image or pose storage. Specifically, it applies a PAE-based RPR on the initial APR estimate for one or more iterations (Fig. 2). iAPR achieves a new state-of-the-art (SOTA) localization accuracy for APRs on the 7Scenes dataset, even when trained with only 30% of the data.
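
The iteration described above can be sketched without any framework. This is only an illustration under stated assumptions, not the repository's implementation: `iterative_apr`, `compose` and `toy_rpr` are hypothetical names, a pose is assumed to be a position 3-tuple plus a unit quaternion `(w, x, y, z)`, and the relative motion is assumed to be applied by adding the translation and left-multiplying the rotation. In the real model the RPR also receives the query image and the PAE-encoded pose; here that is hidden inside the `rpr` callable.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def normalize(q):
    """Rescale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def compose(pose, rel):
    """Apply a relative motion (dx, dq) to an absolute pose (x, q).
    The composition convention is an assumption made for this sketch."""
    (x, q), (dx, dq) = pose, rel
    new_x = tuple(xi + di for xi, di in zip(x, dx))
    new_q = normalize(quat_mul(dq, q))
    return new_x, new_q

def iterative_apr(initial_pose, rpr, num_iterations=1):
    """Refine an initial APR estimate by repeatedly applying a relative pose regressor."""
    pose = initial_pose
    for _ in range(num_iterations):
        pose = compose(pose, rpr(pose))
    return pose

def toy_rpr(pose, target=(1.0, 0.0, 0.0)):
    """Stand-in for a learned RPR: moves halfway toward a known target, no rotation."""
    x, _ = pose
    dx = tuple(0.5 * (t - xi) for t, xi in zip(target, x))
    return dx, (1.0, 0.0, 0.0, 0.0)

start = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
refined = iterative_apr(start, toy_rpr, num_iterations=3)
print(refined)
```

With the toy regressor, each iteration halves the remaining position error, which mirrors the idea of running the RPR for one or more iterations on top of the initial APR estimate.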

<p align="center"> <img src="figs/iapr.png" width="400" height="100"> </p> <p align = "center"> Fig. 2: Our proposed iAPR method, combining absolute pose regression with PAE-based relative pose regression. </p>

As with PAEs, we provide instructions for training and testing our iAPR model.


In order to run this repository you will need:

  1. Python3 (tested with Python 3.7.7)
  2. PyTorch deep learning framework (tested with version 1.0.0)
  3. The Cambridge Landmarks dataset and the 7Scenes dataset (download both)
  4. Optionally, the pre-trained models, to reproduce the reported results (see below)
  5. For a quick setup, you can run: pip install -r requirments.txt

Note: All experiments reported in our paper were performed with an 8GB NVIDIA GeForce GTX 1080 GPU.


Training and Testing Teacher APRs

Our code allows training and testing of single-scene and multi-scene APR teachers. Specifically, we use PoseNet with different CNN backbones as our single-scene APRs and MS-Transformer as our multi-scene APR.

For example, in order to train PoseNet with EfficientNet-B0 on the KingsCollege scene, run:

python main_train_test_apr.py posenet train models/backbones/efficient-net-b0.pth
<path to the CambridgeLandmarks dataset>

In order to train with a different backbone, change the path to the backbone (third argument) and the value of 'backbone_type', under the 'posenet' dictionary, in the json configuration file. MobileNet and ResNet50 are also supported.
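
As a small illustration of that configuration change, the sketch below updates 'backbone_type' under the 'posenet' dictionary. The helper name `set_backbone_type`, the stand-in config contents and the value strings `"efficientnet"` and `"resnet50"` are assumptions made for the example; check the json configuration file shipped with the repository for the values it actually accepts.

```python
import json

def set_backbone_type(config, backbone_type):
    """Return a copy of the config with posenet.backbone_type replaced."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    updated.setdefault("posenet", {})["backbone_type"] = backbone_type
    return updated

# Stand-in for the parsed contents of the json configuration file (structure assumed).
example_config = {"posenet": {"backbone_type": "efficientnet"}}
new_config = set_backbone_type(example_config, "resnet50")
print(json.dumps(new_config, indent=2))
```

To apply this to the real file, one would load it with `json.load`, call the helper, and write the result back with `json.dump`. Remember to also pass the matching backbone weights path as the third command-line argument.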

After training, you can test your trained model by running:

python main_train_test_apr.py posenet test models/backbones/efficient-net-b0.pth <path to the CambridgeLandmarks dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_test.csv
CambridgeLandmarks_config.json --checkpoint_path posenet_effnet_apr_kings_college.pth

In order to train and test MS-Transformer, please follow the instructions in our MS-Transformer repository.

Training and Testing Student PAEs

Single-scene PAEs

To train a single-scene PAE with a PoseNet Teacher (with an EfficientNet-B0 backbone), run the following command, using the same configuration used for the teacher APR:

python main_learn_pose_encoding.py posenet train models/backbones/efficient-net-b0.pth
<path to dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_train.csv
CambridgeLandmarks_config.json posenet_effnet_apr_kings_college.pth

You can then evaluate it and compare it to its teacher, by running:

python main_learn_pose_encoding.py posenet test models/backbones/efficient-net-b0.pth
<path to dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_KingsCollege_test.csv CambridgeLandmarks_config.json posenet_effnet_apr_kings_college.pth
--encoder_checkpoint_path posenet_effnet_apr_kings_college.pth

Multi-scene PAEs

Similarly, you can train a multi-scene PAE with MS-Transformer. For example, training on the 7Scenes dataset:

python main_learn_multiscene_pose_encoding.py
<path to dataset>

and then evaluate it by running:

python main_learn_multiscene_pose_encoding.py
<path to dataset>

Pre-trained Models

| Model (Linked) | Description |
|---|---|
| APR models | |
| PoseNet+MobileNet | Single-scene APR, KingsCollege scene |
| PoseNet+ResNet50 | Single-scene APR, KingsCollege scene |
| PoseNet+EfficientB0 | Single-scene APR, KingsCollege scene |
| MS-Transformer | Multi-scene APR, CambridgeLandmarks dataset |
| MS-Transformer | Multi-scene APR, 7Scenes dataset |
| Camera Pose Auto-Encoders | |
| Auto-Encoder for PoseNet+MobileNet | Auto-Encoder for a single-scene APR, KingsCollege scene |
| Auto-Encoder for PoseNet+ResNet50 | Auto-Encoder for a single-scene APR, KingsCollege scene |
| Auto-Encoder for PoseNet+EfficientB0 | Auto-Encoder for a single-scene APR, KingsCollege scene |
| Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, CambridgeLandmarks dataset |
| Auto-Encoder for MS-Transformer | Auto-Encoder for a multi-scene APR, 7Scenes dataset |


Training and Testing iAPR

We propose a PAE-based RPR model (Fig. 3) to estimate the relative motion between an encoded pose and a query image.

<p align="center"> <img src="figs/pae_rpr.png" width="250" height="400"> </p> <p align = "center"> Fig. 3: Our proposed PAE-based RPR architecture, for implementing iAPR. </p>

In order to train our model, run:

python main_iapr.py train <path to dataset> 7scenes_training_pairs.csv
7scenes_iapr_config.json pretrained_models/ems_transposenet_7scenes_pretrained.pth
models/backbones/efficient-net-b0.pth pretrained_models/mstransformer_7scenes_pose_encoder.pth 

Note: links to data are available below.

In order to test iAPR with our model, for example with the chess scene, run:

python main_iapr.py test <path to dataset> datasets/7Scenes/abs_7scenes_pose.csv_chess_test.csv
7scenes_iapr_config.json pretrained_models/ems_transposenet_7scenes_pretrained.pth
models/backbones/efficient-net-b0.pth pretrained_models/mstransformer_7scenes_pose_encoder.pth 
--checkpoint_path <path to iapr model>             

You can change the number of iterations in the configuration file. Pretrained models are available below.

Our iAPR achieves SOTA performance on the 7Scenes dataset and improves over MS-Transformer even when both are trained on a much smaller subset of the training data.

The following table shows the pose error (in meters/degrees) of MS-Transformer and iAPR, for the 7Scenes dataset, when training with 100%, 70%, 50% and 30% of the train set:

| % of training data | MS-Transformer | iAPR |
|---|---|---|
| 100% | 0.18m / 7.28deg | 0.17m / 6.69deg |
| 70% | 0.19m / 7.41deg | 0.18m / 7.10deg |
| 50% | 0.19m / 7.73deg | 0.18m / 6.89deg |
| 30% | 0.20m / 8.19deg | 0.19m / 7.12deg |
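
For readers unfamiliar with the units in the table above, the sketch below shows one common way to compute a position error in meters and an orientation error in degrees for a single estimate: the Euclidean distance between positions and the angle between two unit quaternions. Whether the repository uses exactly this formula, and how per-frame errors are aggregated into the reported numbers, are assumptions not confirmed here. The function names are hypothetical.

```python
import math

def position_error_m(x_est, x_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_est, x_gt)))

def orientation_error_deg(q_est, q_gt):
    """Angle in degrees between two orientations given as quaternions."""
    def unit(q):
        norm = math.sqrt(sum(c * c for c in q))
        return [c / norm for c in q]
    # abs() treats q and -q as the same rotation; min() guards acos against rounding.
    dot = min(1.0, abs(sum(a * b for a, b in zip(unit(q_est), unit(q_gt)))))
    return math.degrees(2.0 * math.acos(dot))

# Toy check: a 0.5 m offset and a 90-degree rotation about the z-axis.
q_identity = (1.0, 0.0, 0.0, 0.0)
q_rot90_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
pos_err = position_error_m((0.3, 0.4, 0.0), (0.0, 0.0, 0.0))
rot_err = orientation_error_deg(q_rot90_z, q_identity)
print(f"{pos_err:.2f}m / {rot_err:.2f}deg")
```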


| Data (linked) | Description |
|---|---|
| 7Scenes training 100p | 100% of the training images |
| 7Scenes training 70p | 70% of the training images |
| 7Scenes training 50p | 50% of the training images |
| 7Scenes training 30p | 30% of the training images |
| 7Scenes training pairs-100p | Training pairs generated from 100% of the training images |
| 7Scenes training pairs-70p | Training pairs generated from 70% of the training images |
| 7Scenes training pairs-50p | Training pairs generated from 50% of the training images |
| 7Scenes training pairs-30p | Training pairs generated from 30% of the training images |

Pre-trained Models

| Model (linked) | Description |
|---|---|
| iAPR-100p | iAPR Model trained with 100% of 7Scenes dataset |
| PAE-100p | PAE Model trained with 100% of 7Scenes dataset (original MS-PAE model, available above) |
| MS-100p | MS-Transformer Model trained with 100% of 7Scenes dataset (original MS-Transformer model) |
| iAPR-70p | iAPR Model trained with 70% of 7Scenes dataset |
| PAE-70p | PAE Model trained with 70% of 7Scenes dataset |
| MS-70p | MS-Transformer Model trained with 70% of 7Scenes dataset |
| iAPR-50p | iAPR Model trained with 50% of 7Scenes dataset |
| PAE-50p | PAE Model trained with 50% of 7Scenes dataset |
| MS-50p | MS-Transformer Model trained with 50% of 7Scenes dataset |
| iAPR-30p | iAPR Model trained with 30% of 7Scenes dataset |
| PAE-30p | PAE Model trained with 30% of 7Scenes dataset |
| MS-30p | MS-Transformer Model trained with 30% of 7Scenes dataset |

Other Applications of PAEs

Decoding Images from Encoded Train Poses

To train an image decoder for a PAE for the ShopFacade scene:

python main_reconstruct_img.py train <path to cambridge dataset>
reconstruct_config.json pretrained_models/mstransformer_cambridge_pose_encoder.pth

To test our decoder (decoding the training images from their PAE-encoded poses):

python main_reconstruct_img.py
demo <path to cambridge dataset> datasets/CambridgeLandmarks/abs_cambridge_pose_sorted.csv_ShopFacade_train.csv
reconstruct_config.json pretrained_models/mstransformer_cambridge_pose_encoder.pth
--decoder_checkpoint_path pretrained_models/img_decoder_shop_facade.pth

You can download the pre-trained ShopFacade decoder from here