Home

Awesome

GitHub release (latest by date) Conan Center PyPI

PoseLib

This library provides a collection of minimal solvers for camera pose estimation. The focus is on calibrated absolute pose estimation problems from different types of correspondences (e.g. point-point, point-line, line-point, line-line).

The goals of this project are to provide

Robust Estimation and Non-linear Refinement

We provide robust estimators for the most common problems

It is fairly straight-forward to implement robust estimators for other problems. See for example absolute_pose.h. If you implement estimators for other problems, please consider submitting a pull-request.

In robust.h we provide interfaces which normalizes the data, calls the RANSAC and runs a post-RANSAC non-linear refinement. It is also possible to directly call the individual components as well (see e.g. ransac.h, bundle.h, etc.). The RANSAC is straight-forward implementation of LO-RANSAC which generate hypothesis with minimal solvers and relies on non-linear refinement for refitting.

The robust estimator takes the following options

struct RansacOptions {
    size_t max_iterations = 100000;
    size_t min_iterations = 1000;
    double dyn_num_trials_mult = 3.0;
    double success_prob = 0.9999;
    double max_reproj_error = 12.0;  // used for 2D-3D matches
    double max_epipolar_error = 1.0; // used for 2D-2D matches
    unsigned long seed = 0;
    // If we should use PROSAC sampling. Assumes data is sorted
    bool progressive_sampling = false;
    size_t max_prosac_iterations = 100000;
    // Whether to use real focal length checking for F estimation: https://arxiv.org/abs/2311.16304
    // Assumes that principal points of both cameras are at origin.
    bool real_focal_check = false;
};

and the non-linear refinement

struct BundleOptions {
    size_t max_iterations = 100;
    enum LossType {
        TRIVIAL, TRUNCATED, HUBER, CAUCHY, TRUNCATED_LE_ZACH
    } loss_type = LossType::CAUCHY;
    double loss_scale = 1.0;
    double gradient_tol = 1e-8;
    double step_tol = 1e-8;
    double initial_lambda = 1e-3;
    double min_lambda = 1e-10;
    double max_lambda = 1e10;
    bool verbose = false;
};

Note that in robust.h this is only used for the post-RANSAC refinement.

In bundle.h we provide non-linear refinement for different problems. Mainly minimizing reprojection error and Sampson error as these performed best in our internal evaluations. These are used in the LO-RANSAC to perform non-linear refitting. Most estimators directly minimize the MSAC score (using loss_type = TRUNCATED and loss_scale = threshold) over all input correspondences. In practice we found that this works quite well and avoids recursive LO where inliers are added in steps.

Camera models

PoseLib use COLMAP-compatible camera models. These are defined in colmap_models.h. Currently we only support

but it is relatively straight-forward to add other models. If you do so please consider opening a pull-request. In contrast to COLMAP, we require analytical jacobians for the distortion mappings which make it a bit more work to port them.

The Camera struct currently contains width/height fields, however these are not used anywhere in the code-base and are provided simply to be consistent with COLMAP. The Camera class also provides the helper function initialize_from_txt(str) which initializes the camera from a line given by the cameras.txt file of a COLMAP reconstruction.

The python bindings also expose the poselib.Camera class with focal(), focal_x(), focal_y(), model_name(), prinicipal_point() read-only methods and a read-write params property, but currently this is only used as a return type for some methods. To supply camera information to robust estimators you should use python dicts as shown below.

Python bindings

The python bindings can be installed by running pip install .. The python bindings expose all minimal solvers, e.g. poselib.p3p(x,X), as well as all robust estimators from robust.h.

Examples of how the robust estimators can be called are

camera = {'model': 'SIMPLE_PINHOLE', 'width': 1200, 'height': 800, 'params': [960, 600, 400]}

pose, info = poselib.estimate_absolute_pose(p2d, p3d, camera, {'max_reproj_error': 16.0}, {})

or

F, info = poselib.estimate_fundamental_matrix(p2d_1, p2d_2, {'max_epipolar_error': 0.75, 'progressive_sampling': True}, {})

The return value info is a dict containing information about the robust estimation (inliers, iterations, etc). The last two options are dicts which describe the RansacOptions and BundleOptions. Ommited values are set to their default (see above), except for the loss_scale used for the Cauchy loss which is set to half of the threshold used in RANSAC (which seems to be a good heuristic). Dicts with the default options can be obtained as opt = poselib.RansacOptions() or poselib.BundleOptions().

Some of the available estimators are listed below, check pyposelib.cpp and robust.h for more details. The table also shows which error threshold is used in the estimation (RansacOptions.max_reproj_error or RansacOptions.max_epipolar_error). All thresholds are given in pixels.

MethodArguments(RansacOptions) Threshold
<sub>estimate_absolute_pose</sub><sub> (p2d, p3d, camera, ransac_opt,bundle_opt)</sub><sub>max_reproj_error </sub>
<sub>estimate_absolute_pose_pnpl</sub><sub>(p2d, p3d, l2d_1, l2d_2, l3d_1, l3d_2, camera, ransac_opt, bundle_opt) </sub><sub>max_reproj_error (points), max_epipolar_error (lines)
<sub>estimate_generalized_absolute_pose<sub>(p2ds, p3ds, camera_ext, cameras, ransac_opt, bundle_opt)</sub><sub>max_reproj_error</sub>
<sub>estimate_relative_pose</sub><sub>(x1, x2, camera1, camera2, ransac_opt, bundle_opt)</sub><sub>max_epipolar_error </sub>
<sub>estimate_shared_focal_relative_pose</sub><sub>(x1, x2, pp, ransac_opt, bundle_opt)</sub><sub>max_epipolar_error </sub>
<sub>estimate_fundamental</sub><sub>(x1, x2, ransac_opt, bundle_opt)</sub><sub>max_epipolar_error</sub>
<sub>estimate_homography</sub><sub>(x1, x2, ransac_opt, bundle_opt)</sub><sub>max_reproj_error</sub>
<sub>estimate_generalized_relative_pose</sub><sub>(matches, camera1_ext, cameras1, camera2_ext, cameras2, ransac_opt, bundle_opt)</sub><sub>max_epipolar_error</sub>

Storing poses and estimated camera parameters

To handle poses and cameras we provide the following classes:

All of these are also exposed via python bindings as: poselib.CameraPose, poselib.Image, poselib.ImagePair.

Benchmarking the robust estimators

To sanity-check the robust estimators we benchmark against the LO-RANSAC implementation from pycolmap.

<a href="https://user-images.githubusercontent.com/48490995/149815304-b3c1049a-ee64-4c14-be60-d4930535a3e7.png"><img src="https://user-images.githubusercontent.com/48490995/149815304-b3c1049a-ee64-4c14-be60-d4930535a3e7.png" width="75%"></a>

For all of the metrics higher is better (except for runtime).

Minimal Solvers

Naming convention

For the solver names we use a slightly non-standard notation where we denote the solver as

<pre> p<b><i>X</i></b>p<b><i>Y</i></b>pl<b><i>Z</i></b>lp<b><i>W</i></b>ll </pre>

where the number of correspondences required is given by

The prefix with u is for upright solvers and g for generalized camera solvers. Solvers that estimate focal length have the postfix with f and similarly s for solvers that estimate scale.

Calling conventions

All solvers return their solutions as a vector of CameraPose structs, which defined as

struct CameraPose {
   Eigen::Vector4d q;
   Eigen::Vector3d t;
};

where the rotation is representation as a quaternion q and the convention is that [R t] maps from the world coordinate system into the camera coordinate system.

For <b>2D point to 3D point</b> correspondences, the image points are represented as unit-length bearings vectors. The returned camera poses (R,t) then satisfies (for some lambda)

  lambda * x[i] = R * X[i] + t

where x[i] is the 2D point and X[i] is the 3D point. <b>Note</b> that only the P3P solver filters solutions with negative lambda.

Solvers that use point-to-point constraints take one vector with bearing vectors x and one vector with the corresponding 3D points X, e.g. for the P3P solver the function declaration is

int p3p(const std::vector<Eigen::Vector3d> &x,
        const std::vector<Eigen::Vector3d> &X,
        std::vector<CameraPose> *output);

Each solver returns the number of real solutions found.

For constraints with <b>2D lines</b>, the lines are represented in homogeneous coordinates. In the case of 2D line to 3D point constraints, the returned camera poses then satisfies

  l[i].transpose() * (R * X[i] + t) = 0

where l[i] is the line and X[i] is the 3D point.

For constraints with <b>3D lines</b>, the lines are represented by a 3D point X and a bearing vector V. In the case of 2D point to 3D point constraints

  lambda * x[i] = R * (X[i] + mu * V[i]) + t

for some values of lambda and mu. Similarly, for line to line constraints we have

  l[i].transpose() * (R * (X[i] + mu * V[i]) + t) = 0

Generalized Cameras

For generalized cameras we represent the image rays similarly to the 3D lines above, with an offset p and a bearing vector x. For example, in the case of point-to-point correspondences we have

p[i] + lambda * x[i] = R * X[i] + t

In the case of unknown scale we also estimate alpha such that

alpha * p[i] + lambda * x[i] = R * X[i] + t

For example, the generalized pose and scale solver (from four points) has the following signature

 int gp4ps(const std::vector<Eigen::Vector3d> &p, const std::vector<Eigen::Vector3d> &x,
              const std::vector<Eigen::Vector3d> &X, std::vector<CameraPose> *output);

Upright Solvers

For the upright solvers it assumed that the rotation is around the y-axis, i.e.

R = [a 0 -b; 0 1 0; b 0 a] 

To use these solvers it necessary to pre-rotate the input such that this is satisfied.

Implemented solvers

The following solvers are currently implemented.

Absolute Pose

SolverPoint-PointPoint-LineLine-PointLine-LineUprightGeneralizedApprox. runtimeMax. solutionsComment
p3p3000250 ns4Ding et al., (CVPR23)
gp3p3000:heavy_check_mark:1.6 us8Kukelova et al., E3Q3 (CVPR16)
gp4ps4000:heavy_check_mark:1.8 us8Unknown scale.<br> Kukelova et al., E3Q3 (CVPR16)<br>Camposeco et al.(ECCV16)
p4pf40002.3 us8Unknown focal length.<br> Kukelova et al., E3Q3 (CVPR16)
p2p2pl220030 us16Josephson et al. (CVPR07)
p6lp00601.8 us8Kukelova et al., E3Q3 (CVPR16)
p5lp_radial00501 us4Kukelova et al., (ICCV13)
p2p1ll20011.6 us8Kukelova et al., E3Q3 (CVPR16), Zhou et al. (ACCV18)
p1p2ll10021.7 us8Kukelova et al., E3Q3 (CVPR16), Zhou et al. (ACCV18)
p3ll00031.8 us8Kukelova et al., E3Q3 (CVPR16), Zhou et al. (ACCV18)
up2p2000:heavy_check_mark:65 ns2Kukelova et al. (ACCV10)
ugp2p2000:heavy_check_mark::heavy_check_mark:65 ns2Adapted from Kukelova et al. (ACCV10)
ugp3ps3000:heavy_check_mark::heavy_check_mark:390 ns2Unknown scale. Adapted from Kukelova et al. (ACCV10)
up1p2pl1200:heavy_check_mark:370 ns4
up4pl0400:heavy_check_mark:1.4 us8Sweeney et al. (3DV14)
ugp4pl0400:heavy_check_mark::heavy_check_mark:1.4 us8Sweeney et al. (3DV14)

Relative Pose

SolverPoint-PointUprightPlanarGeneralizedApprox. runtimeMax. solutionsComment
relpose_5pt55.5 us10Nister (PAMI 2004)
relpose_8pt8+2.2+ us1
relpose_upright_3pt3:heavy_check_mark:210 ns4Ding et al., (CVPR23)
gen_relpose_upright_4pt4:heavy_check_mark::heavy_check_mark:1.2 us6Sweeney et al. (3DV14)
relpose_upright_planar_2pt2:heavy_check_mark::heavy_check_mark:120 ns2Choi and Kim (IVC 2018)
relpose_upright_planar_3pt3:heavy_check_mark::heavy_check_mark:300 ns1Choi and Kim (IVC 2018)
gen_relpose_5p1pt5+1:heavy_check_mark:5.5 us10E + 1pt to fix scale
relpose_6pt_shared_focal633 us15Stewénius et al. (IVC 2008)

Decompositions

Poselib also provides methods for decomposing fundamental matrices to obtain the focal lengths of the cameras.

MethodArgumentsOutputComment
<sub>focals_from_fundamental </sub><sub>(F, pp1, pp2)</sub><sub>(cam1, cam2)</sub>Bougnoux (ICCV 1998)
<sub>focals_from_fundamental_iterative</sub><sub>(F, cam1_prior, cam2_prior, max_iters = 50, weights = {5e-4, 1.0, 5e-4, 1.0})</sub><sub>(cam1, cam2, iters)</sub>Kocur et al. (CVPR 2024)

Both of the methods also have python bindings. To obtain the focal lengths from the camera object you can use focal = cam.focal(). Note that both of these methods can produce very inaccurate results and fail often such that the output focal lengths can be NaNs or negative numbers. If you need to estimate a focal length shared by both cameras (e.g. the same camera in both views) you should use estimate_shared_focal_relative_pose.

How to compile?

Getting the code:

> git clone --recursive https://github.com/vlarsson/PoseLib.git
> cd PoseLib

Example of a local installation:

> mkdir _build && cd _build
> cmake -DCMAKE_INSTALL_PREFIX=../_install ..
> cmake --build . --target install -j 8
  (equivalent to  'make install -j8' in linux)

Installed files:

> tree ../_install
  .
  ├── bin
  │   └── benchmark
  ├── include
  │   └── PoseLib
  │       ├── solvers/gp3p.h
  │       ├──  ...
  │       ├── poselib.h          <==  Library header (includes all the rest)
  │       ├──  ...
  │       └── version.h
  └── lib
      ├── cmake
      │   └── PoseLib
      │       ├── PoseLibConfig.cmake
      │       ├── PoseLibConfigVersion.cmake
      │       ├── PoseLibTargets.cmake
      │       └── PoseLibTargets-release.cmake
      └── libPoseLib.a

Uninstall library:

> make uninstall

Installation

Installing PoseLib using Conan

You can install pre-built binaries for PoseLib or build it from source using Conan. Use the following command:

conan install --requires="poselib/[*]" --build=missing

The PoseLib Conan recipe is kept up to date by Conan maintainers and community contributors. If the version is out of date, please create an issue or pull request on the ConanCenterIndex repository.

Benchmark

Conditional compilation of benchmark binary is controlled by WITH_BENCHMARK option. Default if OFF (without benchmark).

Add -DWITH_BENCHMARK=ON to cmake to activate.

> cmake -DWITH_BENCHMARK=ON ..

Use library (as dependency) in an external project.

cmake_minimum_required(VERSION 3.13)
project(Foo)

find_package(PoseLib REQUIRED)

add_executable(foo foo.cpp)
target_link_libraries(foo PRIVATE PoseLib::PoseLib)

Citing

If you are using the library for (scientific) publications, please cite the following source:

@misc{PoseLib,
  title = {{PoseLib - Minimal Solvers for Camera Pose Estimation}},
  author = {Viktor Larsson and contributors},
  URL = {https://github.com/vlarsson/PoseLib},
  year = {2020}
}

Please cite also the original publications of the different methods (see table above).

Changelog

2.0.4 - Aug 5th 2024

2.0.3 - Jul. 2024

2.0.2 - Apr. 2024

2.0.1 - Sep. 2023

2.0 - Jan. 2022

1.0 - Jan. 2020

License

PoseLib is licensed under the BSD 3-Clause license. Please see License for details.

Acknowledgements

The RANSAC implementation is heavily inspired by RansacLib from Torsten Sattler.