Awesome
PoseLib
This library provides a collection of minimal solvers for camera pose estimation. The focus is on calibrated absolute pose estimation problems from different types of correspondences (e.g. point-point, point-line, line-point, line-line).
The goals of this project are to provide
- Fast and robust implementation of the current state-of-the-art solvers.
- Consistent calling interface between different solvers.
- Minimize dependencies, both external (currently only Eigen) and internal. Each solver is (mostly) stand-alone, making it easy to extract only a specific solver to integrate into other frameworks.
- Robust estimators (based on LO-RANSAC) that just works out-of-the-box for most cases.
Robust Estimation and Non-linear Refinement
We provide robust estimators for the most common problems
- Absolute pose from points (and lines)
- Essential / Fundamental matrix
- Homography
- Generalized relative pose
It is fairly straight-forward to implement robust estimators for other problems. See for example absolute_pose.h. If you implement estimators for other problems, please consider submitting a pull-request.
In robust.h we provide interfaces which normalizes the data, calls the RANSAC and runs a post-RANSAC non-linear refinement. It is also possible to directly call the individual components as well (see e.g. ransac.h, bundle.h, etc.). The RANSAC is straight-forward implementation of LO-RANSAC which generate hypothesis with minimal solvers and relies on non-linear refinement for refitting.
The robust estimator takes the following options
struct RansacOptions {
size_t max_iterations = 100000;
size_t min_iterations = 1000;
double dyn_num_trials_mult = 3.0;
double success_prob = 0.9999;
double max_reproj_error = 12.0; // used for 2D-3D matches
double max_epipolar_error = 1.0; // used for 2D-2D matches
unsigned long seed = 0;
// If we should use PROSAC sampling. Assumes data is sorted
bool progressive_sampling = false;
size_t max_prosac_iterations = 100000;
// Whether to use real focal length checking for F estimation: https://arxiv.org/abs/2311.16304
// Assumes that principal points of both cameras are at origin.
bool real_focal_check = false;
};
and the non-linear refinement
struct BundleOptions {
size_t max_iterations = 100;
enum LossType {
TRIVIAL, TRUNCATED, HUBER, CAUCHY, TRUNCATED_LE_ZACH
} loss_type = LossType::CAUCHY;
double loss_scale = 1.0;
double gradient_tol = 1e-8;
double step_tol = 1e-8;
double initial_lambda = 1e-3;
double min_lambda = 1e-10;
double max_lambda = 1e10;
bool verbose = false;
};
Note that in robust.h this is only used for the post-RANSAC refinement.
In bundle.h we provide non-linear refinement for different problems. Mainly minimizing reprojection error and Sampson error as these performed best in our internal evaluations. These are used in the LO-RANSAC to perform non-linear refitting. Most estimators directly minimize the MSAC score (using loss_type = TRUNCATED
and loss_scale = threshold
) over all input correspondences. In practice we found that this works quite well and avoids recursive LO where inliers are added in steps.
Camera models
PoseLib use COLMAP-compatible camera models. These are defined in colmap_models.h. Currently we only support
- SIMPLE_PINHOLE
- PINHOLE
- SIMPLE_RADIAL
- RADIAL
- OPENCV
- OPENCV_FISHEYE
but it is relatively straight-forward to add other models. If you do so please consider opening a pull-request. In contrast to COLMAP, we require analytical jacobians for the distortion mappings which make it a bit more work to port them.
The Camera
struct currently contains width
/height
fields, however these are not used anywhere in the code-base and are provided simply to be consistent with COLMAP. The Camera
class also provides the helper function initialize_from_txt(str)
which initializes the camera from a line given by the cameras.txt
file of a COLMAP reconstruction.
The python bindings also expose the poselib.Camera
class with focal(), focal_x(), focal_y(), model_name(), prinicipal_point()
read-only methods and a read-write params
property, but currently this is only used as a return type for some methods. To supply camera information to robust estimators you should use python dicts
as shown below.
Python bindings
The python bindings can be installed by running pip install .
. The python bindings expose all minimal solvers, e.g. poselib.p3p(x,X)
, as well as all robust estimators from robust.h.
Examples of how the robust estimators can be called are
camera = {'model': 'SIMPLE_PINHOLE', 'width': 1200, 'height': 800, 'params': [960, 600, 400]}
pose, info = poselib.estimate_absolute_pose(p2d, p3d, camera, {'max_reproj_error': 16.0}, {})
or
F, info = poselib.estimate_fundamental_matrix(p2d_1, p2d_2, {'max_epipolar_error': 0.75, 'progressive_sampling': True}, {})
The return value info
is a dict containing information about the robust estimation (inliers, iterations, etc). The last two options are dicts which describe the RansacOptions
and BundleOptions
. Ommited values are set to their default (see above), except for the loss_scale
used for the Cauchy loss which is set to half of the threshold used in RANSAC (which seems to be a good heuristic). Dicts with the default options can be obtained as opt = poselib.RansacOptions()
or poselib.BundleOptions()
.
Some of the available estimators are listed below, check pyposelib.cpp and robust.h for more details. The table also shows which error threshold is used in the estimation (RansacOptions.max_reproj_error
or RansacOptions.max_epipolar_error
). All thresholds are given in pixels.
Method | Arguments | (RansacOptions) Threshold |
---|---|---|
<sub>estimate_absolute_pose </sub> | <sub> (p2d, p3d, camera, ransac_opt,bundle_opt) </sub> | <sub>max_reproj_error </sub> |
<sub>estimate_absolute_pose_pnpl </sub> | <sub>(p2d, p3d, l2d_1, l2d_2, l3d_1, l3d_2, camera, ransac_opt, bundle_opt) </sub> | <sub>max_reproj_error (points), max_epipolar_error (lines) |
<sub>estimate_generalized_absolute_pose | <sub>(p2ds, p3ds, camera_ext, cameras, ransac_opt, bundle_opt) </sub> | <sub>max_reproj_error </sub> |
<sub>estimate_relative_pose </sub> | <sub>(x1, x2, camera1, camera2, ransac_opt, bundle_opt) </sub> | <sub>max_epipolar_error </sub> |
<sub>estimate_shared_focal_relative_pose </sub> | <sub>(x1, x2, pp, ransac_opt, bundle_opt) </sub> | <sub>max_epipolar_error </sub> |
<sub>estimate_fundamental </sub> | <sub>(x1, x2, ransac_opt, bundle_opt) </sub> | <sub>max_epipolar_error </sub> |
<sub>estimate_homography </sub> | <sub>(x1, x2, ransac_opt, bundle_opt) </sub> | <sub>max_reproj_error </sub> |
<sub>estimate_generalized_relative_pose </sub> | <sub>(matches, camera1_ext, cameras1, camera2_ext, cameras2, ransac_opt, bundle_opt) </sub> | <sub>max_epipolar_error </sub> |
Storing poses and estimated camera parameters
To handle poses and cameras we provide the following classes:
CameraPose
: This class is the return type for the most of the methods. While the class internally represent the pose withq
andt
, it also exposesR
(3x3) andRt
(3x4) which are read/write, i.e. you can dopose.R = Rnew
and it will update the underlying quaternionq
.Image
: Following COLMAP, this class stores information about the camera (image.camera
) and its pose (image.pose
) used to take an image.ImagePair
: This class holds information about two cameras (image_pair.camera1
,image_pair.camera2
) and their relative pose (image_pair.pose
). This class is used as the return type for theestimate_shared_focal_relative_pose
robust estimator.
All of these are also exposed via python bindings as: poselib.CameraPose, poselib.Image, poselib.ImagePair
.
Benchmarking the robust estimators
To sanity-check the robust estimators we benchmark against the LO-RANSAC implementation from pycolmap.
<a href="https://user-images.githubusercontent.com/48490995/149815304-b3c1049a-ee64-4c14-be60-d4930535a3e7.png"><img src="https://user-images.githubusercontent.com/48490995/149815304-b3c1049a-ee64-4c14-be60-d4930535a3e7.png" width="75%"></a>
For all of the metrics higher is better (except for runtime).
Minimal Solvers
Naming convention
For the solver names we use a slightly non-standard notation where we denote the solver as
<pre> p<b><i>X</i></b>p<b><i>Y</i></b>pl<b><i>Z</i></b>lp<b><i>W</i></b>ll </pre>where the number of correspondences required is given by
- <b><i>X</i></b>p - 2D point to 3D point,
- <b><i>Y</i></b>pl - 2D point to 3D line,
- <b><i>Z</i></b>lp - 2D line to 3D point,
- <b><i>W</i></b>ll - 2D line to 3D line.
The prefix with u
is for upright solvers and g
for generalized camera solvers. Solvers that estimate focal length have the postfix with f
and similarly s
for solvers that estimate scale.
Calling conventions
All solvers return their solutions as a vector of CameraPose
structs, which defined as
struct CameraPose {
Eigen::Vector4d q;
Eigen::Vector3d t;
};
where the rotation is representation as a quaternion q
and the convention is that [R t]
maps from the world coordinate system into the camera coordinate system.
For <b>2D point to 3D point</b> correspondences, the image points are represented as unit-length bearings vectors. The returned camera poses (R,t)
then satisfies (for some lambda
)
lambda * x[i] = R * X[i] + t
where x[i]
is the 2D point and X[i]
is the 3D point.
<b>Note</b> that only the P3P solver filters solutions with negative lambda
.
Solvers that use point-to-point constraints take one vector with bearing vectors x
and one vector with the corresponding 3D points X
, e.g. for the P3P solver the function declaration is
int p3p(const std::vector<Eigen::Vector3d> &x,
const std::vector<Eigen::Vector3d> &X,
std::vector<CameraPose> *output);
Each solver returns the number of real solutions found.
For constraints with <b>2D lines</b>, the lines are represented in homogeneous coordinates. In the case of 2D line to 3D point constraints, the returned camera poses then satisfies
l[i].transpose() * (R * X[i] + t) = 0
where l[i]
is the line and X[i]
is the 3D point.
For constraints with <b>3D lines</b>, the lines are represented by a 3D point X
and a bearing vector V
. In the case of 2D point to 3D point constraints
lambda * x[i] = R * (X[i] + mu * V[i]) + t
for some values of lambda
and mu
. Similarly, for line to line constraints we have
l[i].transpose() * (R * (X[i] + mu * V[i]) + t) = 0
Generalized Cameras
For generalized cameras we represent the image rays similarly to the 3D lines above, with an offset p
and a bearing vector x
. For example, in the case of point-to-point correspondences we have
p[i] + lambda * x[i] = R * X[i] + t
In the case of unknown scale we also estimate alpha
such that
alpha * p[i] + lambda * x[i] = R * X[i] + t
For example, the generalized pose and scale solver (from four points) has the following signature
int gp4ps(const std::vector<Eigen::Vector3d> &p, const std::vector<Eigen::Vector3d> &x,
const std::vector<Eigen::Vector3d> &X, std::vector<CameraPose> *output);
Upright Solvers
For the upright solvers it assumed that the rotation is around the y-axis, i.e.
R = [a 0 -b; 0 1 0; b 0 a]
To use these solvers it necessary to pre-rotate the input such that this is satisfied.
Implemented solvers
The following solvers are currently implemented.
Absolute Pose
Solver | Point-Point | Point-Line | Line-Point | Line-Line | Upright | Generalized | Approx. runtime | Max. solutions | Comment |
---|---|---|---|---|---|---|---|---|---|
p3p | 3 | 0 | 0 | 0 | 250 ns | 4 | Ding et al., (CVPR23) | ||
gp3p | 3 | 0 | 0 | 0 | :heavy_check_mark: | 1.6 us | 8 | Kukelova et al., E3Q3 (CVPR16) | |
gp4ps | 4 | 0 | 0 | 0 | :heavy_check_mark: | 1.8 us | 8 | Unknown scale.<br> Kukelova et al., E3Q3 (CVPR16)<br>Camposeco et al.(ECCV16) | |
p4pf | 4 | 0 | 0 | 0 | 2.3 us | 8 | Unknown focal length.<br> Kukelova et al., E3Q3 (CVPR16) | ||
p2p2pl | 2 | 2 | 0 | 0 | 30 us | 16 | Josephson et al. (CVPR07) | ||
p6lp | 0 | 0 | 6 | 0 | 1.8 us | 8 | Kukelova et al., E3Q3 (CVPR16) | ||
p5lp_radial | 0 | 0 | 5 | 0 | 1 us | 4 | Kukelova et al., (ICCV13) | ||
p2p1ll | 2 | 0 | 0 | 1 | 1.6 us | 8 | Kukelova et al., E3Q3 (CVPR16), Zhou et al. (ACCV18) | ||
p1p2ll | 1 | 0 | 0 | 2 | 1.7 us | 8 | Kukelova et al., E3Q3 (CVPR16), Zhou et al. (ACCV18) | ||
p3ll | 0 | 0 | 0 | 3 | 1.8 us | 8 | Kukelova et al., E3Q3 (CVPR16), Zhou et al. (ACCV18) | ||
up2p | 2 | 0 | 0 | 0 | :heavy_check_mark: | 65 ns | 2 | Kukelova et al. (ACCV10) | |
ugp2p | 2 | 0 | 0 | 0 | :heavy_check_mark: | :heavy_check_mark: | 65 ns | 2 | Adapted from Kukelova et al. (ACCV10) |
ugp3ps | 3 | 0 | 0 | 0 | :heavy_check_mark: | :heavy_check_mark: | 390 ns | 2 | Unknown scale. Adapted from Kukelova et al. (ACCV10) |
up1p2pl | 1 | 2 | 0 | 0 | :heavy_check_mark: | 370 ns | 4 | ||
up4pl | 0 | 4 | 0 | 0 | :heavy_check_mark: | 1.4 us | 8 | Sweeney et al. (3DV14) | |
ugp4pl | 0 | 4 | 0 | 0 | :heavy_check_mark: | :heavy_check_mark: | 1.4 us | 8 | Sweeney et al. (3DV14) |
Relative Pose
Solver | Point-Point | Upright | Planar | Generalized | Approx. runtime | Max. solutions | Comment |
---|---|---|---|---|---|---|---|
relpose_5pt | 5 | 5.5 us | 10 | Nister (PAMI 2004) | |||
relpose_8pt | 8+ | 2.2+ us | 1 | ||||
relpose_upright_3pt | 3 | :heavy_check_mark: | 210 ns | 4 | Ding et al., (CVPR23) | ||
gen_relpose_upright_4pt | 4 | :heavy_check_mark: | :heavy_check_mark: | 1.2 us | 6 | Sweeney et al. (3DV14) | |
relpose_upright_planar_2pt | 2 | :heavy_check_mark: | :heavy_check_mark: | 120 ns | 2 | Choi and Kim (IVC 2018) | |
relpose_upright_planar_3pt | 3 | :heavy_check_mark: | :heavy_check_mark: | 300 ns | 1 | Choi and Kim (IVC 2018) | |
gen_relpose_5p1pt | 5+1 | :heavy_check_mark: | 5.5 us | 10 | E + 1pt to fix scale | ||
relpose_6pt_shared_focal | 6 | 33 us | 15 | Stewénius et al. (IVC 2008) |
Decompositions
Poselib also provides methods for decomposing fundamental matrices to obtain the focal lengths of the cameras.
Method | Arguments | Output | Comment |
---|---|---|---|
<sub>focals_from_fundamental </sub> | <sub>(F, pp1, pp2) </sub> | <sub>(cam1, cam2) </sub> | Bougnoux (ICCV 1998) |
<sub>focals_from_fundamental_iterative </sub> | <sub>(F, cam1_prior, cam2_prior, max_iters = 50, weights = {5e-4, 1.0, 5e-4, 1.0}) </sub> | <sub>(cam1, cam2, iters) </sub> | Kocur et al. (CVPR 2024) |
Both of the methods also have python bindings. To obtain the focal lengths from the camera object you can use focal = cam.focal()
. Note that both of these methods can produce very inaccurate results and fail often such that the output focal lengths can be NaNs or negative numbers. If you need to estimate a focal length shared by both cameras (e.g. the same camera in both views) you should use estimate_shared_focal_relative_pose
.
How to compile?
Getting the code:
> git clone --recursive https://github.com/vlarsson/PoseLib.git
> cd PoseLib
Example of a local installation:
> mkdir _build && cd _build
> cmake -DCMAKE_INSTALL_PREFIX=../_install ..
> cmake --build . --target install -j 8
(equivalent to 'make install -j8' in linux)
Installed files:
> tree ../_install
.
├── bin
│ └── benchmark
├── include
│ └── PoseLib
│ ├── solvers/gp3p.h
│ ├── ...
│ ├── poselib.h <== Library header (includes all the rest)
│ ├── ...
│ └── version.h
└── lib
├── cmake
│ └── PoseLib
│ ├── PoseLibConfig.cmake
│ ├── PoseLibConfigVersion.cmake
│ ├── PoseLibTargets.cmake
│ └── PoseLibTargets-release.cmake
└── libPoseLib.a
Uninstall library:
> make uninstall
Installation
Installing PoseLib using Conan
You can install pre-built binaries for PoseLib or build it from source using Conan. Use the following command:
conan install --requires="poselib/[*]" --build=missing
The PoseLib Conan recipe is kept up to date by Conan maintainers and community contributors. If the version is out of date, please create an issue or pull request on the ConanCenterIndex repository.
Benchmark
Conditional compilation of benchmark
binary is controlled by WITH_BENCHMARK
option. Default if OFF (without benchmark).
Add -DWITH_BENCHMARK=ON
to cmake to activate.
> cmake -DWITH_BENCHMARK=ON ..
Use library (as dependency) in an external project.
cmake_minimum_required(VERSION 3.13)
project(Foo)
find_package(PoseLib REQUIRED)
add_executable(foo foo.cpp)
target_link_libraries(foo PRIVATE PoseLib::PoseLib)
Citing
If you are using the library for (scientific) publications, please cite the following source:
@misc{PoseLib,
title = {{PoseLib - Minimal Solvers for Camera Pose Estimation}},
author = {Viktor Larsson and contributors},
URL = {https://github.com/vlarsson/PoseLib},
year = {2020}
}
Please cite also the original publications of the different methods (see table above).
Changelog
2.0.4 - Aug 5th 2024
- Added implementation of OpenCVFisheye camera model
- Bumped pybind11 version which seems to fix some crashes
- Added cmake option to disable -march=native
2.0.3 - Jul. 2024
- Added decomposition methods for estimation of focal lengths from fundamental matrices
2.0.2 - Apr. 2024
- Added solver and robust estimator for 6p relative pose with unknown shared focal length
- Added Image, ImagePair classes with python bindings
- Exposed Camera via python bindings
2.0.1 - Sep. 2023
- Refactor pybind such that
pip install .
works. Moved pybind11 to submodule. - C++ alignment fixes. Should now work with Eigen 3.3 and the header should be COLMAP compatible.
2.0 - Jan. 2022
- Added robust estimators (LO-RANSAC) and non-linear refinement
- Refactored CameraPose to use quaternion instead 3x3 matrix. Removed alpha.
- Implemented TR-IRLS method from Le and Zach (3DV 2021)
- Restructured pybind11 interface
- Added support for PROSAC sampling
- Many minor fixes and improvements....
1.0 - Jan. 2020
- Initial release
License
PoseLib is licensed under the BSD 3-Clause license. Please see License for details.
Acknowledgements
The RANSAC implementation is heavily inspired by RansacLib from Torsten Sattler.