Accurate geometric camera calibration

Overview

This repository contains a tool for accurate geometric camera calibration, i.e., for establishing a mapping between image pixels and the 3D observation directions (for central camera models) or observation lines (for non-central camera models) of these pixels. In particular, it supports calibration with generic camera models, which fit nearly every camera and allow for highly accurate calibration. The tool also supports calibrating fixed camera rigs and estimating accurate depth images for stereo cameras such as the Intel D435 or the Occipital Structure Core.

The requirements on the camera are:

For depth estimation and live feature detection, a CUDA-capable graphics card is required.

The application has been tested on Ubuntu Linux only.

About

This repository contains the camera calibration application and the library it is based on, libvis. The library is a work in progress, and using it for other projects is not recommended at this point.

The application and library code are licensed under the BSD license; please also note the licenses of the included or externally used third-party components.

If you use the provided code for research, please cite the paper describing the approach:

Thomas Schöps, Viktor Larsson, Marc Pollefeys, Torsten Sattler, "Why Having 10,000 Parameters in Your Camera Model is Better Than Twelve", arXiv 2019.

Building

Building has been tested on Ubuntu 14.04 and Ubuntu 18.04 (with gcc).

The following external dependencies are required.

Dependency     Version(s) known to work
Boost          1.54.0
CUDA           10.1
Eigen          3.3.7
GLEW           1.10.0
OpenGV         Commit 306a54e6c6b94e2048f820cdf77ef5281d4b48ad
Qt             5.12.0; minimum version: 5.8
SuiteSparse    4.2.1
zlib           -

The following external dependencies are optional.

Dependency      Purpose
librealsense2   Live input from RealSense D400 series depth cameras (tested with the D435 only).
Structure SDK   Live input from Structure Core cameras (tested with the color version only). To use this, set the SCSDK_ROOT CMake variable to the SDK path.

After obtaining all dependencies, the application can be built with CMake, for example as follows:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CUDA_FLAGS="-arch=sm_61" ..
make -j camera_calibration  # Reduce the number of threads if running out of memory, e.g., -j3

If you intend to use the depth estimation or live feature detection functionality, make sure to specify suitable CUDA architecture(s) in CMAKE_CUDA_FLAGS. Common choices are either only the architecture of your own graphics card (if you only intend to run the compiled application on the system it was compiled on) or a range of virtual architectures (if the compiled application is intended for distribution). See the CUDA (nvcc) documentation on GPU architectures for details.
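As a sketch of these two options, the following hypothetical helpers (not part of this repository) build a suitable value for CMAKE_CUDA_FLAGS. The local variant assumes an NVIDIA driver recent enough that nvidia-smi supports the compute_cap query field; the architecture list in the distributable variant is only an example and must be adapted to the GPUs you want to support and to what your CUDA version supports.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing CMAKE_CUDA_FLAGS (not part of this repository).

# Convert a compute capability such as "6.1" into the nvcc flag "-arch=sm_61".
cuda_arch_flag() {
  local cap="$1"
  echo "-arch=sm_${cap//./}"
}

# Local build: target only the first GPU of this machine.
# Assumes a driver whose nvidia-smi supports the compute_cap query field.
local_cuda_flags() {
  local cap
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
  cuda_arch_flag "${cap// /}"
}

# Distributable build: machine code for two example architectures plus PTX for
# the virtual architecture compute_75, which newer GPUs can JIT-compile.
# Adjust the list to the GPUs you need to support and to your CUDA version.
DIST_CUDA_FLAGS="-gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75"

# Usage from the build directory, for example:
#   cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CUDA_FLAGS="$(local_cuda_flags)" ..
#   cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CUDA_FLAGS="${DIST_CUDA_FLAGS}" ..
```

The distributable variant generally results in a longer build and a larger binary, in exchange for running on more GPUs.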

How to use

Obtaining a calibration pattern

This is a prerequisite for calibration.

The first step is to choose a suitable pattern. Ideally, the feature density of the pattern matches the resolution of the camera to be calibrated. For example, a high-resolution camera can observe many features at the same time, so a high feature density helps to quickly obtain enough calibration data. However, such a pattern may not be well-suited for a low-resolution camera, which cannot sharply observe all of its features at the same time. Also keep in mind that large numbers of features (due to either a high density or the use of multiple patterns at the same time) significantly increase the time required for calibration.

Some ready-to-use patterns with different feature densities, generated for DIN A4 paper, are included in the patterns folder. Each pattern consists of a PDF file for display and a YAML file that describes the pattern content. The YAML file must later be passed to the camera calibration program so that it can detect the corresponding pattern.

If the provided patterns are not sufficient, you can generate additional patterns with the script scripts/create_calibration_pattern.py. The script uses ReportLab to generate the PDF file, which can be installed with: sudo pip[3] install reportlab. It also depends on numpy. To see its usage, call the script as follows: python[3] create_calibration_pattern.py -h. Only the --tag36h11_path and --output_base_path arguments are mandatory.

After deciding on one or more patterns, the second step is to choose how to present the pattern(s) to the camera:

Calibrating a camera with live input

Live input has the advantage that the coverage of the camera view with feature detections is shown in real-time during recording, showing where additional data is still needed. However, this is only possible for cameras for which live support has been implemented. Currently, there is support for Intel RealSense cameras via librealsense2, for Occipital Structure Core cameras via the Structure SDK, and for many other kinds of cameras with video4linux2.

To use this mode of operation, start the application without arguments:

/path/to/camera_calibration/build/applications/camera_calibration/camera_calibration

This will show a window that might look like this with a webcam and an Intel RealSense D435 camera attached:

[Screenshot: Settings window]

At the top, all attached and detected cameras are listed. Each entry is prefixed with the library that detected it. A single camera may be detected by multiple libraries; for example, here the three cameras on the D435 device were detected both by librealsense and by video4linux2 (but in this case, they only work with librealsense).

In this list, check the boxes of all cameras that should be used at the same time. Note that at present, multiple cameras can only be selected together if they are all "librealsense" cameras or all "Structure SDK" cameras; other cameras, or cameras accessed through different libraries, cannot be combined.

The "Live feature detection" box should remain checked to show the image coverage with feature detections live. It should be unchecked if no CUDA-capable graphics card is available, or if recording data for other purposes.

In the text field above this box, enter the paths to the pattern YAML files that will be used. If you will later use the mode that shows the pattern on screen, this pattern must also be selected here.

The feature window extent should be set to suit the specific camera(s) used. It is recommended to briefly try a few different values and choose the one that gives the most reliable feature detections. Common values are, for example, 10, 15, and 20.

Saving the recorded images is helpful if you cannot run real-time feature detection, or if you might want to process the images again later with other settings. If you do not want to save the images, uncheck the corresponding checkbox.

To save the recorded images and a dataset file containing the features extracted in real time, specify the output directory at the bottom.

From here on, there are two ways to start live operation:

To end recording, simply close the recording window (use Escape or Alt+F4 in case of the fullscreen pattern display).

Recording with live feature detection yields a file dataset.bin that can be processed further to calibrate the camera, as described in the second step of the section below. If you only recorded images, start from the beginning of the section below.

Calibrating a camera from images in a folder

This mode of operation may be used for cameras for which live input is not possible, or after recording images live as described above.

Feature extraction

To extract features and create a dataset file, first call the camera calibration program, for example as follows. This assumes that the images have been placed in the folder ${DATASET}/images.

export CALIBRATION_PATH=/path/to/camera_calibration_root_folder
export DATASET=/path/to/dataset_folder
export HALF_WINDOW_SIZE=15  # Adjust to what gives the most detections for your camera, e.g., 10, 15, or 20
${CALIBRATION_PATH}/build/applications/camera_calibration/camera_calibration \
    --pattern_files ${CALIBRATION_PATH}/applications/camera_calibration/patterns/pattern_resolution_17x24_segments_16_apriltag_0.yaml \
    --image_directories ${DATASET}/images \
    --dataset_output_path ${DATASET}/features_${HALF_WINDOW_SIZE}px.bin \
    --refinement_window_half_extent ${HALF_WINDOW_SIZE} \
    --show_visualizations  # optional for showing visualizations
#   --no_cuda_feature_detection  # use this to disable using CUDA for feature detection

--pattern_files must be a comma-separated list of paths to YAML files describing the calibration pattern(s) used. --image_directories specifies the path to the directory containing the images. If calibrating a camera rig, multiple comma-separated folders must be given. Images in different folders that have the same file name are assumed to be recorded at the same time. --dataset_output_path gives the path to a file that will be created to store the extracted features. If you use --show_visualizations, the visualization window will remain open once the process has finished and needs to be closed manually.
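For a camera rig, building the comma-separated folder list and comparing several window sizes can be scripted. The following is a minimal sketch with hypothetical helper names. It assumes that the per-camera folders are named images0, images1, ... (as in the stereo recordings described further below) and reuses CALIBRATION_PATH, DATASET, and the example pattern file from the command above. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): feature extraction for a camera rig whose
# per-camera folders are named images0, images1, ... inside ${DATASET}.
# Uses CALIBRATION_PATH and DATASET as defined above; DRY_RUN=1 prints the commands.

# Join ${1}/images0, ${1}/images1, ... (stopping at the first missing index)
# into the comma-separated list expected by --image_directories.
join_image_directories() {
  local dataset="$1"
  local -a dirs=()
  local i=0
  while [ -d "${dataset}/images${i}" ]; do
    dirs+=("${dataset}/images${i}")
    i=$((i + 1))
  done
  local IFS=,
  echo "${dirs[*]}"
}

# Run feature extraction once per window half extent given as arguments,
# writing one features_<N>px.bin file per value.
extract_features_for_window_sizes() {
  local bin="${CALIBRATION_PATH}/build/applications/camera_calibration/camera_calibration"
  local pattern="${CALIBRATION_PATH}/applications/camera_calibration/patterns/pattern_resolution_17x24_segments_16_apriltag_0.yaml"
  local image_dirs size
  image_dirs=$(join_image_directories "${DATASET}")
  if [ -z "$image_dirs" ]; then
    echo "no images0, images1, ... folders found in ${DATASET}" >&2
    return 1
  fi
  for size in "$@"; do
    ${DRY_RUN:+echo} "$bin" \
      --pattern_files "$pattern" \
      --image_directories "$image_dirs" \
      --dataset_output_path "${DATASET}/features_${size}px.bin" \
      --refinement_window_half_extent "$size" || return 1
  done
}

# Example: extract_features_for_window_sizes 10 15 20
```

Because all folders are passed in the same call, images with the same file name in different folders are treated as recorded at the same time, as required for rigs.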

Camera calibration

As a second step, the camera calibration program can be called to perform the actual calibration based on the extracted features, for example as follows (using the definitions from above):

export CELL_SIZE=50  # Choose a suitable value for the camera's resolution
${CALIBRATION_PATH}/build/applications/camera_calibration/camera_calibration \
    --dataset_files ${DATASET}/features_${HALF_WINDOW_SIZE}px.bin \
    --output_directory ${DATASET}/result_${HALF_WINDOW_SIZE}px_noncentral_generic_${CELL_SIZE} \
    --cell_length_in_pixels ${CELL_SIZE} \
    --model noncentral_generic \
    --num_pyramid_levels 4 \
    --show_visualizations  # optional for showing visualizations

--dataset_files must point to the dataset file with the extracted features. The computed calibration files will be saved in the folder given with --output_directory. --cell_length_in_pixels specifies the desired cell length for generic camera models; see below. The camera model to use must be given with --model. For generic camera models, using a multi-resolution pyramid during calibration can help convergence; the number of pyramid levels is set with --num_pyramid_levels. Note, however, that the re-sampling between pyramid levels is only implemented approximately for the noncentral_generic model. If you use --show_visualizations, the visualization window will remain open once the process has finished and needs to be closed manually.

The available camera models are as follows. See the corresponding section below for recommendations on which model to choose.

central_generic
central_thin_prism_fisheye
central_opencv
central_radial
noncentral_generic

For generic camera models, a grid resolution (i.e., a cell size) must be chosen. The calibrated 3D observation directions or lines are stored at the corners of the resulting grid and are interpolated over the grid cells. Note that the given cell size is not used directly; instead, the program chooses the closest cell size that yields an integer number of cells over the calibrated image area.
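The following worked sketch illustrates this rounding for a single image dimension. It assumes that, among all cell lengths that divide the calibrated extent into an integer number of cells, the one closest to the requested value is picked, and that each dimension is handled separately; the actual implementation may differ in details (for example, in how border cells are handled). effective_cell_length is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a calibrated extent in pixels and a requested cell
# length, print the integer cell count and the resulting effective cell length.
effective_cell_length() {
  awk -v extent="$1" -v desired="$2" '
    function abs(x) { return x < 0 ? -x : x }
    BEGIN {
      lo = int(extent / desired)   # floor of the ideal (fractional) cell count
      if (lo < 1) lo = 1
      hi = lo + 1                  # one more, i.e., slightly shorter cells
      # Keep whichever integer cell count gives the cell length closest to the request.
      cells = (abs(extent / hi - desired) < abs(extent / lo - desired)) ? hi : lo
      printf "%d %.2f\n", cells, extent / cells
    }'
}

effective_cell_length 2000 40   # prints: 50 40.00
effective_cell_length 640 50    # prints: 13 49.23
effective_cell_length 480 50    # prints: 10 48.00
```

Under these assumptions, requesting a cell length of 50 for a 640x480 calibrated area would give cells of roughly 49.23 x 48 pixels.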

The grid resolution should be chosen to be appropriate for the camera's resolution. For example, for a camera of resolution 2000x1000 pixels, a cell length of 40 might be appropriate, while for a camera of resolution 640x480 pixels, a cell length of 10 might be appropriate. The points to consider are:

The output includes several "report" files that allow judging the quality of the resulting calibration. See the section "How to obtain and verify good calibration results" below.

Refining existing calibrations

It is also possible to take an existing calibration and refine it, optionally after re-sampling it to a different camera model. To do this, run the calibration program as described above, and additionally pass the directory containing the existing calibration with the --state_directory parameter. Note that re-sampling is only implemented between different central models, from a central model to the non-central model, and (approximately) from the non-central model to a different grid resolution, but not from the non-central model to a central model. For example, for near-central cameras, this makes it possible to first calibrate the camera with a central model and then use the non-central model as the last refinement step.
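For a near-central camera, this two-step procedure could look like the following minimal sketch. It only uses parameters documented above and reuses CALIBRATION_PATH, DATASET, HALF_WINDOW_SIZE, and CELL_SIZE from the earlier commands; the function name and output folder names are hypothetical. Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): calibrate with central_generic first, then
# re-sample that result to noncentral_generic and refine it via --state_directory.
# Uses CALIBRATION_PATH, DATASET, HALF_WINDOW_SIZE, and CELL_SIZE as defined above.
# Set DRY_RUN=1 to print the commands instead of running them.
calibrate_central_then_noncentral() {
  local bin="${CALIBRATION_PATH}/build/applications/camera_calibration/camera_calibration"
  local features="${DATASET}/features_${HALF_WINDOW_SIZE}px.bin"
  local central_out="${DATASET}/result_${HALF_WINDOW_SIZE}px_central_generic_${CELL_SIZE}"
  local noncentral_out="${DATASET}/result_${HALF_WINDOW_SIZE}px_noncentral_generic_${CELL_SIZE}"

  # Step 1: calibrate from scratch with the central generic model.
  ${DRY_RUN:+echo} "$bin" \
    --dataset_files "$features" \
    --output_directory "$central_out" \
    --cell_length_in_pixels "$CELL_SIZE" \
    --model central_generic \
    --num_pyramid_levels 4 || return 1

  # Step 2: start from the step 1 result and refine it with the non-central model.
  ${DRY_RUN:+echo} "$bin" \
    --dataset_files "$features" \
    --state_directory "$central_out" \
    --output_directory "$noncentral_out" \
    --cell_length_in_pixels "$CELL_SIZE" \
    --model noncentral_generic
}
```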

Handling large datasets and many variables

The application computes the Schur complement during bundle adjustment while solving for state updates. By default, it fully stores the off-diagonal part of the Hessian matrix in memory for this computation, which may become huge if there are many images (and thus many pose variables) as well as many intrinsics variables to be optimized. This may be very slow and/or exceed the available memory. To better handle such cases, this behavior can be changed with the --schur_mode parameter, which supports the following options:

You may need to try out which option works best for your case. If you do not run into any issues with memory or performance, you may simply leave this option at its default.
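To see why memory use can explode, consider the linearized system solved in each iteration, split into pose updates Δp and intrinsics updates Δi. This is an illustrative derivation under stated assumptions (the pose block is the one eliminated, each image pose has 6 parameters, and the off-diagonal block is stored densely in double precision), not a description of the exact implementation:

```latex
\begin{bmatrix} H_{pp} & H_{pi} \\ H_{pi}^{\top} & H_{ii} \end{bmatrix}
\begin{bmatrix} \Delta p \\ \Delta i \end{bmatrix}
=
\begin{bmatrix} b_p \\ b_i \end{bmatrix}
\quad\Longrightarrow\quad
\left( H_{ii} - H_{pi}^{\top} H_{pp}^{-1} H_{pi} \right) \Delta i
= b_i - H_{pi}^{\top} H_{pp}^{-1} b_p,
\qquad
\Delta p = H_{pp}^{-1} \left( b_p - H_{pi} \Delta i \right)
```

With N images and M intrinsics parameters, the off-diagonal block H_pi has 6N x M entries. For a hypothetical N = 1000 and M = 10,000, that is 6 x 10^7 entries, or about 480 MB at 8 bytes per entry, and it grows with both the number of images and the number of intrinsics parameters.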

Calibrating a stereo camera and computing depth images

This requires a fixed configuration of two cameras with overlapping fields of view. For example, this is well-suited for calibrating active stereo cameras such as the Intel D435 or the Occipital Structure Core. However, it is also possible to place two arbitrary individual cameras next to each other to form a stereo rig. Note, though, that this configuration must remain completely fixed for the calibration to stay valid, and both cameras should take images at exactly the same time; alternatively, the scene must be static so that differing recording times do not matter.

Also note that at the moment, this supports only a single camera model at a time, depending on which model the CUDA kernel for stereo depth estimation is compiled with. See libvis/src/libvis/cuda/pixel_corner_projector.cuh. By default, it is the central-generic camera model.

Another limitation of the implementation (that should be trivial to fix if required) is that the calibration must have been made with exactly the two cameras that will be used for stereo depth estimation (and no additional ones).

If using an active stereo camera, the active projection should be disabled for calibration. The librealsense integration can do this if using a RealSense camera for live input. For other cameras, the projector needs to be covered to block the light.

Calibration otherwise works as described in the sections above, either with live camera input or based on recorded images.

For depth estimation, stereo images with the active projection turned on should be recorded. Depth maps can then be computed for example as follows:

export CALIBRATION_PATH=/path/to/camera_calibration
export CALIBRATION_RESULT=/path/to/calibration/result/folder
export STEREO_DATASET=/path/to/input/image/dataset
export IMAGE=image_filename_without_png
${CALIBRATION_PATH}/build/applications/camera_calibration/camera_calibration \
    --stereo_depth_estimation \
    --state_directory ${CALIBRATION_RESULT} \
    --images ${STEREO_DATASET}/images0/${IMAGE}.png,${STEREO_DATASET}/images1/${IMAGE}.png \
    --output_directory ${STEREO_DATASET}/stereo_${IMAGE}

This assumes that the stereo images have been recorded with the camera_calibration program, which places the images of the two cameras in the images0 and images1 folders.
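To process a whole recording rather than a single image pair, the command above can be wrapped in a loop. A minimal sketch, assuming the images0/images1 layout described above and the variables CALIBRATION_PATH, CALIBRATION_RESULT, and STEREO_DATASET from the previous command; the function name is hypothetical, frames without a partner image from the second camera are skipped, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run stereo depth estimation for every image pair
# in ${STEREO_DATASET}/images0 and images1. Uses CALIBRATION_PATH,
# CALIBRATION_RESULT, and STEREO_DATASET as defined above.
# Set DRY_RUN=1 to print the commands instead of running them.
estimate_depth_for_all_images() {
  local bin="${CALIBRATION_PATH}/build/applications/camera_calibration/camera_calibration"
  local left image
  for left in "${STEREO_DATASET}"/images0/*.png; do
    [ -e "$left" ] || continue                  # the glob matched nothing
    image=$(basename "$left" .png)
    # Skip frames for which the second camera has no image with the same name.
    [ -e "${STEREO_DATASET}/images1/${image}.png" ] || continue
    ${DRY_RUN:+echo} "$bin" \
      --stereo_depth_estimation \
      --state_directory "${CALIBRATION_RESULT}" \
      --images "${STEREO_DATASET}/images0/${image}.png,${STEREO_DATASET}/images1/${image}.png" \
      --output_directory "${STEREO_DATASET}/stereo_${image}" || return 1
  done
}
```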

Note that the stereo depth estimation implementation has not been optimized at all and may therefore take a very long time to compute.

Which camera model to choose?

For best results, choose one of the following models:

Usually, noncentral_generic is slightly more accurate than central_generic, even for near-central cameras. In general, it should always be at least as accurate as central_generic, unless a lack of data leads to overfitting.

However, one should be aware of the implications: with a non-central camera model, images in general cannot be undistorted to pinhole images (without knowing the scene geometry), and algorithms developed for central cameras might require adaptation. For this reason, a central camera model might be more convenient, even though it is slightly less accurate.

How to obtain and verify good calibration results

Some tips to follow for getting good calibration results are:

After computing a calibration, the report files within the output directory allow judging the calibration quality.

How to use generic camera models in your application

After successful calibration, the calibrated intrinsic camera parameters are stored in the files intrinsicsX.yaml in the output folder.

In the applications/camera_calibration/generic_models folder, there are implementations of the central-generic and non-central-generic camera models that can load these intrinsics YAML files. This should make it easy to use these camera models in other applications. These implementations support projecting 3D points to the image, un-projecting pixels to a 3D direction (central model) or line (non-central model), and computing the Jacobians of these operations with respect to the input point or pixel.

These camera model implementations depend only on the Eigen library. Even this dependency should be easy to remove if desired, since only its matrix and vector classes are used and no advanced functionality that would be hard to substitute. See the main file of this implementation for unit tests, which show by example how to use the camera model classes. The camera models are also documented with Doxygen comments. Note, however, that these implementations have not been optimized; depending on the application, it could make sense to use lookup tables of some kind to speed up the operations.

Note that the calibration program does not calibrate the whole image area, but only the bounding rectangle of all feature detections. Due to the local window used for feature refinement, features are not detected directly next to the image borders. If calibrating the whole image area is crucial, one could, for example, extrapolate the calibration or tolerate some overlap of the feature refinement window with regions outside of the image.

Reference on calibration report visualizations