pySLAM v2.3.0
Author: Luigi Freda
pySLAM is a Python implementation of a Visual SLAM pipeline that supports monocular, stereo and RGBD cameras. It provides the following features:
- A wide range of classical and modern local features with a convenient interface for their integration.
- Various loop closing methods, including descriptor aggregators such as visual Bag of Words (BoW, iBow), Vector of Locally Aggregated Descriptors (VLAD) and modern global descriptors (image-wise descriptors).
- A volumetric reconstruction pipeline that processes available depth and color images.
- Integration of depth prediction models within the SLAM pipeline. These include DepthPro, DepthAnythingV2, RAFT-Stereo, CREStereo, etc.
- A collection of other useful tools for VO and SLAM.
Main Scripts
- main_vo.py combines the simplest VO ingredients without performing any image point triangulation or windowed bundle adjustment. At each step $k$, main_vo.py estimates the current camera pose $C_k$ with respect to the previous one $C_{k-1}$. The inter-frame pose estimation returns $[R_{k-1,k},t_{k-1,k}]$ with $\Vert t_{k-1,k} \Vert=1$. With this very basic approach, you need a ground truth in order to recover a correct inter-frame scale $s$ and estimate a valid trajectory by composing $C_k = C_{k-1} [R_{k-1,k}, s t_{k-1,k}]$ (see the short sketch after this list). This script is a first step towards understanding the basics of inter-frame feature tracking and camera pose estimation.
- main_slam.py adds feature tracking along multiple frames, point triangulation, keyframe management, bundle adjustment, loop closing, dense mapping and depth inference in order to estimate the camera trajectory and build both a sparse and a dense map. It is a full SLAM pipeline and includes all the basic and advanced blocks that are necessary to develop a real visual SLAM pipeline.
- main_feature_matching.py shows how to use the basic feature tracker capabilities (feature detector + feature descriptor + feature matcher) and allows you to test the different available local features.
- main_depth_prediction.py shows how to use the available depth inference models to get depth estimates from input color images.
- main_map_viewer.py allows you to reload a saved map and visualize it. Further details here.
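For intuition, here is a minimal sketch of the scale recovery and pose composition described above for main_vo.py. It assumes NumPy 4x4 pose matrices and access to ground-truth positions; variable names are illustrative and do not mirror the script's actual code.

```python
# Conceptual sketch: the inter-frame translation has unit norm, so the ground
# truth is used to recover the scale s before composing the absolute pose.
import numpy as np

def compose_pose(C_prev, R_rel, t_rel_unit, s):
    """C_prev: 4x4 pose of frame k-1; [R_rel, t_rel_unit]: inter-frame motion with ||t||=1."""
    T_rel = np.eye(4)
    T_rel[:3, :3] = R_rel
    T_rel[:3, 3] = s * t_rel_unit          # apply the ground-truth scale
    return C_prev @ T_rel                  # C_k = C_{k-1} [R_{k-1,k}, s t_{k-1,k}]

# s is typically taken as the norm of the ground-truth translation between
# consecutive frames, e.g. s = np.linalg.norm(gt_positions[k] - gt_positions[k-1]).
```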
System overview
Here you can find a couple of diagram sketches that provide an overview of the main system components and of their class relationships and dependencies.
You can use the pySLAM framework as a baseline to experiment with VO techniques, local features, descriptor aggregators, global descriptors, volumetric integration, and depth prediction, and to create your own (proof of concept) VO/SLAM pipeline in Python. When working with it, please keep in mind that this is a research framework written in Python and a work in progress. It is not designed for real-time performance.
Enjoy it!
<p align="center" style="margin:0"> <img src="images/STEREO.png" alt="Visual Odometry" height="160" border="0" /> <img src="images/feature-matching.png" alt="Feature Matching" height="160" border="0" /> <img src="images/RGBD2.png" alt="SLAM" height="160" border="0" /> <img src="images/main-rerun-vo-and-matching.png" alt="Feature matching and Visual Odometry" height="160" border="0" /> <img src="images/loop-detection2.png" alt="Loop detection" height="160" border="0" /> <img src="images/kitti-stereo.png" alt="Stereo SLAM" height="160" border="0" /> <img src="images/dense-reconstruction2.png" alt="Dense Reconstruction" height="160" border="0" /> <img src="images/depth-prediction.png" alt="Depth Prediction" height="160" border="0" /> <img src="images/dense-reconstruction-with-depth-prediction.png" alt="Dense Reconstruction with Depth Prediction" height="160" border="0" /> </p>Install
First, clone this repo and its modules by running
$ git clone --recursive https://github.com/luigifreda/pyslam.git
$ cd pyslam
Then, use the available specific install procedure according to your OS. The provided scripts will create a single python environment that is able to host all the supported components and models!
Main requirements
- Python 3.8.10
- OpenCV >=4.10 (see below)
- PyTorch 2.3.1
- Tensorflow 2.13.1
- Kornia 0.7.3
- Rerun
If you encounter any issues or performance problems, refer to the TROUBLESHOOTING file for assistance.
Ubuntu
Follow the instructions reported here for creating a new virtual environment pyslam with venv. The procedure has been tested on Ubuntu 18.04, 20.04, 22.04 and 24.04.
If you prefer conda, run the scripts described in this other file.
MacOS
Follow the instructions in this file. The reported procedure was tested under Sequoia 15.1.1 and Xcode 16.1.
Docker
If you prefer docker or you have an OS that is not supported yet, you can use rosdocker:
- with its custom pyslam/pyslam_cuda docker files, following the instructions here;
- with one of the suggested docker images (ubuntu*_cuda or ubuntu*), where you can build and run pyslam.
How to install non-free OpenCV modules
The provided install scripts will install a recent OpenCV version (>=4.10) with non-free modules enabled (see the provided scripts install_pip3_packages.sh and install_opencv_python.sh). To quickly verify your installed OpenCV version, run:
$ . pyenv-activate.sh
$ ./scripts/opencv_check.py
or use the following command:
$ python3 -c "import cv2; print(cv2.__version__)"
How to check if you have non-free OpenCV module support (no errors imply success):
$ python3 -c "import cv2; detector = cv2.xfeatures2d.SURF_create()"
Troubleshooting and performance issues
If you run into issues or errors during the installation process or at run-time, please check the docs/TROUBLESHOOTING.md file.
Usage
Once you have run the script install_all_venv.sh (follow the instructions above according to your OS), you can open a new terminal and run:
$ . pyenv-activate.sh   # Activate pyslam python virtual environment. This is only needed once in a new terminal.
$ ./main_vo.py
This will process a default KITTI video (available in the folder data/videos) by using its corresponding camera calibration file (available in the folder settings), and its groundtruth (available in the same data/videos folder). If matplotlib windows are used, you can stop main_vo.py by focusing/clicking on one of them and pressing the key 'Q'.
Note: As explained above, the basic script main_vo.py strictly requires a ground truth.
In order to process a different dataset, you need to configure the file config.yaml:
- Select your dataset type in the section DATASET (further details in the section Datasets below). This identifies a corresponding dataset section (e.g. KITTI_DATASET, TUM_DATASET, etc.).
- Select the sensor_type (mono, stereo, rgbd) in the chosen dataset section.
- Select the camera settings file in the dataset section (further details in the section Camera Settings below).
- Set the groundtruth_file accordingly (further details in the section Datasets below; also check the files io/ground_truth.py and io/convert_groundtruth.py).
Similarly, you can test main_slam.py by running:
$ . pyenv-activate.sh # Activate pyslam python virtual environment. This is only needed once in a new terminal.
$ ./main_slam.py
This will process a default KITTI video (available in the folder data/videos) by using its corresponding camera calibration file (available in the folder settings). You can stop it by focusing/clicking on one of the opened matplotlib windows and pressing the key 'Q'.
Note: Due to information loss in video compression, main_slam.py tracking may perform worse with the available KITTI videos than with the original KITTI image sequences. The available videos are intended to be used for a first quick test. Please, download and use the original KITTI image sequences as explained below.
Feature tracking
If you just want to test the basic feature tracking capabilities (feature detector + feature descriptor + feature matcher) and get a taste of the different available local features, run
$ . pyenv-activate.sh # Activate pyslam python virtual environment. This is only needed once in a new terminal.
$ ./main_feature_matching.py
In any of the above scripts, you can choose any detector/descriptor among ORB, SIFT, SURF, BRISK, AKAZE, SuperPoint, etc. (see the section Supported Local Features below for further information).
Some basic examples are available in the subfolder test/loopclosing. In particular, for feature detection/description, you may want to take a look at test/cv/test_feature_manager.py too.
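As a self-contained taste of what main_feature_matching.py explores, the snippet below runs a plain OpenCV ORB detect/describe/match loop with a ratio test. The image paths are placeholders, and pySLAM's own feature managers wrap many more detectors, descriptors and matchers than this minimal sketch.

```python
# Minimal detect + describe + match example with plain OpenCV (not pySLAM's API).
import cv2

img1 = cv2.imread('frame1.png', cv2.IMREAD_GRAYSCALE)   # placeholder input images
img2 = cv2.imread('frame2.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kps1, des1 = orb.detectAndCompute(img1, None)
kps2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)                # Hamming distance for binary ORB descriptors
matches = matcher.knnMatch(des1, des2, k=2)

# Lowe's ratio test to keep distinctive matches only
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f'{len(good)} good matches out of {len(matches)}')

vis = cv2.drawMatches(img1, kps1, img2, kps2, good, None)
cv2.imwrite('matches.png', vis)
```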
Loop closing
Different loop closing methods are available, combining aggregation methods and global descriptors.
Loop closing is enabled by default and can be disabled by setting kUseLoopClosing=False in config_parameters.py. Configuration options can be found in loop_closing/loop_detector_configs.py.
Examples: Start with the examples in test/loopclosing, such as test/loopclosing/test_loop_detector.py.
Vocabulary management
DBoW2, DBoW3, and VLAD require pre-trained vocabularies. The first step is to generate an array of descriptors from a set of reference images. Then, a vocabulary can be trained on it.
- Generate descriptors array: Use test/loopclosing/test_gen_des_array_from_imgs.py to generate the array of descriptors for training a vocabulary. Select your desired descriptor type via the tracker configuration.
- DBOW vocabulary generation: Train your target vocabulary by using the script test/loopclosing/test_gen_dbow_voc_from_des_array.py.
- VLAD vocabulary generation: Train your target VLAD "vocabulary" by using the script test/loopclosing/test_gen_vlad_voc_from_des_array.py.
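For intuition, the sketch below trains a tiny VLAD-style vocabulary with k-means from a saved descriptor array. The file names, shapes and the use of float descriptors (e.g. SIFT) are assumptions for illustration and do not reflect the exact interface of the scripts above.

```python
# Illustrative sketch: cluster a descriptor array into k visual words with k-means.
# Binary descriptors (e.g. ORB) would need a different treatment (e.g. k-majority).
import numpy as np
import cv2

descriptors = np.load('descriptors.npy').astype(np.float32)   # assumed shape (N, D), float descriptors

k = 64                                                         # number of visual words
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-4)
_, labels, centers = cv2.kmeans(descriptors, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

np.save('vlad_vocabulary.npy', centers)                        # centers: (k, D) visual-word centroids
```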
Vocabulary-free loop closing
Most methods do not require pre-trained vocabularies. Specifically:
- iBoW and OBIndex2: These methods incrementally build bags of binary words and, if needed, convert (front-end) non-binary descriptors into binary ones.
- Others: Methods like HDC_DELF, SAD, AlexNet, NetVLAD, CosPlace, and EigenPlaces directly extract global descriptors and process them using dedicated aggregators, independently of the used front-end descriptors.
As mentioned above, only DBoW2, DBoW3, and VLAD require pre-trained vocabularies.
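Conceptually, vocabulary-free loop detection with global descriptors boils down to a nearest-neighbor search over past keyframe descriptors. The toy class below illustrates the idea with cosine similarity and a temporal guard; thresholds and interfaces are illustrative assumptions, not pySLAM's internals.

```python
# Toy loop detector: store one L2-normalized global descriptor per keyframe and
# query past keyframes by cosine similarity, skipping temporally close ones.
import numpy as np

class SimpleLoopDetector:
    def __init__(self, min_frame_gap=50, similarity_threshold=0.8):
        self.db = []                                  # list of (keyframe_id, descriptor)
        self.min_frame_gap = min_frame_gap
        self.similarity_threshold = similarity_threshold

    def add_and_query(self, kf_id, global_descriptor):
        d = global_descriptor / (np.linalg.norm(global_descriptor) + 1e-12)
        best_id, best_score = None, -1.0
        for past_id, past_d in self.db:
            if kf_id - past_id < self.min_frame_gap:
                continue                              # ignore recent keyframes
            score = float(np.dot(d, past_d))
            if score > best_score:
                best_id, best_score = past_id, score
        self.db.append((kf_id, d))
        if best_score >= self.similarity_threshold:
            return best_id, best_score                # loop candidate
        return None, best_score
```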
Volumetric reconstruction pipeline
The volumetric reconstruction pipeline is disabled by default. You can enable it by setting kUseVolumetricIntegration=True in config_parameters.py. It runs in the back-end. At present, it works with:
- RGBD datasets;
- a depth estimator used in the back-end or front-end, so that a depth prediction/estimate becomes available for each processed keyframe.
If you want a mesh as output, set kVolumetricIntegrationExtractMesh=True in config_parameters.py.
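To give an idea of what a volumetric back-end does, here is a conceptual TSDF integration sketch using Open3D as an example library; intrinsics, depth scale and the keyframe list are assumptions, and this is not pySLAM's exact internal pipeline.

```python
# Conceptual TSDF integration of posed RGBD keyframes with Open3D (illustrative only).
import numpy as np
import open3d as o3d   # assumption: Open3D available in the environment

# Hypothetical keyframe list: (color_path, depth_path, 4x4 camera-to-world pose)
keyframes = []

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,                                  # 1 cm voxels
    sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

# Illustrative pinhole intrinsics: width, height, fx, fy, cx, cy
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 525.0, 525.0, 319.5, 239.5)

for color_path, depth_path, T_world_cam in keyframes:
    color = o3d.io.read_image(color_path)
    depth = o3d.io.read_image(depth_path)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=5000.0, depth_trunc=4.0, convert_rgb_to_intensity=False)
    volume.integrate(rgbd, intrinsic, np.linalg.inv(T_world_cam))   # extrinsic = world-to-camera

mesh = volume.extract_triangle_mesh()                   # cf. kVolumetricIntegrationExtractMesh
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh('mesh.ply', mesh)
```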
Depth prediction
The available depth prediction models can be utilized both in the SLAM back-end and front-end.
- Back-end: Depth prediction can be enabled in the volumetric reconstruction pipeline by setting the parameter kVolumetricIntegrationUseDepthEstimator=True and selecting your preferred kVolumetricIntegrationDepthEstimatorType in config_parameters.py.
- Front-end: Depth prediction can be enabled in the front-end by setting the parameter kUseDepthEstimatorInFrontEnd in config_parameters.py. This feature estimates depth images from input color images to emulate an RGBD camera. Please note that this functionality is still experimental [WIP].
Refer to the file depth_estimation/depth_estimator_factory.py for further details. Both stereo and monocular prediction approaches are supported. You can test depth prediction/estimation by using the script main_depth_prediction.py.
Notes:
- In the case of a monocular SLAM configuration, do NOT use depth prediction in the back-end volumetric integration: The SLAM (fake) scale will conflict with the absolute metric scale of depth predictions. With monocular datasets, enable depth prediction to run in the front-end.
- The depth inference may be very slow (for instance, with DepthPro it takes ~1s per image on my machine). Therefore, the resulting volumetric reconstruction pipeline may be very slow.
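As a rough illustration of how a monocular depth prediction can emulate an RGBD input, the helper below converts a predicted metric depth map into the 16-bit depth-image convention used by many RGBD datasets; the depth factor and clipping range are assumptions, not pySLAM's implementation.

```python
# Convert a predicted metric depth map (float32, meters) into a 16-bit depth image
# with a TUM-style depth factor of 5000 (illustrative values).
import numpy as np

def depth_to_uint16(depth_m, depth_factor=5000.0, max_depth_m=10.0):
    depth = np.nan_to_num(depth_m, nan=0.0, posinf=0.0, neginf=0.0)
    depth = np.clip(depth, 0.0, max_depth_m)            # discard implausible predictions
    return (depth * depth_factor).astype(np.uint16)     # 0 encodes "no depth"

# depth_pred = some_depth_estimator(color_image)        # hypothetical estimator call
# cv2.imwrite('depth.png', depth_to_uint16(depth_pred)) # could then be fed as fake RGBD depth
```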
Save and reload a map
When you run the script main_slam.py:
- The current map can be saved into the file data/slam_state/map.json by pressing the button Save on the GUI.
- The saved map can be reloaded and visualized in the GUI by running:
$ . pyenv-activate.sh   # Activate pyslam python virtual environment. This is only needed once in a new terminal.
$ ./main_map_viewer.py
Relocalization in a loaded map
To enable map reloading and relocalization in it, open config.yaml and set:
SYSTEM_STATE:
load_state: True # flag to enable SLAM state reloading (map state + loop closing state)
folder_path: data/slam_state # folder path relative to root of this repository
Pressing the Save button saves the current map, front-end, and back-end configurations. Reloading a saved map overwrites the current system configurations to ensure descriptor compatibility.
Trajectory saving
Estimated trajectories can be saved in three different formats: TUM (TUM RGB-D benchmark format), KITTI (KITTI Odometry format), and EuRoC (EuRoC MAV format). To enable trajectory saving, open config.yaml and search for the section SAVE_TRAJECTORY: set save_trajectory: True, select your format_type (tum, kitti, euroc), and set the output filename. For instance, for a tum format output:
SAVE_TRAJECTORY:
save_trajectory: True
format_type: tum
filename: kitti00_trajectory.txt
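For reference, a TUM-format trajectory file simply stores one pose per line as "timestamp tx ty tz qx qy qz qw". The small helper below shows the idea; the pose container is an illustrative assumption, not pySLAM's trajectory writer.

```python
# Sketch of writing a trajectory in TUM format (one pose per line).
def save_tum_trajectory(filename, poses):
    """poses: iterable of (timestamp, (tx, ty, tz), (qx, qy, qz, qw))."""
    with open(filename, 'w') as f:
        for timestamp, (tx, ty, tz), (qx, qy, qz, qw) in poses:
            f.write(f"{timestamp:.6f} {tx:.6f} {ty:.6f} {tz:.6f} "
                    f"{qx:.6f} {qy:.6f} {qz:.6f} {qw:.6f}\n")
```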
SLAM GUI
Some quick information about the non-trivial GUI buttons of main_slam.py:
- Step: Enter the step-by-step mode. Press the button Step a first time to pause. Then, press it again to make the pipeline process a single new frame.
- Save: Save the map into the file map.json. You can visualize it back by using the script main_map_viewer.py (as explained above).
- Reset: Reset the SLAM system.
- Draw Ground Truth: If a ground truth dataset is loaded (e.g., from KITTI, TUM, EUROC, or REPLICA), you can visualize it by pressing this button. The ground truth trajectory will be displayed in 3D and progressively aligned (approximately every 30 frames) with the estimated trajectory. The alignment improves as more samples are added to the estimated trajectory. After ~20 frames, if the button is pressed, a window will appear showing the Cartesian alignment errors (ground truth vs. estimated trajectory) along the axes. A conceptual sketch of this kind of trajectory alignment follows below.
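Conceptually, the ground-truth alignment mentioned above amounts to estimating a rigid transform between corresponding estimated and ground-truth positions (a similarity transform with scale would be needed in the monocular case). The sketch below shows a classic Kabsch/Umeyama-style alignment; it illustrates the idea and is not pySLAM's exact alignment code.

```python
# Rigid alignment of an estimated trajectory to the ground truth (Kabsch/Umeyama idea).
import numpy as np

def align_trajectories(est, gt):
    """est, gt: (N, 3) arrays of corresponding positions.
    Returns R (3x3), t (3,) such that (R @ est.T).T + t approximates gt."""
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_est, gt - mu_gt
    U, _, Vt = np.linalg.svd(E.T @ G)               # SVD of the 3x3 cross-covariance
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_gt - R @ mu_est
    return R, t

# aligned = (R @ est.T).T + t
# errors = np.linalg.norm(aligned - gt, axis=1)     # per-frame Cartesian error
```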
Monitor the logs for tracking, local mapping, and loop closing simultaneously
The logs generated by the modules local_mapping.py, loop_closing.py, loop_detecting_process.py, and global_bundle_adjustments.py are collected in the files local_mapping.log, loop_closing.log, loop_detecting.log, and gba.log, which are all stored in the folder logs. For fun/debugging, you can monitor each parallel flow by running the following command in a separate shell:
$ tail -f logs/<log file name>
Otherwise, just run the script:
$ ./scripts/launch_tmux_slam.sh
from the repo root folder. Press CTRL+A and then CTRL+Q to exit from the tmux environment.
Supported components and models
Supported local features
At present, the following feature detectors are supported:
- FAST
- Good features to track
- ORB
- ORB2 (improvements of ORB-SLAM2 to ORB detector)
- SIFT
- SURF
- KAZE
- AKAZE
- BRISK
- AGAST
- MSER
- StarDetector/CenSurE
- Harris-Laplace
- SuperPoint
- D2-Net
- DELF
- Contextdesc
- LFNet
- R2D2
- Key.Net
- DISK
- ALIKED
- Xfeat
- KeyNetAffNetHardNet (KeyNet detector + AffNet + HardNet descriptor).
The following feature descriptors are supported:
- ORB
- SIFT
- ROOT SIFT
- SURF
- AKAZE
- BRISK
- FREAK
- SuperPoint
- Tfeat
- BOOST_DESC
- DAISY
- LATCH
- LUCID
- VGG
- Hardnet
- GeoDesc
- SOSNet
- L2Net
- Log-polar descriptor
- D2-Net
- DELF
- Contextdesc
- LFNet
- R2D2
- BEBLID
- DISK
- ALIKED
- Xfeat
- KeyNetAffNetHardNet (KeyNet detector + AffNet + HardNet descriptor).
For more information, refer to the file local_features/feature_types.py. Some of the local features consist of a joint detector-descriptor. You can start playing with the supported local features by taking a look at test/cv/test_feature_manager.py and main_feature_matching.py.
In both the scripts main_vo.py and main_slam.py, you can create your preferred detector-descriptor configuration and feed it to the function feature_tracker_factory(). Some ready-to-use configurations are already available in the file local_features/feature_tracker_configs.py. The function feature_tracker_factory() can be found in the file local_features/feature_tracker.py. Take a look at the file local_features/feature_manager.py for further details.
N.B.: You just need a single python environment to be able to work with all the supported local features!
Supported matchers
See the file local_features/feature_matcher.py for further details.
Supported global descriptors and local descriptor aggregation methods
Local descriptor aggregation methods
- Bag of Words (BoW): DBoW2, DBoW3. [paper]
- Vector of Locally Aggregated Descriptors: VLAD. [paper]
- Incremental Bags of Binary Words (iBoW) via Online Binary Image Index: iBoW, OBIndex2. [paper]
- Hyperdimensional Computing: HDC. [paper]
NOTE: iBoW and OBIndex2 incrementally build a binary image index and do not need a prebuilt vocabulary. In the implemented classes, when needed, the input non-binary local descriptors are transparently transformed into binary descriptors.
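To make the VLAD aggregation idea concrete, here is a toy NumPy implementation that assigns local descriptors to their nearest visual word and accumulates residuals; shapes and normalization choices are illustrative assumptions, not the library's exact implementation.

```python
# Toy VLAD aggregation: residuals of local descriptors w.r.t. their nearest visual word.
import numpy as np

def vlad_aggregate(descriptors, vocabulary):
    """descriptors: (N, D) local descriptors; vocabulary: (K, D) visual words."""
    K, D = vocabulary.shape
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)              # nearest visual word per descriptor
    vlad = np.zeros((K, D), dtype=np.float64)
    for k in range(K):
        members = descriptors[assignments == k]
        if len(members) > 0:
            vlad[k] = (members - vocabulary[k]).sum(axis=0)   # residual accumulation
    vlad = vlad.flatten()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))         # power normalization
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad             # final L2 normalization
```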
Global descriptors
Global descriptors are also referred to as holistic descriptors. The methods listed in the vocabulary-free loop closing section above (e.g., SAD, AlexNet, NetVLAD, HDC-DELF, CosPlace, EigenPlaces) fall into this category.
Different loop closing methods are available; these combine the above aggregation methods and global descriptors. See the file loop_closing/loop_detector_configs.py for further details.
Supported depth prediction models
Both monocular and stereo depth prediction models are available. The classic SGBM algorithm is included as a reference approach.
- SGBM: Depth SGBM from OpenCV (Stereo, classic approach)
- Depth-Pro (Monocular)
- DepthAnythingV2 (Monocular)
- RAFT-Stereo (Stereo)
- CREStereo (Stereo)
Datasets
Five different types of datasets are available:
Dataset | type in config.yaml |
---|---|
KITTI odometry data set (grayscale, 22 GB) | type: KITTI_DATASET |
TUM dataset | type: TUM_DATASET |
EUROC dataset | type: EUROC_DATASET |
Video file | type: VIDEO_DATASET |
Folder of images | type: FOLDER_DATASET |
Use the download scripts available in the folder scripts to download some of the following datasets.
KITTI Datasets
The pySLAM code expects the following structure in the KITTI path folder (specified in the section KITTI_DATASET of the file config.yaml):
├── sequences
├── 00
...
├── 21
├── poses
├── 00.txt
...
├── 10.txt
- Download the dataset (grayscale images) from http://www.cvlibs.net/datasets/kitti/eval_odometry.php and prepare the KITTI folder as specified above.
- Select the corresponding calibration settings file (section KITTI_DATASET: cam_settings: in the file config.yaml).
TUM Datasets
The pySLAM code expects a file associations.txt in each TUM dataset folder (specified in the section TUM_DATASET: of the file config.yaml).
- Download a sequence from http://vision.in.tum.de/data/datasets/rgbd-dataset/download and uncompress it.
- Associate RGB images and depth images using the python script associate.py. You can generate your associations.txt file by executing:
$ python associate.py PATH_TO_SEQUENCE/rgb.txt PATH_TO_SEQUENCE/depth.txt > associations.txt
- Select the corresponding calibration settings file (section TUM_DATASET: cam_settings: in the file config.yaml).
EuRoC Datasets
- Download a sequence (ASL format) from http://projects.asl.ethz.ch/datasets/doku.php?id=kmavvisualinertialdatasets (check this direct link)
- Use the script io/generate_euroc_groundtruths_as_tum.sh to generate the TUM-like groundtruth files path + '/' + name + '/mav0/state_groundtruth_estimate0/data.tum' that are required by the EurocGroundTruth class.
- Select the corresponding calibration settings file (section EUROC_DATASET: cam_settings: in the file config.yaml).
Replica Datasets
- You can download the zip file containing all the sequences by running:
$ wget https://cvg-data.inf.ethz.ch/nice-slam/data/Replica.zip
- Then, uncompress it and deploy the files as you wish.
- Select the corresponding calibration settings file (section REPLICA_DATASET: cam_settings: in the file config.yaml).
Camera Settings
The folder settings contains the camera settings files which can be used for testing the code. These are the same files used in the framework ORB-SLAM2. You can easily modify one of those files to create your own new calibration file (for your new datasets).
In order to calibrate your camera, you can use the scripts in the folder calibration. In particular:
- Use the script grab_chessboard_images.py to collect a sequence of images where the chessboard can be detected (set the chessboard size therein; you can use the calibration pattern calib_pattern.pdf in the same folder).
- Use the script calibrate.py to process the collected images and compute the calibration parameters (set the chessboard size therein).
For more information on the calibration process, see this tutorial or this other link.
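For reference, a chessboard calibration with OpenCV boils down to the steps sketched below; the board size and image paths are assumptions, and the provided calibrate.py may differ in its details.

```python
# Minimal chessboard calibration sketch with OpenCV (illustrative values).
import glob
import cv2
import numpy as np

board_size = (9, 6)                                   # inner corners per chessboard row/column
objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)   # board frame, unit squares

obj_points, img_points = [], []
image_size = None
for path in glob.glob('calib_images/*.png'):          # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, board_size)
    if found:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                                   (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, image_size, None, None)
print('RMS reprojection error:', rms)
print('Camera matrix K:\n', K)
print('Distortion coefficients:', dist.ravel())
```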
If you want to use your camera, you have to:
- Calibrate it and configure WEBCAM.yaml accordingly
- Record a video (for instance, by using save_video.py in the folder calibration)
- Configure the VIDEO_DATASET section of config.yaml in order to point to your recorded video.
Comparison pySLAM vs ORB-SLAM3
For a comparative evaluation of the trajectories estimated by pySLAM and by ORB-SLAM3, see this trajectory comparison notebook.
Note that pySLAM saves its pose estimates in an online fashion: at each frame, the current pose estimate is saved at the end of the front-end tracking iteration. On the other hand, ORB-SLAM3 pose estimates are saved at the end of the full dataset playback: this means that each pose estimate $q$ of ORB-SLAM3 is refined multiple times by LBA and BA over the multiple window optimizations that cover $q$.
You can save your pyslam trajectories as detailed here.
Contributing to pySLAM
If you like pySLAM and would like to contribute to the code base, you can report bugs, leave comments and propose new features through issues and pull requests on GitHub. Feel free to get in touch at luigifreda(at)gmail[dot]com. Thank you!
References
Suggested books:
- Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman
- An Invitation to 3-D Vision by Yi Ma, Stefano Soatto, Jana Kosecka, S. Shankar Sastry
- State Estimation for Robotics by Timothy D. Barfoot
- Computer Vision: Algorithms and Applications, by Richard Szeliski
- Introduction to Visual SLAM by Xiang Gao, Tao Zhang
- Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville
- Neural Networks and Deep Learning, By Michael Nielsen
Suggested material:
- Vision Algorithms for Mobile Robotics by Davide Scaramuzza
- CS 682 Computer Vision by Jana Kosecka
- ORB-SLAM: a Versatile and Accurate Monocular SLAM System by R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos
- Double Window Optimisation for Constant Time Visual SLAM by H. Strasdat, A. J. Davison, J.M.M. Montiel, K. Konolige
- The Role of Wide Baseline Stereo in the Deep Learning World by Dmytro Mishkin
- To Learn or Not to Learn: Visual Localization from Essential Matrices by Qunjie Zhou, Torsten Sattler, Marc Pollefeys, Laura Leal-Taixe
- Awesome local-global descriptors repository
- Introduction to Feature Matching Using Neural Networks
- Visual Place Recognition: A Tutorial
- Bags of Binary Words for Fast Place Recognition in Image Sequences
Moreover, you may want to have a look at the OpenCV guide or tutorials.
Credits
- Pangolin
- g2opy
- ORBSLAM2
- SuperPointPretrainedNetwork
- Tfeat
- Image Matching Benchmark Baselines
- Hardnet
- GeoDesc
- SOSNet
- L2Net
- Log-polar descriptor
- D2-Net
- DELF
- Contextdesc
- LFNet
- R2D2
- BEBLID
- DISK
- Xfeat
- LightGlue
- Key.Net
- Twitchslam
- MonoVO
- VPR_Tutorial
- DepthAnythingV2
- DepthPro
- RAFT-Stereo
- CREStereo and CREStereo-Pytorch
- Many thanks to Anathonic for adding the trajectory-saving feature and for the comparison notebook: pySLAM vs ORB-SLAM3.
TODOs
Many improvements and additional features are currently under development:
- loop closing
- relocalization
- stereo and RGBD support
- map saving/loading
- modern DL matching algorithms
- object detection
- semantic segmentation
- 3D dense reconstruction
- unified install procedure (single branch) for all OSs
- trajectory saving
- depth prediction integration
- ROS support
- gaussian splatting integration
- proper documentation