A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision
This repository is for our paper "A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision", Lanxiao Li and Michael Heizmann, ECCV 2022 (arXiv).
Requirements
Hardware
The training scripts with the `_ddp` suffix expect distributed training with multiple GPUs, but some examples for single-GPU training are also provided.
Software
The repo is tested under Ubuntu 18.04 and 20.04. The CUDA toolkit (tested with 10.2 and 11.1) and GCC are needed to compile some extensions. Also, the following Python packages are required:
```
pytorch          # tested with 1.8; other versions should work as well
torchvision
open3d
matplotlib
scipy
pybind11
opencv-python
pillow
MinkowskiEngine==0.5.4
```
To install MinkowskiEngine, please follow the official repo.
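If it helps, here is a minimal, non-authoritative sketch for installing the dependencies above with pip (assuming the pip package for pytorch is torch and that you pick a build matching your CUDA version). MinkowskiEngine is left out on purpose; install it as described above.

```bash
# Sketch only: install the Python dependencies listed above via pip.
# Choose the torch/torchvision build that matches your CUDA version (tested: 10.2 and 11.1).
pip install torch torchvision              # "pytorch" in the list above, tested with 1.8
pip install open3d matplotlib scipy pybind11 opencv-python pillow
# MinkowskiEngine 0.5.4: follow the installation instructions in its official repo.
```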
Preparation
Extensions
To compile the C++ extensions, go to `cpp_ext/fps` and `cpp/knn` and run `bash build.sh` in each folder.
To compile the CUDA extension (PointNet++), go to `model/pointnet2` and run `python setup.py install`. Both build steps are collected in the sketch below.
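For convenience, the build steps above collected into one shell snippet (paths are copied verbatim from this README; run it from the repository root):

```bash
# Build the C++ extensions (FPS and kNN), then install the PointNet++ CUDA extension.
# Requires GCC and a matching CUDA toolkit (see Requirements).
(cd cpp_ext/fps && bash build.sh)
(cd cpp/knn && bash build.sh)
(cd model/pointnet2 && python setup.py install)
```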
Data
To prepare the pre-training data:

- First, download and prepare the ScanNet raw data. Please follow the README.md in `prepare_data`.
- Update the data path in `scannet/config.py`. The data folder should have the following structure (a minimal loading sketch for this layout is shown after this list):

  ```
  $DATA_FOLDER
  |__ scene0000_00
  |   |__ _info.txt                  # meta data of the scene
  |   |__ frame-000000.color.jpg     # color image, resized
  |   |__ frame-000000.png           # depth map
  |   |__ frame-000000.pose.txt      # camera pose, unused
  |   |__ frame-000001.color.jpg
  |   |__ frame-000001.png
  |   |__ frame-000001.pose.txt
  |   ... ...
  |__ scene0000_01
  |__ scene0000_02
  ... ...
  ```

- You don't need to sample the data yourself. The sampled frame-IDs (with the factor 25) are already provided in `scannet/sampled_train_25.txt`. The code in `scannet/sampler.py` is used for sampling.
- [Optional] If you want to save the sampled data to another place, use `scannet/save_sampled.py`. Remember to update `scannet/config.py` if you want to read data from the new location.
- [Notice] Some extracted depth maps contain only NaN or 0 values and are thus invalid. We've found all invalid frame-IDs and saved them in `scannet/config.py`. These frames are not used for training.
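As an illustration of the layout above, here is a minimal sketch (not part of this repo) of reading one prepared frame. The function name `load_frame`, the example path, and the depth scale of 1000 (millimeter PNGs, as typically exported from ScanNet) are assumptions; see `scannet/scannet_pretrain.py` for the loading code actually used during pre-training.

```python
# Minimal illustration of reading one prepared frame; not the repo's loader.
# Assumptions: depth PNGs store millimeters as 16-bit integers (typical for
# ScanNet exports) and pose files are plain-text 4x4 matrices.
import os

import cv2          # opencv-python (listed in the requirements)
import numpy as np


def load_frame(scene_dir, frame_id):
    """Return (color, depth_in_meters, pose) for one sampled frame."""
    prefix = os.path.join(scene_dir, "frame-{:06d}".format(frame_id))
    color = cv2.imread(prefix + ".color.jpg")                  # resized color image
    depth = cv2.imread(prefix + ".png", cv2.IMREAD_UNCHANGED)  # raw depth map
    depth_m = depth.astype(np.float32) / 1000.0                # assumed mm -> m
    pose = np.loadtxt(prefix + ".pose.txt")                    # camera pose (unused for pre-training)
    return color, depth_m, pose


# Example (hypothetical path):
# color, depth, pose = load_frame("/path/to/DATA_FOLDER/scene0000_00", 0)
```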
Usage
Pretraining
To pretrain a PointNet++ and a depth map based CNN (DPCo), use

```bash
python train_dp_moco_ddp.py \
    --lr 0.03 \
    --save log/DPCo \
    --batch-size 64 \
    --cos \
    --local \
    --moco \
    --worker 8 \
    --epochs 120 \
    --dist-url 'tcp://localhost:10001' \
    --multiprocessing-distributed \
    --world-size 1 \
    --rank 0
```
Our training is done on a single node with 2 NVIDIA Tesla V100 GPUs. You might have to update some parameters (e.g. workers, batch-size, world-size) according to your own hardware; a hypothetical reduced configuration is sketched below. Also, code for a single GPU without DDP is provided in `train_ddp_moco.py`, but we only use this version for debugging purposes.
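For illustration only, a hypothetical reduced configuration for a machine with less GPU memory could look like the following. These values are not the settings used for the paper's results, and changing the batch size may affect performance.

```bash
# Hypothetical example: same script as above with a smaller batch size and fewer
# data-loading workers. These values are illustrative, not the paper's settings.
python train_dp_moco_ddp.py \
    --lr 0.03 \
    --save log/DPCo_small \
    --batch-size 32 \
    --cos \
    --local \
    --moco \
    --worker 4 \
    --epochs 120 \
    --dist-url 'tcp://localhost:10001' \
    --multiprocessing-distributed \
    --world-size 1 \
    --rank 0
```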
Similarly, to pretrain a sparse 3D CNN and a depth map based CNN (DVCo with color), use

```bash
export OMP_NUM_THREADS=12  # make MinkowskiEngine happy
python train_dv_ddp.py \
    --lr 0.03 \
    --save log/DVCo \
    --batch-size 64 \
    --cos \
    --moco \
    --local \
    --worker 8 \
    --epochs 120 \
    --dist-url 'tcp://localhost:10001' \
    --multiprocessing-distributed \
    --world-size 1 \
    --rank 0
```
Finetuning
For finetuning on the 3D object detection task, please follow the README.md in `downstream`.
Note
We are still working on cleaning up our internal code base and testing it against this public repo. There will be updates in the future.
Known Issues
- We encountered OOM problems with MinkowskiEngine. The CPU RAM usage increased constantly with some of our code. Current workaround: Manually pause and resume the training to release the RAM.
- The training might sometimes stall or become very slow because the Dataset class tries to find more unique local correspondences and gets stuck. In this case, try to decrease the ratio of unique matched points or increase the matching threshold `match_thresh`, as commented in `scannet/scannet_pretrain.py`.
Citation
If you find this repo helpful, please consider citing our work:
```bibtex
@inproceedings{li2022invar3d,
    author    = {Li, Lanxiao and Heizmann, Michael},
    title     = {A Closer Look at Invariances in Self-supervised Pre-training for 3D Vision},
    booktitle = {ECCV},
    year      = {2022}
}
```
Acknowledgement
This repo modifies some code from the following repos. We thank the authors for their amazing code bases. Please consider starring/citing their works as well.
- ScanNet: https://github.com/ScanNet/ScanNet
- MinkowskiEngine: https://github.com/NVIDIA/MinkowskiEngine
- MoCo: https://github.com/facebookresearch/moco
- VoteNet: https://github.com/facebookresearch/votenet
- DepthContrast: https://github.com/facebookresearch/DepthContrast