
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders


Official implementation of 'Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders'.

The paper has been accepted by CVPR 2023 🔥.

Introduction

Comparison with existing MAE-based 3D models on the three splits of ScanObjectNN:

| Method | Parameters | GFLOPs | Extra Data | OBJ-BG | OBJ-ONLY | PB-T50-RS |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| Point-BERT | 22.1M | 4.8 | - | 87.43% | 88.12% | 83.07% |
| ACT | 22.1M | 4.8 | 2D | 92.48% | 91.57% | 87.88% |
| Point-MAE | 22.1M | 4.8 | - | 90.02% | 88.29% | 85.18% |
| Point-M2AE | 12.9M | 3.6 | - | 91.22% | 88.81% | 86.43% |
| I2P-MAE | 12.9M | 3.6 | 2D | 94.15% | 91.57% | 90.11% |

We propose I2P-MAE, an Image-to-Point Masked Autoencoder that obtains superior 3D representations from 2D pre-trained models. During self-supervised pre-training, we leverage well-learned 2D knowledge to guide 3D masked autoencoding, which reconstructs the masked point tokens with an encoder-decoder architecture. Specifically, we introduce two image-to-point learning schemes: 2D-guided masking and 2D-semantic reconstruction. In this way, the 3D network effectively inherits high-level 2D semantics learned from rich image data for discriminative 3D modeling.

<div align="center"> <img src="pipeline.png"/> </div>
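
To make the first scheme concrete, below is a minimal PyTorch sketch of the 2D-guided masking idea: point tokens with high 2D saliency (e.g., back-projected from 2D visual features) are kept visible with higher probability, so the encoder attends to semantically important regions. The function name and the saliency input are illustrative assumptions, not the repository's actual code.

```python
import torch

def saliency_guided_mask(point_tokens, saliency, mask_ratio=0.8):
    """Sketch of 2D-guided masking: favor keeping tokens with high 2D saliency.

    point_tokens: (B, N, C) point token features
    saliency:     (B, N) per-token saliency aggregated from projected 2D views
    Returns the visible tokens and a boolean mask (True = masked).
    """
    B, N, C = point_tokens.shape
    num_keep = int(N * (1 - mask_ratio))
    # Sample the visible set with probability proportional to saliency,
    # so semantically important tokens are more likely to stay visible.
    probs = saliency / saliency.sum(dim=1, keepdim=True)
    keep_idx = torch.multinomial(probs, num_keep, replacement=False)  # (B, num_keep)
    visible = torch.gather(point_tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, C))
    mask = torch.ones(B, N, dtype=torch.bool, device=point_tokens.device)
    mask.scatter_(1, keep_idx, False)  # False = visible, True = masked
    return visible, mask
```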

I2P-MAE Models

Pre-training

Pre-trained on ShapeNet under the guidance of CLIP, I2P-MAE is evaluated by Linear SVM on the ModelNet40 and ScanObjectNN (OBJ-BG split) datasets, without downstream fine-tuning:

| Task | Dataset | Config | MN40 Acc. | OBJ-BG Acc. | Ckpts | Logs |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| Pre-training | ShapeNet | i2p-mae.yaml | 93.35% | 87.09% | pre-train.pth | log |
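
The Linear SVM protocol fits a linear classifier on frozen global features from the pre-trained encoder; no network weights are updated. A minimal sketch with scikit-learn, assuming `train_feats`/`test_feats` are feature matrices already extracted by the frozen encoder (the extraction itself is handled by the repo's --test_svm path), and with C=0.01 as an illustrative regularization choice:

```python
import numpy as np
from sklearn.svm import LinearSVC

def linear_svm_accuracy(train_feats, train_labels, test_feats, test_labels, C=0.01):
    # Fit a linear classifier on frozen features; the encoder itself is not updated.
    clf = LinearSVC(C=C)
    clf.fit(train_feats, train_labels)
    return float(np.mean(clf.predict(test_feats) == test_labels))
```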

Fine-tuning

Synthetic shape classification on ModelNet40 with 1k points:

| Task | Config | Acc. | Vote | Ckpts | Logs |
| :-- | :-: | :-: | :-: | :-: | :-: |
| Classification | modelnet40.yaml | 93.67% | 94.06% | modelnet40.pth | modelnet40.log |
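
The Vote column reports test-time voting: logits are averaged over several randomly augmented copies of each test shape. A minimal sketch of this common evaluation trick, assuming a classifier `model` that maps (B, N, 3) points to logits; the random scaling below is an illustrative augmentation, not necessarily the exact one in the config:

```python
import torch

@torch.no_grad()
def vote_predict(model, points, num_votes=10):
    """Average logits over randomly augmented copies of the test shapes."""
    model.eval()
    logits_sum = 0.0
    for _ in range(num_votes):
        # Illustrative augmentation: random anisotropic scaling per shape.
        scale = torch.empty(points.size(0), 1, 3, device=points.device).uniform_(0.8, 1.2)
        logits_sum = logits_sum + model(points * scale)
    return (logits_sum / num_votes).argmax(dim=-1)
```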

Real-world shape classification on ScanObjectNN:

| Task | Split | Config | Acc. | Ckpts | Logs |
| :-- | :-: | :-: | :-: | :-: | :-: |
| Classification | PB-T50-RS | scan_pb.yaml | 90.11% | scan_pd.pth | scan_pd.log |
| Classification | OBJ-BG | scan_obj-bg.yaml | 94.15% | - | - |
| Classification | OBJ-ONLY | scan_obj.yaml | 91.57% | - | - |

Requirements

Installation

Create a conda environment and install basic dependencies:

```bash
git clone https://github.com/ZrrSkywalker/I2P-MAE.git
cd I2P-MAE

conda create -n i2pmae python=3.7
conda activate i2pmae

# Install matching versions of torch and torchvision
conda install pytorch torchvision cudatoolkit
# e.g., conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3

pip install -r requirements.txt
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
```

Install GPU-related packages:

```bash
# Chamfer Distance and EMD
cd ./extensions/chamfer_dist
python setup.py install --user
cd ../emd
python setup.py install --user

# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
```
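
The chamfer_dist extension compiles a CUDA kernel for the Chamfer Distance used as the reconstruction loss. If the build fails, a pure-PyTorch reference (a minimal sketch, slower but handy for sanity-checking the extension's outputs) looks like this:

```python
import torch

def chamfer_distance(x, y):
    """Symmetric Chamfer Distance between point sets x (B, N, 3) and y (B, M, 3)."""
    dist = torch.cdist(x, y) ** 2              # pairwise squared distances, (B, N, M)
    # Nearest-neighbor distance in both directions, averaged.
    return dist.min(dim=2).values.mean() + dist.min(dim=1).values.mean()
```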

Datasets

For pre-training and fine-tuning, please follow DATASET.md to install the ShapeNet, ModelNet40, ScanObjectNN, and ShapeNetPart datasets, referring to Point-BERT. In particular, for Linear SVM evaluation, download the official ModelNet40 dataset and put the unzipped folder under data/.

The final directory structure should be:

```
│I2P-MAE/
├──cfgs/
├──datasets/
├──data/
│   ├──ModelNet/
│   ├──ModelNetFewshot/
│   ├──modelnet40_ply_hdf5_2048/  # Specifically for Linear SVM
│   ├──ScanObjectNN/
│   ├──ShapeNet55-34/
│   ├──shapenetcore_partanno_segmentation_benchmark_v0_normal/
├──...
```
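
As a quick sanity check that the Linear SVM data is in place: the standard modelnet40_ply_hdf5_2048 files store points under the 'data' key and class indices under 'label'. A short snippet, assuming that standard layout, to verify the files load:

```python
import glob
import h5py

for path in sorted(glob.glob('data/modelnet40_ply_hdf5_2048/*.h5')):
    with h5py.File(path, 'r') as f:
        print(path, f['data'].shape, f['label'].shape)  # e.g., (2048, 2048, 3), (2048, 1)
```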

Get Started

Pre-training

I2P-MAE is pre-trained on the ShapeNet dataset with the config file cfgs/pre-training/i2p-mae.yaml. Run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/pre-training/i2p-mae.yaml --exp_name pre-train
```

To evaluate the pre-trained I2P-MAE by Linear SVM, create a folder ckpts/ and download pre-train.pth into it. Use the configs in cfgs/linear-svm/ and specify the evaluation dataset via --test_svm.

For ModelNet40, run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/linear-svm/modelnet40.yaml --test_svm modelnet40 --exp_name test_svm --ckpts ./ckpts/pre-train.pth
```

For ScanObjectNN (OBJ-BG split), run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/linear-svm/scan_obj-bg.yaml --test_svm scan --exp_name test_svm --ckpts ./ckpts/pre-train.pth
```

Fine-tuning

Please create a folder ckpts/ and download pre-train.pth into it. The fine-tuning configs are in cfgs/fine-tuning/.

For ModelNet40, run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/modelnet40.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth
```

For the three splits of ScanObjectNN, run:

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/scan_pb.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/scan_obj.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth
CUDA_VISIBLE_DEVICES=0 python main.py --config cfgs/fine-tuning/scan_obj-bg.yaml --finetune_model --exp_name finetune --ckpts ckpts/pre-train.pth
```

Acknowledgement

This repo benefits from Point-M2AE, Point-BERT, Point-MAE, and CLIP. Thanks for their wonderful work.

Contact

If you have any questions about this project, please feel free to contact zhangrenrui@pjlab.org.cn.