
<div align='center'> <h2><a href="https://arxiv.org/abs/2310.06773">Uni3D: Exploring Unified 3D Representation at Scale</a></h2>

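Junsheng Zhou<sup>1,2*</sup>, Jinsheng Wang<sup>1*</sup>, Baorui Ma<sup>1*</sup>, Yu-Shen Liu<sup>2</sup>, Tiejun Huang<sup>1,3</sup>, Xinlong Wang<sup>1</sup>

<sup>1</sup>BAAI, <sup>2</sup>THU, <sup>3</sup>PKU. <sup>*</sup>Equal contribution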

ICLR 2024 (Spotlight)

</div> <img src="assets/overview.jpg" alt="overview" width="800" />

We present Uni3D, a unified and scalable 3D pretraining framework for large-scale 3D representation learning, and explore its limits at the scale of one billion parameters. Uni3D takes a ViT initialized from 2D pretrained weights and pretrains it end-to-end to align 3D point cloud features with image-text aligned features. Thanks to this simple architecture and pretext task, Uni3D can use abundant 2D pretrained models as initialization and image-text aligned models as the target, bringing the potential of 2D models and scaling-up strategies to the 3D world. We efficiently scale Uni3D up to one billion parameters and set new records on a broad range of 3D tasks.
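As a rough illustration of what "aligning 3D point cloud features with image-text aligned features" can look like, here is a minimal sketch in plain Python. The overview above does not spell out the loss, so this assumes a CLIP-style symmetric contrastive objective; `contrastive_loss`, `temperature` and the toy feature vectors are hypothetical and are not taken from the Uni3D code.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def contrastive_loss(point_feats, target_feats, temperature=0.07):
    """Symmetric contrastive loss over paired rows (assumed objective).

    point_feats[i] is the 3D encoder output for sample i; target_feats[i]
    is the matching image or text embedding from a frozen aligned model.
    """
    p = [l2_normalize(v) for v in point_feats]
    t = [l2_normalize(v) for v in target_feats]
    n = len(p)
    # logits[i][j]: cosine similarity of point i and target j, scaled.
    logits = [
        [sum(a * b for a, b in zip(p[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(c) for c in zip(*logits)]
    # Average the point-to-target and target-to-point directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))
```

The loss is small when each point cloud feature is closest to its own target embedding and large when the pairs are mismatched, which is the behavior an alignment objective of this kind relies on.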

Schedule

We are committed to open-sourcing Uni3D related materials, including:

An extension of Uni3D to a 3D metric (Uni3D-score) for enhanced semantic coherence in text-to-3D tasks. For details, see GeoDream.
Model weights ranging from 6M to 1B parameters
Evaluation code
Evaluation data
Pretraining code
Pretraining data

We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.

Installation

Clone this repository and install the required packages:

git clone https://github.com/baaivision/Uni3D.git
cd Uni3D

conda create -n uni3d python=3.8
conda activate uni3d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

pip install -r requirements.txt

# install pointnet2 extensions from https://github.com/erikwijmans/Pointnet2_PyTorch
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

Core packages:

PyTorch version 2.0.1
open-clip-torch version 2.20.0
timm version 0.9.7
DeepSpeed version 0.10.3
Open3D version 0.17.0
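To confirm that an environment matches the versions listed above, a small check along these lines may help. It is only a sketch: the distribution names in `EXPECTED` are assumptions about how each package is published, and `check_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions copied from the "Core packages" list above. The keys are
# assumed distribution names, not something this repository defines.
EXPECTED = {
    "torch": "2.0.1",
    "open-clip-torch": "2.20.0",
    "timm": "0.9.7",
    "deepspeed": "0.10.3",
    "open3d": "0.17.0",
}


def check_versions(expected=EXPECTED, lookup=metadata.version):
    """Return a {package: status} report for the listed packages."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local build tags such as "+cu118" when comparing.
        base = found.split("+")[0]
        report[name] = "ok" if base == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_versions().items():
        print(f"{name}: {status}")
```

A mismatch is not necessarily an error; the list above records the versions the project names, and nearby releases may also work.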

Model Zoo

| Model | Training Data | Objaverse-LVIS Top1 (Top5) | ModelNet40 Top1 (Top5) | ScanObjectNN Top1 (Top5) |
| :--- | :--- | :--- | :--- | :--- |
| Uni3d-B | Ensembled w/o LVIS | 45.9 (74.8) | 86.1 (98.7) | 61.7 (89.5) |
| Uni3d-B | Ensembled | 51.7 (80.8) | 86.3 (97.9) | 63.8 (90.2) |
| Uni3d-L | Ensembled w/o LVIS | 46.2 (74.7) | 86.6 (97.8) | 58.4 (90.1) |
| Uni3d-L | Ensembled | 53.1 (81.5) | 86.3 (98.3) | 58.2 (89.4) |
| Uni3d-g | Ensembled w/o LVIS | 47.2 (76.1) | 86.8 (98.4) | 66.5 (90.1) |
| Uni3d-g | Ensembled | 53.5 (82.0) | 87.3 (99.2) | 63.9 (91.7) |
| Uni3d-g 🔥 | Ensembled | 55.3 (82.9) | 88.2 (99.3) | 65.3 (92.7) |

Evaluation of Zero-shot 3D classification

We evaluate the zero-shot 3D classification performance on three datasets: Objaverse-LVIS, ModelNet40 and ScanObjectNN.

Please refer to DATASETS.md for evaluation dataset preparation.
[Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
Download the model zoo weights and put them in the /path/to/checkpoints folder.
Run bash scripts/inference.sh [scale] to evaluate the model on the above datasets, e.g., bash scripts/inference.sh giant.
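For readers unfamiliar with the Top1 (Top5) numbers reported for zero-shot classification, the following sketch shows how such a metric is commonly computed: each point cloud feature is compared with one text embedding per class name, and a sample counts as a hit if its true class is among the k most similar. This is an illustration under that assumption, not the code behind scripts/inference.sh; `topk_accuracy` and its arguments are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topk_accuracy(point_feats, text_feats, labels, ks=(1, 5)):
    """Zero-shot top-k accuracy in percent.

    point_feats: one feature vector per 3D sample.
    text_feats:  one embedding per class, indexed by class id.
    labels:      ground-truth class id for each sample.
    """
    hits = {k: 0 for k in ks}
    for feat, label in zip(point_feats, labels):
        scores = [cosine(feat, text) for text in text_feats]
        # Class ids sorted from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        for k in ks:
            if label in ranked[:k]:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(labels) for k in ks}
```

No class-specific training is involved: adding a new category only requires a text embedding for its name, which is what makes the evaluation zero-shot.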

Pre-training

Please refer to DATASETS.md for pre-training dataset preparation.
[Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
[Recommended 🤗] Download the initialization model and put it in the /path/to/init_model folder.
Run bash scripts/pretrain.sh to pre-train the model on the ensembled datasets.

Visualization

Open-world Understanding

One-shot Part Segmentation

Point Cloud Painting

Cross-modal Retrieval

Acknowledgement

Uni3D is built using the awesome EVA, OpenCLIP, timm, DeepSpeed, ULIP and OpenShape.

Citation

@inproceedings{zhou2023uni3d,
  title={{Uni3D}: Exploring Unified {3D} Representation at Scale},
  author={Zhou, Junsheng and Wang, Jinsheng and Ma, Baorui and Liu, Yu-Shen and Huang, Tiejun and Wang, Xinlong},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}