Chat-3D v2

This is the official repository for the paper "Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers". [paper]

News

[2024.04] 🔥 A refined implementation of Chat-3D v2 is released. The previous version has been archived in the v2.0 branch; the main branch now hosts the new version (v2.1).

[2024.01] Updated the training guide for grounding on ScanRefer.

[2023.12] Code released. The main training architecture is based on our earlier work Chat-3D.

🔥 v2.1 vs v2.0

🔨 Preparation

🤖 Training and Inference

📄 Citation

If you find this project useful in your research, please consider citing:

@article{huang2023chat,
  title={Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers},
  author={Huang, Haifeng and Wang, Zehan and Huang, Rongjie and Liu, Luping and Cheng, Xize and Zhao, Yang and Jin, Tao and Zhao, Zhou},
  journal={arXiv preprint arXiv:2312.08168},
  year={2023}
}
@article{wang2023chat,
  title={Chat-3d: Data-efficiently tuning large language model for universal dialogue of 3d scenes},
  author={Wang, Zehan and Huang, Haifeng and Zhao, Yang and Zhang, Ziang and Zhao, Zhou},
  journal={arXiv preprint arXiv:2308.08769},
  year={2023}
}

Stay tuned for updates to our project. 🔥

If you have any questions or suggestions, feel free to drop us an email (huanghaifeng@zju.edu.cn, wangzehan01@zju.edu.cn) or open an issue.

😊 Acknowledgement

Thanks to the following open-source projects:

LLMs: LLaMA, Vicuna

3D Datasets: ScanNet, ScanRefer, ReferIt3D, Scan2Cap, ScanQA, SQA3D, Multi3dRefer

3D Segmentors: PointGroup, Mask3D

3D Encoders: ULIP, Uni3D

Multi-modal LLMs: VideoChat, LEO

3D Expert Models: ViL3DRef