<p align="center"> <h1 align="center"><strong>MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors</strong></h1> <p align="center"> Yuan Tang&emsp; Xu Han&emsp; Xianzhi Li*&emsp; Qiao Yu&emsp; Yixue Hao&emsp; Long Hu&emsp; Min Chen <br> Huazhong University of Science and Technology&emsp;South China University of Technology </p> </p> <p align="center"> <a><strong>ACM MM 2024 </strong></a> <a href='https://tangyuan96.github.io/minigpt_3d_project_page/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/pdf/2405.01413'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/YuanTang96/MiniGPT-3D'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> </p>

🏠 About

pic_1 pic_1

🔥 News

🔍 Overview

Model

pic_2

Note: MiniGPT-3D takes <span style="color:skyblue;">the first step toward efficient 3D-LLMs</span>. We hope that MiniGPT-3D can bring new insights to this community.

Experiment Results

Quantitative Comparisons with baselines <span style="color:red;">[Evaluated with the closed-source LLMs GPT-3.5 and GPT-4]</span>.

pic_3

Quantitative Comparisons with baselines <span style="color:green;">[Evaluated with the open-source LLM Qwen2-72B-Instruct]</span>.

These results are taken from GreenPLM.

pic_3_2

Qualitative Comparisons with baselines.

pic_3

💬 Dialogue Examples

Please refer to our paper for more dialogue examples.

pic_4

📦 Training and Evaluation

Installation

We tested our code under the following environment:

To start:

  1. Clone this repository.

    git clone https://github.com/TangYuan96/MiniGPT-3D.git
    cd MiniGPT-3D
    
  2. Install packages

    We assume that conda is already installed.

    conda env create -f environment.yml
    conda activate minigpt_3d
    bash env_install.sh
    

Data Preparation

  1. Download all data files. They require about 78GB of storage space.
  2. Organize the 660K Objaverse colored point clouds. Run the following commands to merge the two split files into one archive and extract it. This produces a folder named 8192_npy containing 660K point cloud files named {Objaverse_ID}_8192.npy. Each file is a numpy array of shape (8192, 6), where the first three columns are xyz and the last three columns are rgb in the [0, 1] range.
    cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
    tar -xvf Objaverse_660K_8192_npy.tar.gz
    
    Then, move all files in 8192_npy folder to ./data/objaverse_data folder.
  3. Organize annotation files. Move all json files and txt files to ./data/anno_data folder.
  4. Organize the test data file of ModelNet40. Move modelnet40_test_8192pts_fps.dat to ./data/modelnet40_data folder.
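As a quick sanity check on the point cloud format described in step 2, the following minimal sketch validates a single array. It uses a synthetic array instead of a real file, and `check_point_cloud` is a hypothetical helper name, not part of this repository.

```python
import numpy as np


def check_point_cloud(points: np.ndarray, num_points: int = 8192) -> bool:
    """Return True if `points` has shape (num_points, 6) with rgb columns in [0, 1]."""
    if points.shape != (num_points, 6):
        return False
    rgb = points[:, 3:6]  # columns 0-2 are xyz, columns 3-5 are rgb
    return bool(np.all(rgb >= 0.0) and np.all(rgb <= 1.0))


# Synthetic stand-in for np.load("./data/objaverse_data/{Objaverse_ID}_8192.npy").
rng = np.random.default_rng(0)
xyz = rng.normal(size=(8192, 3))
rgb = rng.uniform(0.0, 1.0, size=(8192, 3))
cloud = np.concatenate([xyz, rgb], axis=1)

print(check_point_cloud(cloud))
```

To check a real file, replace the synthetic array with the result of `np.load` on one of the extracted `.npy` files.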

Finally, the overall data directory structure should be:

MiniGPT-3D/data
|-- anno_data
|   |-- PointLLM_brief_description_660K.json
|   |-- PointLLM_brief_description_660K_filtered.json
|   |-- PointLLM_brief_description_val_200_GT.json
|   |-- PointLLM_complex_instruction_70K.json
|   |-- object_ids_660K.txt
|   `-- val_object_ids_3000.txt
|-- modelnet40_data 
|   |-- modelnet40_test_8192pts_fps.dat
|-- objaverse_data
|   |-- 00000054c36d44a2a483bdbff31d8edf_8192.npy
|   |-- 00001ec0d78549e1b8c2083a06105c29_8192.npy
|   .......
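The layout above can be verified before training with a small script. This is a sketch only: `missing_data_entries` is a hypothetical helper, and it checks that at least one point cloud file exists rather than all 660K.

```python
import tempfile
from pathlib import Path
from typing import List

# File names copied from the directory tree above.
ANNO_FILES = [
    "PointLLM_brief_description_660K.json",
    "PointLLM_brief_description_660K_filtered.json",
    "PointLLM_brief_description_val_200_GT.json",
    "PointLLM_complex_instruction_70K.json",
    "object_ids_660K.txt",
    "val_object_ids_3000.txt",
]
MODELNET_FILE = "modelnet40_test_8192pts_fps.dat"


def missing_data_entries(data_root: str) -> List[str]:
    """List the expected entries that are absent under `data_root`."""
    root = Path(data_root)
    missing = [
        f"anno_data/{name}"
        for name in ANNO_FILES
        if not (root / "anno_data" / name).is_file()
    ]
    if not (root / "modelnet40_data" / MODELNET_FILE).is_file():
        missing.append(f"modelnet40_data/{MODELNET_FILE}")
    objaverse_dir = root / "objaverse_data"
    # Only confirms that at least one point cloud file is present.
    if not objaverse_dir.is_dir() or not any(objaverse_dir.glob("*_8192.npy")):
        missing.append("objaverse_data/*_8192.npy")
    return missing


# Demo on an empty temporary folder; pass "./data" in a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    print(missing_data_entries(tmp))
```

An empty list means every entry shown in the tree was found.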

Weight Preparation

We have collected the model weights that MiniGPT-3D requires for training and inference.

  1. Download model weights.
  2. Move the params_weight folder to the MiniGPT-3D project folder.

Finally, the overall project directory structure should be:

MiniGPT-3D
|-- params_weight
|   |-- MiniGPT_3D_stage_3       # Our MiniGPT-3D stage III weights, needed to verify the results of the paper
|   |-- MiniGPT_3D_stage_4       # Our MiniGPT-3D stage IV weights, needed to verify the results of the paper
|   |-- Phi_2                    # LLM weights
|   |-- TinyGPT_V_stage_3        # 2D-LLM weights, including the LoRA & Norm of the LLM and the projector
|   |-- all-mpnet-base-v2        # Used in the traditional caption evaluation
|   |-- bert-base-uncased        # Used to initialize the Q-Former
|   |-- pc_encoder               # Point cloud encoder
|   `-- sup-simcse-roberta-large # Used in the traditional caption evaluation
|-- train_configs
|   `-- MiniGPT_3D
|   .......

Gradio Conversation Demo

  1. You can run the following command to start a local gradio conversation demo:

    python UI_demo.py --cfg-path ./eval_configs/MiniGPT_3D_conv_UI_demo.yaml --gpu-id 0
    
  2. Then open http://127.0.0.1:7860/ in your browser. You can enter a supported Objaverse object ID (660K objects) or upload an object file (.ply or .npy) to talk with MiniGPT-3D.

Example: Input the object ID: conv_demo_1

Example: Upload the object file: conv_demo_2

Train

Edit the output path of each stage

If you want to use the default output path of each stage, you can skip the following steps.

  1. Set your output path of Stage I to here at Line 44 and here at Line 8.
  2. Set your output path of Stage II to here at Line 51 and here at Line 7.
  3. Set your output path of Stage III to here at Line 66 and here at Line 7.
  4. Set your output path of Stage IV to here at Line 66.

Train Stage I

CUDA_VISIBLE_DEVICES=0 python  train.py --cfg-path ./train_configs/MiniGPT_3D/stage_1.yaml

Train Stage II

CUDA_VISIBLE_DEVICES=0 python  train.py --cfg-path ./train_configs/MiniGPT_3D/stage_2.yaml

Train Stage III

CUDA_VISIBLE_DEVICES=0 python  train.py --cfg-path ./train_configs/MiniGPT_3D/stage_3.yaml

Train Stage IV

CUDA_VISIBLE_DEVICES=0 python  train.py --cfg-path ./train_configs/MiniGPT_3D/stage_4.yaml
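The four training commands above can also be chained from one script. The sketch below assumes the stages are meant to run in order from Stage I to Stage IV and that each stage's config already points at the previous stage's output. `stage_command` and `run_all_stages` are hypothetical helpers, and the default is a dry run that only prints the commands.

```python
import os
import subprocess
from typing import List


def stage_command(stage: int) -> List[str]:
    """Build the train.py command for one stage (1 to 4)."""
    if stage not in (1, 2, 3, 4):
        raise ValueError(f"unknown stage: {stage}")
    return [
        "python", "train.py",
        "--cfg-path", f"./train_configs/MiniGPT_3D/stage_{stage}.yaml",
    ]


def run_all_stages(gpu_id: str = "0", dry_run: bool = True) -> List[List[str]]:
    """Run (or just print) the four stage commands in order."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_id)
    commands = [stage_command(stage) for stage in (1, 2, 3, 4)]
    for cmd in commands:
        if dry_run:
            print(f"CUDA_VISIBLE_DEVICES={gpu_id}", " ".join(cmd))
        else:
            # check=True stops the chain at the first failing stage.
            subprocess.run(cmd, env=env, check=True)
    return commands


run_all_stages(dry_run=True)
```

Call `run_all_stages(dry_run=False)` from the project root to launch the training for real.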

Evaluation

A. Set the output path of Stage III & IV in evaluation configuration

If you just want to verify the results of our paper, you can ignore the following steps:

  1. Set your output path of Stage III to here at Line 8.

  2. Set your output path of Stage IV to here at Line 9.

B. Output the result JSON files

  1. Output the result of open vocabulary classification on Objaverse

    # Prompt 0: 
    export PYTHONPATH=$PWD
    CUDA_VISIBLE_DEVICES=0 python pointllm/eval/eval_objaverse.py --out_path ./output/test --task_type classification  --cfg-path ./eval_configs/benchmark_evaluation_paper.yaml    --prompt_index 0 
    
    # Prompt 1: 
    export PYTHONPATH=$PWD
    CUDA_VISIBLE_DEVICES=0 python pointllm/eval/eval_objaverse.py --out_path ./output/test --task_type classification  --cfg-path ./eval_configs/benchmark_evaluation_paper.yaml    --prompt_index 1
    
  2. Output the result of closed-set zero-shot classification on ModelNet40

    # Prompt 0:
    export PYTHONPATH=$PWD
    CUDA_VISIBLE_DEVICES=0 python pointllm/eval/eval_modelnet_cls.py --out_path ./output/test  --cfg-path ./eval_configs/benchmark_evaluation_paper.yaml    --prompt_index 0
    
    # Prompt 1: 
    export PYTHONPATH=$PWD
    CUDA_VISIBLE_DEVICES=0 python pointllm/eval/eval_modelnet_cls.py --out_path ./output/test  --cfg-path ./eval_configs/benchmark_evaluation_paper.yaml    --prompt_index 1
    
  3. Output the result of object captioning on Objaverse

    export PYTHONPATH=$PWD
    CUDA_VISIBLE_DEVICES=0 python pointllm/eval/eval_objaverse.py --out_path ./output/test  --task_type captioning  --cfg-path ./eval_configs/benchmark_evaluation_paper.yaml    --prompt_index 2
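
The five generation runs in this section differ only in the script, the task type, and the prompt index. A minimal sketch that rebuilds the same commands from a table follows; `eval_command` and `RUNS` are hypothetical names, and the script only prints the commands.

```python
from typing import List, Optional

CFG_PATH = "./eval_configs/benchmark_evaluation_paper.yaml"


def eval_command(script: str, prompt_index: int,
                 task_type: Optional[str] = None,
                 out_path: str = "./output/test") -> List[str]:
    """Build one result-generation command with the flags used in this section."""
    cmd = ["python", f"pointllm/eval/{script}", "--out_path", out_path]
    if task_type is not None:  # the ModelNet40 script takes no --task_type
        cmd += ["--task_type", task_type]
    cmd += ["--cfg-path", CFG_PATH, "--prompt_index", str(prompt_index)]
    return cmd


# (script, prompt index, task type) for each run listed above.
RUNS = [
    ("eval_objaverse.py", 0, "classification"),
    ("eval_objaverse.py", 1, "classification"),
    ("eval_modelnet_cls.py", 0, None),
    ("eval_modelnet_cls.py", 1, None),
    ("eval_objaverse.py", 2, "captioning"),
]

for script, index, task in RUNS:
    print(" ".join(eval_command(script, index, task)))
```

As in the commands above, set `PYTHONPATH=$PWD` and `CUDA_VISIBLE_DEVICES` before running any printed command.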
    

C. Evaluate JSON Results

a. Evaluate with closed-source LLMs from OpenAI <span style="color:red;">[Not recommended]</span>

In GreenPLM, we noticed that the closed-source LLMs GPT-3.5 and GPT-4 have two major drawbacks: inconsistent API versions and high evaluation costs (~35 CNY or 5 USD per evaluation). For instance, the GPT-3.5-turbo-0613 model used in PointLLM and our MiniGPT-3D is no longer maintained, making the results difficult to replicate.

<details> <summary>The following steps evaluate with the OpenAI API. They may no longer work. (click to expand)</summary>

  1. Evaluate the open vocabulary classification on Objaverse

    # Prompt 0:
    export PYTHONPATH=$PWD
    export OPENAI_API_KEY=sk-****
    python pointllm/eval/evaluator.py --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_classification_prompt0.json --model_type gpt-4-0613 --eval_type open-free-form-classification --parallel --num_workers 15

    # Prompt 1:
    export PYTHONPATH=$PWD
    export OPENAI_API_KEY=sk-****
    python pointllm/eval/evaluator.py --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_classification_prompt1.json --model_type gpt-4-0613 --eval_type open-free-form-classification --parallel --num_workers 15

  2. Evaluate the closed-set zero-shot classification on ModelNet40

    # Prompt 0:
    export PYTHONPATH=$PWD
    export OPENAI_API_KEY=sk-****
    python pointllm/eval/evaluator.py --results_path /path/to/evaluation/ModelNet_classification_prompt0.json --model_type gpt-3.5-turbo-0613 --eval_type modelnet-close-set-classification --parallel --num_workers 15

    # Prompt 1:
    export PYTHONPATH=$PWD
    export OPENAI_API_KEY=sk-****
    python pointllm/eval/evaluator.py --results_path /path/to/evaluation/ModelNet_classification_prompt1.json --model_type gpt-3.5-turbo-0613 --eval_type modelnet-close-set-classification --parallel --num_workers 15

  3. Evaluate the object captioning on Objaverse

    export PYTHONPATH=$PWD
    export OPENAI_API_KEY=sk-****
    python pointllm/eval/evaluator.py --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_captioning_prompt2.json --model_type gpt-4-0613 --eval_type object-captioning --parallel --num_workers 15

</details>

b. Evaluate with open-source Qwen2-72B-Instruct <span style="color:green;">[Recommended]</span>

In GreenPLM, we propose new 3D object classification and captioning benchmarks that use the open-source, GPT-4-level Qwen2-72B-Instruct, making evaluation cost-effective and results consistently reproducible.

  1. Evaluate the open vocabulary classification on Objaverse

    # Prompt 0:
    export PYTHONPATH=$PWD
    export DASHSCOPE_API_KEY=sk-xxx
    python ./pointllm/eval/evaluator_opensource_llm_QwenAPI.py \
        --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_classification_prompt0.json \
        --eval_type open-free-form-classification \
        --model_type qwen2-72b-instruct \
        --parallel --num_workers 4

    # Prompt 1:
    export PYTHONPATH=$PWD
    export DASHSCOPE_API_KEY=sk-xxx
    python ./pointllm/eval/evaluator_opensource_llm_QwenAPI.py \
        --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_classification_prompt1.json \
        --eval_type open-free-form-classification \
        --model_type qwen2-72b-instruct \
        --parallel --num_workers 4

  2. Evaluate the closed-set zero-shot classification on ModelNet40

    # Prompt 0:
    export PYTHONPATH=$PWD
    export DASHSCOPE_API_KEY=sk-xxx
    python ./pointllm/eval/evaluator_opensource_llm_QwenAPI.py \
        --results_path /path/to/evaluation/ModelNet_classification_prompt0.json \
        --eval_type modelnet-close-set-classification \
        --model_type qwen2-72b-instruct \
        --parallel --num_workers 4

    # Prompt 1:
    export PYTHONPATH=$PWD
    export DASHSCOPE_API_KEY=sk-xxx
    python ./pointllm/eval/evaluator_opensource_llm_QwenAPI.py \
        --results_path /path/to/evaluation/ModelNet_classification_prompt1.json \
        --eval_type modelnet-close-set-classification \
        --model_type qwen2-72b-instruct \
        --parallel --num_workers 4

  3. Evaluate the object captioning on Objaverse

    export PYTHONPATH=$PWD
    export DASHSCOPE_API_KEY=sk-xxx
    python ./pointllm/eval/evaluator_opensource_llm_QwenAPI.py \
        --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_captioning_prompt2.json \
        --eval_type object-captioning \
        --model_type qwen2-72b-instruct \
        --parallel --num_workers 4

c. Traditional Metric Evaluation

For the object captioning task, run the following command to evaluate the model outputs with the traditional metrics Sentence-BERT and SimCSE.

CUDA_VISIBLE_DEVICES=0 python pointllm/eval/traditional_evaluator.py --results_path /path/to/evaluation/PointLLM_brief_description_val_200_GT_Objaverse_captioning_prompt2.json
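
In the LLM-based evaluation commands of this section, the `--eval_type` value follows from the result file name. The sketch below encodes that mapping for the Qwen2-72B-Instruct evaluator; `infer_eval_type` and `qwen_eval_command` are hypothetical helpers, and the mapping is inferred only from the file names that appear in this README.

```python
from pathlib import Path
from typing import List


def infer_eval_type(results_path: str) -> str:
    """Map a result JSON file name to the --eval_type used in this README."""
    name = Path(results_path).name
    if name.startswith("ModelNet_classification"):
        return "modelnet-close-set-classification"
    if "Objaverse_classification" in name:
        return "open-free-form-classification"
    if "Objaverse_captioning" in name:
        return "object-captioning"
    raise ValueError(f"cannot infer eval type from: {name}")


def qwen_eval_command(results_path: str, num_workers: int = 4) -> List[str]:
    """Build the Qwen2-72B-Instruct evaluation command for one result file."""
    return [
        "python", "./pointllm/eval/evaluator_opensource_llm_QwenAPI.py",
        "--results_path", results_path,
        "--eval_type", infer_eval_type(results_path),
        "--model_type", "qwen2-72b-instruct",
        "--parallel", "--num_workers", str(num_workers),
    ]


print(" ".join(qwen_eval_command("/path/to/evaluation/ModelNet_classification_prompt0.json")))
```

The printed command still needs `PYTHONPATH` and `DASHSCOPE_API_KEY` exported, as shown in the commands above.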

Run local gradio demo using your weights

  1. Set your output path of Stage III here at Line 8.

  2. Set your output path of Stage IV here at Line 9.

  3. You can run the following command to start a local gradio conversation demo:

    python UI_demo.py --cfg-path ./eval_configs/MiniGPT_3D_conv_UI_demo.yaml --gpu-id 0
    

🔗 Citation

If you find our work helpful, please consider citing:

@article{tang2024minigpt,
  title={MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors},
  author={Tang, Yuan and Han, Xu and Li, Xianzhi and Yu, Qiao and Hao, Yixue and Hu, Long and Chen, Min},
  journal={arXiv preprint arXiv:2405.01413},
  year={2024}
}

📄 License

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a> <br /> This work is under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

📚 Related Work

Together, let's make LLMs for 3D great!

👏 Acknowledgements

We would like to thank the authors of PointLLM, TinyGPT-V, MiniGPT-4, and Octavius for their great work and repositories.