3D Semantic Scene Graph Estimations

This is a framework for developing 3D semantic scene graph estimations. The repository includes five different methods, namely IMP, VGfM, 3DSSG, SGFN and MonoSSG.

<details> <summary>This repository has been used for the following publications:</summary> </details>

Setup

Environment.

# if you don't have miniconda
source setup_conda.sh 

# setup
source setup.sh

mkdir data
ln -s /path/to/your/3RScan ./data/

source Init.sh # This will set PYTHONPATH and activate the environment for you.
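Before moving on, it can help to confirm that the symlink created above resolves to a real directory. The following is a minimal sketch and is not part of the repository: the helper name `check_dataset_link` is hypothetical, and it assumes only that the dataset is linked as `./data/3RScan`, as in the commands above.

```python
from pathlib import Path


def check_dataset_link(data_dir="data", name="3RScan"):
    """Return the resolved dataset path, or raise if the link is missing or broken.

    Hypothetical helper (not part of this repository).
    """
    link = Path(data_dir) / name
    # exists() follows symlinks, so a dangling link is reported as missing.
    if not link.exists():
        raise FileNotFoundError(f"{link} does not exist or is a broken symlink")
    if not link.is_dir():
        raise NotADirectoryError(f"{link} does not point to a directory")
    return link.resolve()
```

Calling `check_dataset_link()` from the repository root should return the real location of your 3RScan copy, or raise an error that names the offending path.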

Preparation

Download data

cd files
bash preparation.sh

Prepare 3RScan dataset

Please make sure you agree to the 3RScan Terms of Use first. Then obtain the download script and place it in the 3RScan main directory.

Then run

python scripts/RUN_prepare_dataset_3RScan.py --download --thread 8

Generate Experiment data

# For GT
# This script downloads the preprocessed data needed for GT data generation and generates the GT data.
python scripts/RUN_prepare_GT_setup_3RScan.py --thread 16

# For Dense
# This script downloads the inseg.ply files, unzips them into your 3RScan folder, and generates training data.
python scripts/RUN_prepare_Dense_setup_3RScan.py -c configs/dataset/config_base_3RScan_inseg_l20.yaml --thread 16

# For Sparse
# This script downloads the 2dssg_orbslam3.[json,ply] files, unzips them into your 3RScan folder, and generates training data.
python scripts/RUN_prepare_Sparse_setup_3RScan.py -c configs/dataset/config_base_3RScan_orbslam_l20.yaml --thread 16

Train

The first time you train, you may need to change the wandb account in configs/config_default.yaml: set wandb.entity and wandb.project to your own. Alternatively, you can disable logging by passing --dry_run.

source Init.sh

# Train and eval everything. 
python scripts/RUN_traineval_all.py

# Train single
python main.py --mode train --config /path/to/your/config/file

# Eval one
python main.py --mode eval --config /path/to/your/config/file
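To train and then evaluate a single config without retyping both commands, the calls above can be assembled programmatically. This is a rough sketch under stated assumptions: `build_command` and `train_then_eval` are hypothetical helpers, not part of the repository, and the sketch assumes that `main.py` accepts the `--dry_run` flag mentioned earlier for disabling logging.

```python
import shlex
import sys


def build_command(mode, config, dry_run=False, python=sys.executable):
    """Build the argument list for one main.py call (hypothetical helper)."""
    if mode not in ("train", "eval"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = [python, "main.py", "--mode", mode, "--config", str(config)]
    if dry_run:
        # Assumption: main.py accepts --dry_run to disable wandb logging.
        cmd.append("--dry_run")
    return cmd


def train_then_eval(config, dry_run=False):
    """Return the train command followed by the eval command for one config."""
    return [build_command(mode, config, dry_run) for mode in ("train", "eval")]


for cmd in train_then_eval("/path/to/your/config/file", dry_run=True):
    # Print only; pass each list to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the commands as argument lists rather than strings avoids quoting problems when a config path contains spaces.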

Trained models

We provide models trained with the optimized code in this repository rather than the code used for the results reported in our CVPR23 paper. The numbers differ, but all methods follow the same trend. We encourage you to compare against results you obtain yourself using this repo.

Download the trained models and unzip them into the experiments folder (you may need to create it yourself).
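The unzip step can also be scripted. The sketch below is illustrative only: `extract_models` is a hypothetical helper, and the archive name in the usage note is a placeholder rather than an actual file name from the download links.

```python
import zipfile
from pathlib import Path


def extract_models(archive, experiments_dir="experiments"):
    """Unzip a downloaded model archive into the experiments folder.

    Hypothetical helper; creates the folder if it does not exist and
    returns the top-level entries that were extracted.
    """
    out = Path(experiments_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out)
        # Collect the top-level names so the extracted layout can be checked.
        return sorted({name.split("/")[0] for name in zf.namelist()})
```

For example, `extract_models("downloaded_models.zip")` run from the repository root would create `experiments/` if needed and list what was unpacked into it.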

Note that the implementations of SceneGraphFusion and 3DSSG differ from the original papers. We tried to make all methods share the same model settings so that they can be compared fairly.

<details> <summary>Results</summary>

The first Trip., Obj. and Pred. columns report results over all predictions. The second set, Trip.*, Obj.* and Pred.*, reports results without considering the None relationship.
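To illustrate the difference between the two sets of columns, here is a minimal sketch of scoring predicate predictions with and without the None relationship. It is one plausible reading, not the repository's evaluation code: the function `top1_hit_rate`, the label string `"none"`, and the example predicates are all assumptions made for illustration, and the exact metric used for the reported numbers may differ.

```python
NONE_LABEL = "none"  # assumed name of the "no relationship" class


def top1_hit_rate(predictions, targets, ignore_label=None):
    """Percentage of samples whose top-1 prediction matches the ground truth.

    If ignore_label is given, samples whose ground truth equals it are skipped.
    """
    pairs = [(p, t) for p, t in zip(predictions, targets) if t != ignore_label]
    if not pairs:
        return float("nan")
    hits = sum(p == t for p, t in pairs)
    return 100.0 * hits / len(pairs)


# Illustrative labels only, not taken from the dataset files.
pred = ["none", "none", "standing on", "none"]
gt = ["none", "none", "standing on", "attached to"]

print(top1_hit_rate(pred, gt))                           # 75.0 (all predictions)
print(top1_hit_rate(pred, gt, ignore_label=NONE_LABEL))  # 50.0 (None excluded)
```

Because most object pairs have no relationship, a model that predicts None often can score highly when every pair is counted, which is why the second set of columns is the more demanding one.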

Same setup as Table 1: 3RScan dataset with 20 object and 8 predicate classes.

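| Name | Input | Trip. | Obj. | Pred. | Trip.* | Obj.* | Pred.* | mR. Obj. | mR. Pred. |
|---|---|---|---|---|---|---|---|---|---|
| IMP | GT | 45.3 | 65.4 | 94.0 | 44.3 | 66.0 | 56.6 | 56.2 | 41.8 |
| VGfM | GT | 52.9 | 70.8 | 95.0 | 51.5 | 71.4 | 62.8 | 59.5 | 46.8 |
| 3DSSG | GT | 31.8 | 55.1 | 95.4 | 39.7 | 55.6 | 71.0 | 47.7 | 61.5 |
| SGFN | GT | 42.7 | 63.6 | 94.3 | 47.6 | 64.4 | 69.0 | 53.6 | 63.1 |
| Ours | GT | 63.9 | 79.4 | 95.6 | 63.4 | 80.0 | 76.0 | 78.2 | 64.8 |
| IMP | DENSE | 24.6 | 47.7 | 89.2 | 19.7 | 49.5 | 20.9 | 34.7 | 23.9 |
| VGfM | DENSE | 25.9 | 48.4 | 90.4 | 19.6 | 50.0 | 20.4 | 34.8 | 21.5 |
| 3DSSG | DENSE | 14.5 | 37.0 | 88.0 | 12.9 | 37.4 | 22.0 | 26.2 | 23.7 |
| SGFN | DENSE | 27.7 | 49.7 | 89.9 | 22.0 | 51.6 | 27.5 | 37.7 | 32.6 |
| Ours | DENSE | 29.5 | 52.0 | 88.6 | 23.3 | 53.8 | 28.4 | 43.8 | 35.8 |
| IMP | SPARSE | 8.6 | 27.7 | 90.9 | 3.6 | 24.5 | 4.0 | 20.2 | 14.7 |
| VGfM | SPARSE | 9.0 | 28.0 | 90.7 | 4.0 | 28.8 | 4.4 | 24.3 | 13.9 |
| 3DSSG | SPARSE | 1.3 | 11.1 | 90.2 | 1.0 | 11.7 | 4.6 | 6.1 | 13.9 |
| SGFN | SPARSE | 2.5 | 15.4 | 88.3 | 3.4 | 15.9 | 7.0 | 8.9 | 14.5 |
| Ours | SPARSE | 9.9 | 28.7 | 89.8 | 6.8 | 29.5 | 8.2 | 27.0 | 17.6 |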

Same setup as Table 2: 3RScan dataset with 160 object and 26 predicate classes.

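| Name | Input | Trip. | Obj. | Pred. | Trip.* | Obj.* | Pred.* | mRe. Obj. | mRe. Pred. |
|---|---|---|---|---|---|---|---|---|---|
| IMP | GT | 64.2 | 43.0 | 16.2 | 4.9 | 42.9 | 16.4 | 16.0 | 3.6 |
| VGfM | GT | 64.5 | 46.0 | 17.4 | 5.9 | 46.0 | 17.6 | 19.1 | 5.5 |
| 3DSSG | GT | 64.8 | 28.0 | 67.1 | 6.9 | 27.9 | 67.1 | 12.1 | 20.9 |
| SGFN | GT | 64.7 | 36.9 | 48.4 | 6.6 | 36.8 | 48.4 | 16.2 | 14.4 |
| Ours | GT | 67.6 | 53.4 | 48.1 | 14.8 | 53.2 | 48.1 | 28.9 | 24.7 |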
</details>
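
| Input Type \ Method | IMP | VGfM | 3DSSG | SGFN | Ours |
|---|---|---|---|---|---|
| GT | Link | Link | Link | Link | Link |
| DENSE | Link | Link | Link | Link | Link |
| SPARSE | Link | Link | Link | Link | Link |
| GT [160/26] | Link | Link | Link | Link | Link |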

License

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

Citations

If you find the code useful, please consider citing our papers:

@inproceedings{wu2023incremental,
  title={Incremental 3D Semantic Scene Graph Prediction from RGB Sequences},
  author={Wu, Shun-Cheng and Tateno, Keisuke and Navab, Nassir and Tombari, Federico},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5064--5074},
  year={2023},
}

@inproceedings{wu2021scenegraphfusion,
  title={Scenegraphfusion: Incremental 3d scene graph prediction from rgb-d sequences},
  author={Wu, Shun-Cheng and Wald, Johanna and Tateno, Keisuke and Navab, Nassir and Tombari, Federico},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7515--7525},
  year={2021}
}

@inproceedings{Wald2020,
    title = {{Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions}},
    author = {Wald, Johanna and Dhamo, Helisa and Navab, Nassir and Tombari, Federico},
    booktitle = {Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2020}
}