[ICLR 2024 spotlight] InstructScene

<h4 align="center">

InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

Chenguo Lin, Yadong Mu

arXiv Project page License: MIT

<p> <img width="240" alt="bedroom" src="./assets/bedroom_1.gif"> <img width="240" alt="diningroom" src="./assets/diningroom_1.gif"> <img width="240" alt="livingroom" src="./assets/livingroom_1.gif"> </p> <p> <img width="730" alt="pipeline" src="./assets/pipeline.png"> </p> </h4>

This repository contains the official implementation of the paper InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior, which was accepted to ICLR 2024 as a spotlight presentation. InstructScene is a generative framework that synthesizes 3D indoor scenes from natural language instructions. It is composed of two parts: a semantic graph prior, which designs a semantic graph from an instruction, and a layout decoder, which turns that graph into a 3D scene.

Feel free to contact me (chenguolin@stu.pku.edu.cn) or open an issue if you have any questions or suggestions.

🔧 Installation

You may need to change the torch version in settings/setup.sh to match your CUDA version. There are no restrictions on the torch version, so feel free to use your preferred one.

git clone https://github.com/chenguolin/InstructScene.git
cd InstructScene
bash settings/setup.sh

Download the Blender software for visualization.

cd blender
wget https://download.blender.org/release/Blender3.3/blender-3.3.1-linux-x64.tar.xz
tar -xvf blender-3.3.1-linux-x64.tar.xz
rm blender-3.3.1-linux-x64.tar.xz

📊 Dataset

The dataset used in InstructScene is based on 3D-FRONT and 3D-FUTURE. Please follow the instructions on their official websites to download the original datasets. The dataset preprocessing scripts in ATISS and DiffuScene are similar to ours and can serve as a reference.

We provide the preprocessed instruction-scene paired dataset used in the paper and rendered images for evaluation on HuggingFace.

import os
from huggingface_hub import hf_hub_url
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="InstructScene.zip", repo_type="dataset")
os.system(f"wget {url} && unzip InstructScene.zip")
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="3D-FRONT.zip", repo_type="dataset")
os.system(f"wget {url} && unzip 3D-FRONT.zip")

Please refer to dataset/README.md for more details.
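
As an alternative to shelling out to wget and unzip, the same two archives can be fetched with pure Python. The following is a minimal sketch, not part of this repository: the helper names extract_archive and fetch_and_extract are hypothetical, and it assumes huggingface_hub is installed and that its hf_hub_download call is acceptable for your setup (it stores the archive in the local HuggingFace cache instead of the current directory).

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest="."):
    """Extract a zip archive into dest and return the archive member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


def fetch_and_extract(filename, dest="."):
    """Download one archive from the dataset repo (needs network) and unzip it."""
    # Imported lazily so extract_archive works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id="chenguolin/InstructScene_dataset",
        filename=filename,
        repo_type="dataset",
    )
    return extract_archive(local_path, dest)
```

With this sketch, calling fetch_and_extract("InstructScene.zip") and fetch_and_extract("3D-FRONT.zip") from the repository root would mirror the commands above.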

👀 Visualization

We provide a helper script to visualize synthesized scenes with Blender. Please refer to blender/README.md for more details.

We also provide many useful visualization functions in src/utils/visualize.py, including creating appropriate floor plans, drawing scene graphs, adding instructions as titles in the rendered images, making gifs, etc.

🚀 Usage

Note that:

0. 📦 fVQ-VAE: quantize OpenShape/CLIP features of objects

Training

We provide the pretrained weights of fVQ-VAE on HuggingFace. Our preprocessed dataset contains the original OpenShape features and the corresponding quantization indices.

import os
from huggingface_hub import hf_hub_url
os.system("mkdir -p out/threedfront_objfeat_vqvae/checkpoints")
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="threedfront_objfeat_vqvae_epoch_01999.pth", repo_type="dataset")
os.system(f"wget {url} -O out/threedfront_objfeat_vqvae/checkpoints/epoch_01999.pth")
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="objfeat_bounds.pkl", repo_type="dataset")
os.system(f"wget {url} -O out/threedfront_objfeat_vqvae/objfeat_bounds.pkl")

You can also train the fVQ-VAE from scratch. If you do, you must update the quantization indices in the dataset (stored in dataset/InstructScene/threed_front_<room_type>/<scene_id>/models_info.pkl) to match the newly trained model.

# bash scripts/train_objfeatvqvae.sh <tag> <gpu_id>
bash scripts/train_objfeatvqvae.sh threedfront_objfeat_vqvae 0
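
Before updating the quantization indices after retraining, it can help to list every per-scene file that needs touching. The sketch below only enumerates the models_info.pkl files using the path pattern given above; find_models_info is a hypothetical helper, and nothing is assumed about the internal structure of the pickle files.

```python
from pathlib import Path


def find_models_info(dataset_root, room_type):
    """Return sorted paths to every per-scene models_info.pkl for one room type."""
    room_dir = Path(dataset_root) / f"threed_front_{room_type}"
    if not room_dir.is_dir():
        return []
    # One sub-directory per scene id, each expected to hold a models_info.pkl.
    return sorted(room_dir.glob("*/models_info.pkl"))
```

For example, find_models_info("dataset/InstructScene", "bedroom") would return the files to revisit for bedroom scenes.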

Inference (only for debugging)

# bash scripts/inference_objfeatvqvae.sh <tag> <gpu_id> <epoch>
bash scripts/inference_objfeatvqvae.sh threedfront_objfeat_vqvae 0 -1
# '-1' means the latest checkpoint
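
To illustrate what "latest checkpoint" means, here is a minimal sketch of how an epoch of -1 could be resolved to a file. It assumes checkpoints are named like epoch_01999.pth inside out/<tag>/checkpoints, as in the download paths above; resolve_checkpoint is a hypothetical helper and may differ from the lookup the scripts actually perform.

```python
import re
from pathlib import Path

_CKPT_RE = re.compile(r"^epoch_(\d+)\.pth$")


def resolve_checkpoint(ckpt_dir, epoch=-1):
    """Return the checkpoint path for `epoch`, or the newest one if epoch is -1."""
    found = {}
    for path in Path(ckpt_dir).glob("epoch_*.pth"):
        match = _CKPT_RE.match(path.name)
        if match:
            found[int(match.group(1))] = path
    if not found:
        raise FileNotFoundError(f"no checkpoints found in {ckpt_dir}")
    if epoch == -1:
        # Pick the checkpoint with the highest epoch number.
        return found[max(found)]
    if epoch not in found:
        raise FileNotFoundError(f"no checkpoint for epoch {epoch} in {ckpt_dir}")
    return found[epoch]
```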

1. 🦾 Layout Decoder: embody 3D scenes from semantic graphs

Training

# bash scripts/train_sg2sc_objfeat.sh <room_type> <tag> <gpu_id> <fvqvae_tag>
bash scripts/train_sg2sc_objfeat.sh bedroom bedroom_sg2scdiffusion_objfeat 0 threedfront_objfeat_vqvae

Inference (only for debugging)

# bash scripts/inference_sg2sc_objfeat.sh <room_type> <tag> <gpu_id> <epoch> <fvqvae_tag> <(optional) cfg_scale>
bash scripts/inference_sg2sc_objfeat.sh bedroom bedroom_sg2scdiffusion_objfeat 0 -1 threedfront_objfeat_vqvae 1.0

To visualize synthesized scenes, replace --n_scene 0 in scripts/inference_sg2sc_objfeat.sh with --n_scenes 5 --visualize --resolution 1024. This visualizes 5 synthesized scenes and saves the rendered images at a resolution of 1024x1024. Otherwise, the script only computes the iRecall score for evaluation.
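
If you switch between evaluation-only and visualization runs often, the flag replacement can be scripted. The sketch below operates on the script text only; enable_visualization is a hypothetical helper, and it assumes the flag appears in the script literally as --n_scene 0 or --n_scenes 0.

```python
import re


def enable_visualization(script_text, n_scenes=5, resolution=1024):
    """Swap the evaluation-only scene count for the visualization flags."""
    # Match "--n_scene 0" or "--n_scenes 0", but not values such as "0.5" or "05".
    pattern = r"--n_scenes?\s+0(?!\S)"
    replacement = f"--n_scenes {n_scenes} --visualize --resolution {resolution}"
    new_text, count = re.subn(pattern, replacement, script_text)
    if count == 0:
        raise ValueError("no '--n_scene 0' flag found in the script text")
    return new_text
```

Reading the shell script, passing its contents through this function, and writing the result to a copy keeps the original script untouched.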

2. 🤖 Semantic Graph Prior: design semantic graphs from instructions

Training

# bash scripts/train_sg_vq_objfeat.sh <room_type> <tag> <gpu_id>
bash scripts/train_sg_vq_objfeat.sh bedroom bedroom_sgdiffusion_vq_objfeat 0

Inference

# bash scripts/inference_sg_vq_objfeat.sh <room_type> <tag> <gpu_id> <epoch> <fvqvae_tag> <sg2sc_tag> <(optional) cfg_scale> <(optional) sg2sc_cfg_scale>
bash scripts/inference_sg_vq_objfeat.sh bedroom bedroom_sgdiffusion_vq_objfeat 0 -1 threedfront_objfeat_vqvae bedroom_sg2scdiffusion_objfeat 1.0 1.0
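
Because the script takes eight positional arguments, it is easy to misorder them. The following sketch assembles the command in the order given by the usage comment above; build_inference_cmd is a hypothetical helper, not something shipped in this repository.

```python
def build_inference_cmd(room_type, tag, gpu_id, epoch, fvqvae_tag, sg2sc_tag,
                        cfg_scale=None, sg2sc_cfg_scale=None):
    """Assemble the argument list for scripts/inference_sg_vq_objfeat.sh."""
    cmd = [
        "bash", "scripts/inference_sg_vq_objfeat.sh",
        room_type, tag, str(gpu_id), str(epoch), fvqvae_tag, sg2sc_tag,
    ]
    if cfg_scale is None:
        if sg2sc_cfg_scale is not None:
            # Arguments are positional, so the second scale needs the first.
            raise ValueError("sg2sc_cfg_scale requires cfg_scale to be set")
        return cmd
    cmd.append(str(cfg_scale))
    if sg2sc_cfg_scale is not None:
        cmd.append(str(sg2sc_cfg_scale))
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root.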

To visualize synthesized scenes, replace --n_scene 0 in scripts/inference_sg_vq_objfeat.sh with --n_scenes 5 --visualize --resolution 1024. This visualizes 5 synthesized scenes and saves the rendered images at a resolution of 1024x1024. Otherwise, the script only computes the iRecall score for evaluation.

Evaluation

Evaluation should be conducted after the inference script is executed with the --visualize flag, which will save the rendered images in the output directory.

FID, CLIP-FID and KID
python3 src/compute_fid_scores.py configs/bedroom_sgdiffusion_vq_objfeat.yaml --tag bedroom_sgdiffusion_vq_objfeat --checkpoint_epoch -1
SCA (scene classification accuracy)
python3 src/synthetic_vs_real_classifier.py configs/bedroom_sgdiffusion_vq_objfeat.yaml --tag bedroom_sgdiffusion_vq_objfeat --checkpoint_epoch -1

Applications

For the "stylization", "rearrangement" or "completion" downstream tasks, replace the Python script name generate_sg.py in scripts/inference_sg_vq_objfeat.sh with stylize_sg.py, rearrange_sg.py or complete_sg.py, respectively.

Please refer to these Python files for more detailed arguments and usage.
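
The task-to-script mapping can be written down explicitly. This is a minimal sketch under the assumption that the inference script references generate_sg.py by that literal name; TASK_SCRIPTS and switch_task are hypothetical names used for illustration only.

```python
# Downstream task name -> Python script that replaces generate_sg.py.
TASK_SCRIPTS = {
    "stylization": "stylize_sg.py",
    "rearrangement": "rearrange_sg.py",
    "completion": "complete_sg.py",
}


def switch_task(script_text, task):
    """Return the inference script text with generate_sg.py swapped for `task`."""
    if task not in TASK_SCRIPTS:
        raise KeyError(f"unknown task: {task}")
    if "generate_sg.py" not in script_text:
        raise ValueError("generate_sg.py is not referenced in the script text")
    return script_text.replace("generate_sg.py", TASK_SCRIPTS[task])
```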

😊 Acknowledgement

We would like to thank the authors of ATISS, DiffuScene, OpenShape, NAP and CLIPLayout for their great work and for generously releasing their source code, which inspired our work and helped us a lot in the implementation.

📚 Citation

If you find our work helpful, please consider citing:

@inproceedings{lin2024instructscene,
  title={InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior},
  author={Chenguo Lin and Yadong Mu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}