LASST (Language-guided Semantic Style Transfer of 3D Indoor Scenes)
Accepted at ACM Multimedia PIES-ME 2022. Paper by Bu Jin, Beiwen Tian, Hao Zhao and Guyue Zhou from the Institute for AI Industry Research (AIR), Tsinghua University.
Introduction
3D content creation and editing is a long-standing multimedia demand. With the rise of the metaverse, tech giants and consumers alike now look forward to high-quality virtual worlds that people can live in and interact with. We study the problem of 3D indoor scene style transfer, which could improve the experience of metaverse residents.
In this repository, we address the new problem of language-guided semantic style transfer of 3D indoor scenes. The input is a 3D indoor scene mesh and several phrases that describe the target scene. First, a multi-layer perceptron maps 3D vertex coordinates to RGB residuals. Second, the colored 3D mesh is differentiably rendered into 2D images, using a viewpoint sampling strategy tailored for indoor scenes. Third, the rendered 2D images are compared with the phrases using pre-trained vision-language models. Finally, the resulting errors are back-propagated to the multi-layer perceptron to update the colors of vertices that belong to the target semantic categories. The whole LASST pipeline is shown in the figure below. Code and models will be made publicly available.
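To make the first and third steps more concrete, here is a minimal NumPy sketch of the forward pass only. It is not the LASST implementation: the layer sizes, ReLU activations, the tanh bound on the residuals, and all function names are assumptions chosen for illustration. In the real pipeline the image and text embeddings would come from a pre-trained vision-language model (CLIP is one such model), and the differentiable rendering and gradient computation, which need a renderer and an autodiff framework, are omitted here.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """Random weights for a fully connected network, e.g. layer_sizes=[3, 64, 64, 3]."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def rgb_residuals(params, xyz):
    """Map (N, 3) vertex coordinates to (N, 3) RGB residuals bounded to [-0.5, 0.5]."""
    h = xyz
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    return 0.5 * np.tanh(h @ w + b)  # tanh keeps every residual bounded (an assumption)


def stylized_colors(params, xyz, base_rgb):
    """Original vertex colors plus predicted residuals, clipped to the valid [0, 1] range."""
    return np.clip(base_rgb + rgb_residuals(params, xyz), 0.0, 1.0)


def text_image_loss(image_emb, text_emb):
    """1 - cosine similarity between a rendered-image embedding and a phrase embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return 1.0 - float(image_emb @ text_emb)


# Usage with random stand-in data (a real run would load a ScanNet mesh).
rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 5.0, size=(100, 3))
base_rgb = rng.uniform(0.0, 1.0, size=(100, 3))
params = init_mlp([3, 64, 64, 3], rng)
colors = stylized_colors(params, xyz, base_rgb)
print(colors.shape)  # (100, 3)
```

Predicting a residual on top of the existing colors, rather than absolute colors, means an untrained network leaves the scene close to its original appearance.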
Getting Started
Installation
conda create --name LASST python=3.7
conda activate LASST
conda install --yes --file requirements.txt
System Requirements
- Python 3.7
- CUDA 11.0
- GPU with at least 8 GB of memory
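A small pre-flight check can catch a mismatched environment before a long run. The helper below is hypothetical and not part of the repository; it treats Python 3.7 and CUDA 11.0 as exact versions, which is an assumption, since other versions may also work. The CUDA version and GPU memory are passed in by hand here (for example, as read from `nvidia-smi`).

```python
import sys

# Minimum requirements copied from the list above.
REQUIRED_PYTHON = (3, 7)
REQUIRED_CUDA = "11.0"
MIN_GPU_MEMORY_GB = 8


def unmet_requirements(python_version, cuda_version, gpu_memory_gb):
    """Return a description of every listed requirement the given machine does not meet."""
    problems = []
    found_python = tuple(python_version[:2])
    if found_python != REQUIRED_PYTHON:
        problems.append("Python %d.%d required, found %d.%d" % (REQUIRED_PYTHON + found_python))
    if cuda_version != REQUIRED_CUDA:
        problems.append("CUDA %s required, found %s" % (REQUIRED_CUDA, cuda_version))
    if gpu_memory_gb < MIN_GPU_MEMORY_GB:
        problems.append("at least %d GB of GPU memory required, found %g GB"
                        % (MIN_GPU_MEMORY_GB, gpu_memory_gb))
    return problems


# CUDA version and GPU memory below are placeholders for the values on your machine.
print(unmet_requirements(sys.version_info, "11.0", 8))
```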
Data Preparation
We use the ScanNetV2 dataset. See HERE for more details. Before running, set the data path in src/local.py to your own data path.
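As an illustration of the data path setting, the sketch below is a hypothetical stand-in for src/local.py: the actual variable names in that file may differ. The file names follow the layout of the official ScanNetV2 release (`scans/<scene_id>/<scene_id>_vh_clean_2.ply` for the mesh and `..._vh_clean_2.labels.ply` for the labeled mesh); whether LASST reads exactly these files is an assumption.

```python
import os

# Hypothetical stand-in for src/local.py; the real variable name may differ.
SCANNET_ROOT = "/path/to/ScanNetV2"  # replace with your own data path


def scene_files(scene_id, root=SCANNET_ROOT):
    """Mesh and label-mesh paths for a scene such as 'scene0000_00' in the ScanNet layout."""
    scene_dir = os.path.join(root, "scans", scene_id)
    return {
        "mesh": os.path.join(scene_dir, scene_id + "_vh_clean_2.ply"),
        "labels": os.path.join(scene_dir, scene_id + "_vh_clean_2.labels.ply"),
    }


# Quick sanity check that the configured path points at real files.
paths = scene_files("scene0000_00")
missing = [p for p in paths.values() if not os.path.exists(p)]
if missing:
    print("Data path not set correctly; missing:", missing)
```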
Run examples
Run the following command to stylize a room with the prompt "wooden floor, steel refrigerator":
sh ./scripts/go.sh
The rendered images and final outputs are saved to results/.
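How go.sh passes the prompt to the code is not documented here. As a purely hypothetical illustration of how a prompt like the one above could be paired with semantic categories, the helper below splits it on commas and assumes that the last word of each phrase names the category to restyle.

```python
def parse_prompt(prompt):
    """Map each comma-separated phrase to its category, assumed to be the phrase's last word."""
    targets = {}
    for phrase in prompt.split(","):
        words = phrase.replace("_", " ").split()  # also accept underscores between words
        if words:  # skip empty pieces such as a trailing comma
            targets[words[-1]] = " ".join(words)
    return targets


print(parse_prompt("wooden floor, steel refrigerator"))
# {'floor': 'wooden floor', 'refrigerator': 'steel refrigerator'}
```

The last-word rule is a simplification: it would map a phrase such as "white shower curtain" to "curtain", so multi-word category names would need a lookup against the dataset's label list instead.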
Outputs
Semantic mask (input mesh, w/o semantic mask, w/ semantic mask)
text prompt: steel table
text prompt: marble floor
text prompt: wooden floor, silk sofa, wooden table
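The difference between the "w/o semantic mask" and "w/ semantic mask" results can be sketched with a hypothetical helper that adds color residuals either to every vertex or only to vertices of the target categories. For readability the labels here are category names, whereas ScanNet stores integer label ids; in an autodiff framework the same masking would also stop gradients from reaching non-target vertices.

```python
import numpy as np


def apply_residuals(base_rgb, residuals, labels=None, targets=None):
    """Add per-vertex RGB residuals, restricted to target categories when a mask is used."""
    if labels is None or targets is None:
        mask = np.ones(len(base_rgb), dtype=bool)  # w/o semantic mask: every vertex changes
    else:
        mask = np.isin(labels, list(targets))  # w/ semantic mask: only target categories
    out = base_rgb.copy()
    out[mask] = np.clip(base_rgb[mask] + residuals[mask], 0.0, 1.0)
    return out


labels = np.array(["floor", "wall", "sofa", "floor"])
base = np.full((4, 3), 0.5)
residuals = np.full((4, 3), 0.2)
print(apply_residuals(base, residuals, labels, {"floor"})[:, 0])  # [0.7 0.5 0.5 0.7]
```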
Sampling (input mesh, text2mesh sampling, LASST sampling)
text prompt: marble floor, fabric sofa
text prompt: wooden floor, steel refrigerator
text prompt: golden chair, oak table
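The sampling comparison contrasts object-centric views with an indoor-specific strategy. The sketch below only illustrates the geometric difference and is not the LASST strategy: it places object-centric cameras on a circle around the mesh, looking at its center (roughly the idea of sampling around a single object), and places hypothetical indoor cameras at the room center at eye height, looking outward. The z-up convention and the 1.5 m eye height are assumptions.

```python
import numpy as np


def object_centric_views(center, radius, n):
    """n cameras on a horizontal circle around the mesh, all aimed at its center."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    offsets = np.stack([np.cos(angles), np.sin(angles), np.zeros(n)], axis=1)
    eyes = center + radius * offsets
    targets = np.repeat(center[None, :], n, axis=0)
    return eyes, targets


def inside_room_views(bbox_min, bbox_max, n, eye_height=1.5):
    """Hypothetical indoor sampling: n cameras at the room center looking outward."""
    center = (bbox_min + bbox_max) / 2.0
    eye_z = min(bbox_min[2] + eye_height, bbox_max[2])  # z is up; stay below the ceiling
    eye = np.array([center[0], center[1], eye_z])
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles), np.zeros(n)], axis=1)
    eyes = np.repeat(eye[None, :], n, axis=0)
    return eyes, eyes + directions  # each target is one unit along the viewing direction


room_min, room_max = np.array([0.0, 0.0, 0.0]), np.array([4.0, 6.0, 3.0])
eyes, targets = inside_room_views(room_min, room_max, 8)
outer_eyes, _ = object_centric_views((room_min + room_max) / 2.0, 10.0, 8)
```

In this sketch, a circle wide enough to frame the whole room puts every object-centric camera outside the walls, so its renders would mostly show the outside of the walls rather than the furniture inside, which is the kind of problem an indoor-specific strategy has to avoid.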
Regularization (input mesh, none, RGB, HSV)
text prompt: leather sofa
text prompt: leather sofa, marble floor, oak table
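The regularization results compare no regularization with regularization in RGB and in HSV space, but this README does not define those terms. One plausible reading, sketched here purely as an assumption, is a penalty on how far the stylized vertex colors drift from the original colors, measured in either color space. The function names and the squared-difference form are hypothetical.

```python
import colorsys

import numpy as np


def to_hsv(rgb):
    """Convert an (N, 3) array of RGB values in [0, 1] to HSV, row by row."""
    return np.array([colorsys.rgb_to_hsv(*row) for row in rgb])


def color_regularizer(original, stylized, space="rgb"):
    """Mean squared change of vertex colors, measured in RGB or HSV ("none" disables it)."""
    if space == "none":
        return 0.0
    if space == "rgb":
        diff = stylized - original
    elif space == "hsv":
        diff = to_hsv(stylized) - to_hsv(original)
        diff[:, 0] = (diff[:, 0] + 0.5) % 1.0 - 0.5  # hue is circular: 0.95 and 0.0 are close
    else:
        raise ValueError("space must be 'none', 'rgb' or 'hsv'")
    return float(np.mean(np.sum(diff ** 2, axis=1)))


red = np.array([[1.0, 0.0, 0.0]])
pinkish_red = np.array([colorsys.hsv_to_rgb(0.95, 1.0, 1.0)])
print(color_regularizer(red, pinkish_red, "rgb"), color_regularizer(red, pinkish_red, "hsv"))
```

Measuring the change in HSV separates hue from saturation and brightness, so such a penalty could, for example, tolerate a hue shift differently from a change in brightness; the wrap-around on the hue channel keeps nearby hues on either side of red from being treated as far apart.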
Ground-truth labels vs. predicted labels
Citation
@article{jin2022language,
title={Language-guided Semantic Style Transfer of 3D Indoor Scenes},
author={Jin, Bu and Tian, Beiwen and Zhao, Hao and Zhou, Guyue},
journal={arXiv preprint arXiv:2208.07870},
year={2022}
}