
ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations


Introduction

This codebase accompanies our <a href="https://openaccess.thecvf.com/content/CVPR2023/papers/Achlioptas_ShapeTalk_A_Language_Dataset_and_Framework_for_3D_Shape_Edits_CVPR_2023_paper.pdf">CVPR-2023</a> paper.

Related Works

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{achlioptas2023shapetalk,
    title={{ShapeTalk}: A Language Dataset and Framework for 3D Shape Edits and Deformations},
    author={Achlioptas, Panos and Huang, Ian and Sung, Minhyuk and
            Tulyakov, Sergey and Guibas, Leonidas},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023}
}

Installation

Optionally, first create a clean environment, e.g.:

 conda create -n changeit3d python=3.8
 conda activate changeit3d 
 conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
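Before proceeding, you can quickly sanity-check that the environment is set up as expected:

```python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)    # expect 1.11.0 / 0.12.0
print('CUDA available:', torch.cuda.is_available())  # True if cudatoolkit matches your driver
```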

Then,

 git clone https://github.com/optas/changeit3d
 cd changeit3d
 pip install -e .

Last, if you want to train point-cloud autoencoders or run some of the evaluation metrics we introduce, consider installing a fast (GPU-based) implementation of the Chamfer loss:

git submodule add https://github.com/ThibaultGROUEIX/ChamferDistancePytorch changeit3d/losses/ChamferDistancePytorch
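If the CUDA build gives you trouble (see the FAQ below), a dependency-free Chamfer distance can be written in a few lines of plain PyTorch. The following is only an illustrative sketch of the fallback idea, not the repository's optimized (or provided slow) implementation:

```python
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Naive symmetric Chamfer distance between batches of point clouds.

    a: (B, N, 3), b: (B, M, 3). Builds the full (B, N, M) distance matrix,
    so it is memory-hungry and roughly 10x slower than a CUDA kernel.
    """
    d = torch.cdist(a, b).pow(2)              # pairwise squared distances
    a_to_b = d.min(dim=2).values.mean(dim=1)  # nearest point in b for each point of a
    b_to_a = d.min(dim=1).values.mean(dim=1)  # nearest point in a for each point of b
    return a_to_b + b_to_a                    # (B,) per-cloud Chamfer values
```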

Basic structure of this repository

./changeit3d	
├── evaluation 				# routines for evaluating shape edits via language
├── models 				# neural-network definitions
├── in_out 				# routines related to I/O operations
├── language 				# tools used to process text (tokenization, spell-check, etc.)
├── external_tools 			# utilities to integrate code from other repos (ImNet, SGF, 3D-shape-part-prediction)
├── scripts 				# various Python scripts
│   ├── train_test_pc_ae.py   	        # Python script to train/test a 3D point-cloud shape autoencoder
│   ├── train_test_latent_listener.py   # Python script to train/test a neural listener based on some previously extracted latent shape-representation
│   ├── ...
│   ├── bash_scripts                    # wrappers of the above (python-based) scripts to run in batch mode with a bash terminal
├── notebooks 				# jupyter notebook versions of the above scripts for easier deployment (and more)	

ShapeTalk Dataset ( :rocket: )

Our work introduces a large-scale visio-linguistic dataset -- ShapeTalk.

First, consider downloading ShapeTalk and then quickly reading its manual to understand its structure.

Exploring ShapeTalk ( :microscope: )

Assuming you have downloaded ShapeTalk, the top level of the downloaded directory should contain the following subfolders:

| Subfolder | Content |
| --- | --- |
| images | 2D renderings used for contrasting 3D shapes and collecting referential language via Amazon Mechanical Turk |
| pointclouds | point clouds extracted from the surface of the underlying 3D shapes; used, e.g., for training a PC-AE and evaluating edits |
| language | files capturing the collected language; see ShapeTalk's manual if you haven't read it yet |

:arrow_right: To familiarize yourself with ShapeTalk you can run this notebook to compute basic statistics about it.

:arrow_right: For a more fine-grained analysis of ShapeTalk's language, please run this notebook.
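For a quick programmatic look before opening the notebooks, something along these lines works (a minimal sketch; the CSV filename and column names below are assumptions, so consult the ShapeTalk manual for the actual schema):

```python
import pandas as pd

# Hypothetical path and columns -- check the ShapeTalk manual for the real ones.
df = pd.read_csv('shapetalk/language/shapetalk_public_version_0.csv')

print(len(df), 'annotations')
print(df.columns.tolist())

# e.g., utterances collected per object class (assumed column name):
if 'target_object_class' in df.columns:
    print(df['target_object_class'].value_counts())
```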

Neural Listeners ( :ear: )

You can train and evaluate our neural listening architectures with different configurations using this Python script or its equivalent notebook.
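At a high level, a latent-space listener scores a (distractor, target) pair of shape latents against an utterance and is trained to pick the target. The sketch below illustrates only this interface; the actual transformer- and LSTM-based architectures live in the script above, and all dimensions here are made up:

```python
import torch
import torch.nn as nn

class ToyLatentListener(nn.Module):
    """Illustrative interface only -- not the repository's architecture."""
    def __init__(self, shape_dim=256, text_dim=512, hidden=128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(shape_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, shape_latents, text_emb):
        # shape_latents: (B, 2, shape_dim) -- distractor and target latents
        # text_emb:      (B, text_dim)     -- an embedding of the utterance
        text = text_emb.unsqueeze(1).expand(-1, shape_latents.size(1), -1)
        logits = self.scorer(torch.cat([shape_latents, text], dim=-1)).squeeze(-1)
        return logits  # (B, 2); argmax = which shape the utterance refers to
```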


Our attained accuracies are given below:

| Shape Backbone | Modality | Overall | Easy | Hard | First | Last | Multi-utter Trans. (LSTM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ImNet-AE | implicit | 68.0% | 72.6% | 63.4% | 72.4% | 64.9% | 73.2% (78.4%) |
| SGF-AE | pointcloud | 70.7% | 75.3% | 66.1% | 74.9% | 68.0% | 76.5% (79.9%) |
| PC-AE | pointcloud | 71.3% | 75.4% | 67.2% | 75.2% | 70.4% | 75.3% (81.5%) |
| ResNet-101 | image | 72.9% | 75.7% | 70.1% | 76.9% | 68.7% | 79.8% (84.3%) |
| ViT (L/14/CLIP) | image | 73.4% | 76.6% | 70.2% | 77.0% | 70.7% | 79.6% (84.5%) |
| ViT (H/14/OpenCLIP) | image | 75.5% | 78.5% | 72.4% | 79.5% | 72.2% | 82.3% (85.8%) |

For the meaning of the sub-populations (easy, hard, etc.), please see our paper, Table 5.

All numbers reported above concern the transformer-based baseline presented in our paper; the exception is the numbers inside parentheses in the Multi-utter column, which are based on our LSTM baseline. The LSTM baseline performs better only in this multi-utterance scenario, possibly because our transformer struggles to self-attend well to all concatenated input utterances.

If you have new results, please reach out to Panos Achlioptas so we can include them on our competition page.

ChangeIt3DNet ( neural 3D editing via language :hammer: )

The algorithmic approach we propose and follow in this work to train a language-driven 3D shape editor such as ChangeIt3DNet breaks the process into three steps:

1. Train a shape autoencoder (e.g., a PC-AE) to obtain latent representations of 3D shapes.
2. Train a neural listener that operates on those latent shape representations.
3. Train the editing module (ChangeIt3DNet) to transform a shape's latent code according to the input language, guided by the frozen listener.

For specific details on how to execute the above steps, see the corresponding scripts and notebooks in this repository; a simplified sketch of the third step follows.
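To make the role of the frozen listener in the third step concrete, here is a heavily simplified sketch of the editor's training objective (our reading of the approach; all names and the exact loss composition are illustrative, not the repository's code):

```python
import torch
import torch.nn.functional as F

def editor_training_step(editor, frozen_listener, z_source, text_emb, alpha=0.5):
    """One illustrative optimization step for a latent-space editor.

    z_source: (B, D) latents of the shapes to edit
    text_emb: (B, T) embeddings of the editing utterances
    alpha:    (assumed) trade-off between edit strength and shape preservation
    """
    z_edit = editor(z_source, text_emb)

    # The frozen listener should identify the *edited* latent (index 1)
    # as the shape the language describes...
    logits = frozen_listener(torch.stack([z_source, z_edit], dim=1), text_emb)
    target = torch.ones(z_source.size(0), dtype=torch.long, device=z_source.device)
    listening_loss = F.cross_entropy(logits, target)

    # ...while the edited latent stays close to the source (identity preservation).
    identity_loss = F.mse_loss(z_edit, z_source)

    return listening_loss + alpha * identity_loss
```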

Pretrained Weights and Networks

You can download a large pool of pre-trained models using this bash script.

For the rest of this codebase to work seamlessly, the downloaded folders are assumed to keep their original structure. However, please update (whenever prompted) the config.json.txt files so they point to the local directories where you placed the pretrained networks!

:exclamation: The provided weights are not identical to those used in the CVPR paper. Unfortunately, we recently had to retrain all our major networks due to a hard-drive failure. Fortunately, the performance of the resulting networks is either very close to that reported in the CVPR manuscript or, in some cases, noticeably improved. Please see here for the attained accuracies of the shared neural listeners, or run this notebook to analyze the performance of the ChangeIt3D networks with the better-performing (pre-trained and shared) evaluation networks. If you have any questions, please do not hesitate to contact Panos Achlioptas. :exclamation:

Metrics for Evaluating Editing Modules ( :triangular_ruler: )

To run the metrics introduced in our paper (LAB, l-GD, etc.) for evaluating shape-editing systems such as ChangeIt3DNet, please use this Python script.

To see the expected input of this script, please read the help of its argument parsing function parse_evaluate_changeit3d_arguments.

You can customize your run to a subset of the available metrics.
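For intuition on what a listener-based metric measures, here is a hedged sketch of the general idea (our reading; the script above holds the authoritative definitions):

```python
import torch

@torch.no_grad()
def listener_preference_shift(oracle_listener, z_source, z_edit, text_emb):
    """How much a frozen oracle listener prefers the edited shape over the
    source, given the editing utterance. Higher = the edit better realizes
    the language. Illustrative only -- not the exact LAB computation."""
    logits = oracle_listener(torch.stack([z_source, z_edit], dim=1), text_emb)
    probs = torch.softmax(logits, dim=-1)
    return (probs[:, 1] - probs[:, 0]).mean()
```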


Frequently Asked Questions (FAQ)

  1. [Installation] I cannot install the suggested CUDA-based Chamfer Loss

    The suggested CUDA-based implementation of the Chamfer loss (submodule) requires the ninja build system to be installed.

    If you do not have it, you can install it like this:

    wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
    sudo unzip ninja-linux.zip -d /usr/local/bin/
    sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force
    

    Please also see the Troubleshooting section of the original implementation.

    If you fail to install the CUDA-based Chamfer Distance (CD), you may want to try the (~10x slower) implementation we provide in losses/nn_distance/chamfer_loss.

  2. [ShapeTalk, class-names] ShapeTalk's object classes often have different names compared to their original names in repositories like ShapeNet. Why?

    For instance, the ShapeTalk object class "flowerpot" aggregates objects from the class "pot" of ShapeNet, along with objects from the class "flower_pot" of ModelNet. The objects of those two classes have the same semantics despite their different names; ShapeTalk's "flowerpot" unifies them. For more information, please see the mixing code.

  3. [Redoing experiments post-CVPR] Addressing our hard-drive failure and how it affected our shared pre-trained models

    • a. The shared Oracle neural listener used for evaluating LAB, based on DGCNN, has an improved accuracy of 78.37% (instead of 76.25%) on the chair, table, and lamp classes.

    • b. The shared PointNet-based shape classifier used for evaluating Class-Distortion has an improved 90.3% accuracy (instead of 89.0%).

    • c. The latent-based neural listeners used (frozen) for training the shared ChangeIt3DNets have slightly different performances (within limits of random-seed/initialization fluctuations). The new performances are reported here.

    • d. The above changes naturally affect our metrics for evaluating the ChangeIt3DNet architectures. You can find their performance for the shared editing networks here.