D<sup>3</sup>Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

Website | Paper | Colab | Doc

<a target="_blank" href="https://wangyixuan12.github.io/">Yixuan Wang</a><sup>1*</sup>, <a target="_blank" href="https://robopil.github.io/d3fields/">Zhuoran Li</a><sup>2, 3*</sup>, <a target="_blank" href="https://robo-alex.github.io/">Mingtong Zhang</a><sup>1</sup>, <a target="_blank" href="https://ece.illinois.edu/about/directory/faculty/krdc">Katherine Driggs-Campbell</a><sup>1</sup>, <a target="_blank" href="https://jiajunwu.com/">Jiajun Wu</a><sup>2</sup>, <a target="_blank" href="https://profiles.stanford.edu/fei-fei-li">Li Fei-Fei</a><sup>2</sup>, <a target="_blank" href="https://yunzhuli.github.io/">Yunzhu Li</a><sup>1, 2</sup>

<sup>1</sup>University of Illinois Urbana-Champaign, <sup>2</sup>Stanford University, <sup>3</sup>National University of Singapore<br>

https://github.com/WangYixuan12/d3fields/assets/32333199/a3fced3d-e827-4e7e-ad6a-e80889809fca

Try it in Colab!

In this notebook, we show how to build D<sup>3</sup>Fields and visualize the reconstructed mesh, mask fields, and descriptor fields. We also demonstrate how to track keypoints in a video.

Installation

We recommend Mambaforge over the standard Anaconda distribution for a faster installation:

# create conda environment
mamba env create -f env.yaml
conda activate d3fields

# download pretrained models
bash scripts/download_ckpts.sh
bash scripts/download_data.sh

Visualization

python vis_repr.py # visualize the representation
python vis_tracking.py # visualize the tracking

Code Explanation

Fusion is the core class of D<sup>3</sup>Fields. Its key functions fuse the multi-view observations into a single 3D representation and evaluate that representation at query points, which is what produces the reconstructed mesh, mask fields, and descriptor fields shown above.
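
The sketch below is only a rough picture of how such a class is typically driven; the constructor arguments, the observation format, and the method names `update` and `eval` are assumptions for illustration, not the verified API, so consult the source for the actual interface.

```python
# Hypothetical usage sketch of Fusion -- the constructor arguments, the
# observation layout, and the method names `update` and `eval` are
# illustrative assumptions, not the verified API; see the repo source.
import numpy as np
import torch

from fusion import Fusion  # assumed module path for the core class

fusion = Fusion(num_cam=4, device="cuda")  # assumed arguments

# Fuse one multi-view RGB-D observation into the representation.
obs = {
    "color": np.zeros((4, 480, 640, 3), dtype=np.uint8),  # per-camera RGB
    "depth": np.zeros((4, 480, 640), dtype=np.float32),   # per-camera depth in meters
    "pose": np.tile(np.eye(4), (4, 1, 1)),                # world -> camera extrinsics
    "K": np.tile(np.eye(3), (4, 1, 1)),                   # per-camera intrinsics
}
fusion.update(obs)

# Query arbitrary 3D points for descriptor / mask / distance values.
pts = torch.rand(1024, 3)  # (N, 3) world-frame query points
out = fusion.eval(pts)     # assumed to return per-point field values
```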

Customized Dataset

To run D<sup>3</sup>Fields on your own dataset, follow these steps:

  1. Prepare the dataset in the following structure:
dataset_name
├── camera_0
│   ├── color
│   │   ├── 0.png
│   │   ├── 1.png
│   │   ├── ...
│   ├── depth
│   │   ├── 0.png
│   │   ├── 1.png
│   │   ├── ...
│   ├── camera_extrinsics.npy
│   ├── camera_params.npy
├── camera_1
├── ...

camera_extrinsics.npy and camera_params.npy are defined as follows:

camera_extrinsics.npy: a (4, 4) numpy array containing the camera extrinsics, which transform a point from world coordinates to camera coordinates
camera_params.npy: a (4,) numpy array containing the camera intrinsics in the order fx, fy, cx, cy
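
As a quick sanity check of these conventions, the sketch below loads the two files and projects a world-frame point into pixel coordinates; the paths and the test point are illustrative:

```python
import numpy as np

# Paths are illustrative; point them at one of your cameras.
extrinsics = np.load("dataset_name/camera_0/camera_extrinsics.npy")  # (4, 4), world -> camera
fx, fy, cx, cy = np.load("dataset_name/camera_0/camera_params.npy")  # (4,)

# Assemble the 3x3 intrinsic matrix from fx, fy, cx, cy.
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Project an (illustrative) world-frame point into pixel coordinates.
p_world = np.array([0.1, 0.0, 0.05, 1.0])  # homogeneous world point
p_cam = extrinsics @ p_world               # world -> camera frame
u, v, w = K @ p_cam[:3]                    # camera frame -> image plane
print(f"pixel ({u / w:.1f}, {v / w:.1f}) at depth {p_cam[2]:.3f} m")
```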
  2. Prepare the PCA pickle file for the query texts. Find four images of the query texts (e.g. mug) with clean backgrounds and centered objects. Change obj_type within scripts/prepare_pca.py and run it.
  3. Specify the workspace boundary as x_lower, x_upper, y_lower, y_upper, z_lower, z_upper (see the cropping sketch after this list).
  4. Run python vis_repr_custom.py, for example: python vis_repr_custom.py --data_path data/2023-09-15-13-21-56-171587 --pca_path pca_model/mug.pkl --query_texts mug --query_thresholds 0.3 --x_lower -0.4 --x_upper 0.4 --y_upper 0.3 --y_lower -0.4 --z_upper 0.02 --z_lower -0.2
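
The boundary flags describe an axis-aligned box in the world frame that crops the scene to the workspace. Below is a minimal sketch of that cropping, using the values from the example command above; the helper crop_to_workspace is hypothetical and not part of the repo:

```python
import numpy as np

# Boundary values copied from the example command above.
x_lower, x_upper = -0.4, 0.4
y_lower, y_upper = -0.4, 0.3
z_lower, z_upper = -0.2, 0.02

def crop_to_workspace(points: np.ndarray) -> np.ndarray:
    """Keep the points inside the axis-aligned workspace box.

    points: (N, 3) array of world-frame xyz coordinates.
    Hypothetical helper for illustration, not part of the repo.
    """
    mask = (
        (points[:, 0] >= x_lower) & (points[:, 0] <= x_upper)
        & (points[:, 1] >= y_lower) & (points[:, 1] <= y_upper)
        & (points[:, 2] >= z_lower) & (points[:, 2] <= z_upper)
    )
    return points[mask]

# Example: random points in a cube, cropped to the workspace.
pts = np.random.uniform(-0.5, 0.5, size=(1000, 3))
print(crop_to_workspace(pts).shape)
```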

Citation

If you find this repo useful for your research, please consider citing the paper:

@article{wang2023d3fields,
    title={D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation},
    author={Wang, Yixuan and Li, Zhuoran and Zhang, Mingtong and Driggs-Campbell, Katherine and Wu, Jiajun and Fei-Fei, Li and Li, Yunzhu},
    journal={arXiv preprint arXiv:2309.16118},
    year={2023}
}