Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies

Our project is fully open-sourced and split into two repos: Learning & Deployment of iDP3, and Humanoid Teleoperation. This repo covers training and deployment of iDP3.

https://github.com/user-attachments/assets/97f6ff8c-45b3-497a-bb66-dd8b24e973b4

Training & Deployment of iDP3

We provide a training data example in this Google Drive, so you can try training the model without collecting data. The full data and the checkpoints are available in this Google Drive.

More info:

iDP3 is a general 3D visuomotor policy for any robot. You can use iDP3 without camera calibration or point cloud segmentation. Please check our RealSense wrapper for the proposed egocentric 3D visual representation.

Installation

Install the conda env and packages on both the learning and deployment machines:

conda remove -n idp3 --all
conda create -n idp3 python=3.8
conda activate idp3

# for CUDA >= 12.1
pip3 install torch==2.1.0 torchvision --index-url https://download.pytorch.org/whl/cu121
# otherwise, install the torch version that matches your CUDA version



# install my visualizer
cd third_party
cd visualizer && pip install -e . && cd ..
pip install kaleido plotly open3d tyro termcolor h5py
cd ..


# install 3d diffusion policy
pip install --no-cache-dir wandb ipdb gpustat visdom notebook mediapy torch_geometric natsort scikit-video easydict pandas moviepy imageio imageio-ffmpeg termcolor av open3d dm_control dill==0.3.5.1 hydra-core==1.2.0 einops==0.4.1 diffusers==0.11.1 zarr==2.12.0 numba==0.56.4 pygame==2.1.2 shapely==1.8.4 tensorboard==2.10.1 tensorboardx==2.5.1 absl-py==0.13.0 pyparsing==2.4.7 jupyterlab==3.0.14 scikit-image yapf==0.31.0 opencv-python==4.5.3.56 psutil av matplotlib setuptools==59.5.0

cd Improved-3D-Diffusion-Policy
pip install -e .
cd ..

# install for diffusion policy if you want to use image-based policy
pip install timm==0.9.7

# install for r3m if you want to use image-based policy
cd third_party/r3m
pip install -e .
cd ../..
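For the PyTorch step above, a small helper can pick the wheel index from a CUDA version string. This is only a sketch and not part of the repo: `torch_index_url` is a hypothetical name, and the mapping assumes that torch 2.1.0 wheels are published under the `cu121` and `cu118` indexes and that CUDA 11.8 through 12.0 should fall back to `cu118`. Check the PyTorch install page for your exact setup.

```shell
# Hypothetical helper (not part of this repo): map a CUDA version such as
# "12.1" to a PyTorch wheel index URL for torch==2.1.0.
torch_index_url() {
    tiu_ver="${1:-}"
    # Require at least "major.minor".
    case "$tiu_ver" in
        [0-9]*.[0-9]*) ;;
        *) echo "usage: torch_index_url <major.minor>" >&2; return 2 ;;
    esac
    tiu_major="${tiu_ver%%.*}"
    tiu_rest="${tiu_ver#*.}"
    tiu_minor="${tiu_rest%%.*}"
    # Reject anything that is not purely numeric.
    case "${tiu_major}${tiu_minor}" in
        *[!0-9]*) echo "not a numeric version: $tiu_ver" >&2; return 2 ;;
    esac
    if [ "$tiu_major" -gt 12 ] || { [ "$tiu_major" -eq 12 ] && [ "$tiu_minor" -ge 1 ]; }; then
        # CUDA >= 12.1, as in the install command above
        echo "https://download.pytorch.org/whl/cu121"
    elif [ "$tiu_major" -eq 12 ] || { [ "$tiu_major" -eq 11 ] && [ "$tiu_minor" -ge 8 ]; }; then
        # Assumption: CUDA 11.8 up to 12.0 uses the cu118 wheels
        echo "https://download.pytorch.org/whl/cu118"
    else
        echo "no wheel index in this sketch for CUDA $tiu_ver" >&2
        return 1
    fi
}
```

It could then be used as `pip3 install torch==2.1.0 torchvision --index-url "$(torch_index_url 11.8)"`.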

[Install on Deployment Machine] Install the RealSense package for deployment:

# first, install the RealSense driver
# for the RealSense L515, use this version: https://github.com/IntelRealSense/librealsense/releases/tag/v2.54.2

# also install python api
pip install pyrealsense2==2.54.2.5684
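After the installation steps above, it can help to confirm that the key packages import in the active environment. The following is a minimal sketch, not part of the repo: `check_py_modules` is a hypothetical helper, and the `PYTHON` variable is an assumed way to override which interpreter is used.

```shell
# Hypothetical helper (not part of this repo): report which Python modules
# fail to import. Set PYTHON to override the interpreter (defaults to "python").
check_py_modules() {
    cpm_py="${PYTHON:-python}"
    cpm_missing=0
    for cpm_mod in "$@"; do
        if "$cpm_py" -c "import ${cpm_mod}" >/dev/null 2>&1; then
            echo "ok       ${cpm_mod}"
        else
            echo "MISSING  ${cpm_mod}"
            cpm_missing=1
        fi
    done
    # Non-zero exit status if any module could not be imported.
    return "$cpm_missing"
}

# Example, run inside the activated idp3 env:
# check_py_modules torch open3d pyrealsense2
```

On a learning machine without a camera, `pyrealsense2` can be left out of the list.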

Usage

We provide a training data example in Google Drive, so you can try training the model without collecting data. Download and unzip it, then specify the dataset path in scripts/train_policy.sh.

For example, I put the dataset in /home/ze/projects/Improved-3D-Diffusion-Policy/training_data_example, and I set dataset_path=/home/ze/projects/Improved-3D-Diffusion-Policy/training_data_example in scripts/train_policy.sh.
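A wrong or empty dataset path is easy to miss until training starts, so a quick pre-flight check can save time. This is a sketch under assumptions: `check_dataset_path` is a hypothetical helper that is not part of the repo, and it only checks that the directory exists and is non-empty, not that the contents are valid.

```shell
# Hypothetical pre-flight check (not part of this repo): fail early if the
# dataset directory is missing or empty.
check_dataset_path() {
    cdp_path="${1:-}"
    if [ -z "$cdp_path" ]; then
        echo "usage: check_dataset_path <dataset_dir>" >&2
        return 2
    fi
    if [ ! -d "$cdp_path" ]; then
        echo "dataset path not found: $cdp_path" >&2
        return 1
    fi
    # ls -A lists entries including dotfiles; empty output means an empty dir.
    if [ -z "$(ls -A "$cdp_path")" ]; then
        echo "dataset path is empty: $cdp_path" >&2
        return 1
    fi
    echo "dataset path looks usable: $cdp_path"
}
```

For example, `check_dataset_path /home/ze/projects/Improved-3D-Diffusion-Policy/training_data_example` should succeed once the example data has been unzipped there.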

Then you can train the policy and deploy it.

Train. Use the following script to train the policy:

# 3d policy
bash scripts/train_policy.sh idp3 gr1_dex-3d 0913_example

# 2d policy
bash scripts/train_policy.sh dp_224x224_r3m gr1_dex-image 0913_example

Deploy. After you have trained the policy, deploy it with the following command. For missing packages such as communication.py, see our other repo, Humanoid Teleoperation.

# 3d policy
bash scripts/deploy_policy.sh idp3 gr1_dex-3d 0913_example

# 2d policy
bash scripts/deploy_policy.sh dp_224x224_r3m gr1_dex-image 0913_example

Note that you may not be able to run the deployment code without a robot (different robots have different APIs). The code we provide is an example of how to deploy the policy. You can modify it to fit your own robot (any robot with a camera is OK).
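The train and deploy commands above take the same three arguments, so the two stages must be called with matching names. The sketch below prints the command pair for one policy type. `policy_commands` is a hypothetical helper that is not part of the repo, and reading the three positional arguments as algorithm name, task name, and run tag is an assumption based on the examples above.

```shell
# Hypothetical helper (not part of this repo): print the matching train and
# deploy commands for one policy type and run tag.
policy_commands() {
    pc_kind="${1:-}"
    pc_tag="${2:-}"
    if [ -z "$pc_tag" ]; then
        echo "usage: policy_commands <3d|2d> <run_tag>" >&2
        return 2
    fi
    case "$pc_kind" in
        3d) pc_alg="idp3";           pc_task="gr1_dex-3d" ;;
        2d) pc_alg="dp_224x224_r3m"; pc_task="gr1_dex-image" ;;
        *)  echo "usage: policy_commands <3d|2d> <run_tag>" >&2; return 2 ;;
    esac
    # Both stages share the same algorithm, task, and tag.
    echo "bash scripts/train_policy.sh ${pc_alg} ${pc_task} ${pc_tag}"
    echo "bash scripts/deploy_policy.sh ${pc_alg} ${pc_task} ${pc_tag}"
}
```

Calling `policy_commands 3d 0913_example` prints the two 3D policy commands shown above, which can then be run one at a time.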

Visualize. You can visualize our training data example by running (remember to set the dataset path):

bash scripts/vis_dataset.sh

You can specify vis_cloud=1 to render the point cloud as in the paper.

BibTeX

Please consider citing our paper if you find this repo useful:

@article{ze2024humanoid_manipulation,
  title   = {Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies},
  author  = {Yanjie Ze and Zixuan Chen and Wenhao Wang and Tianyi Chen and Xialin He and Ying Yuan and Xue Bin Peng and Jiajun Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2410.10803}
}

Acknowledgement

We thank the authors of the following repos for their great work: 3D Diffusion Policy, Diffusion Policy, VisionProTeleop, Open-TeleVision.