

Diffusion Reward

[Project Website] [Paper] [Data] [Models]

This is the official PyTorch implementation of the paper "Diffusion Reward: Learning Rewards via Conditional Video Diffusion" by

Tao Huang<sup>*</sup>, Guangqi Jiang<sup>*</sup>, Yanjie Ze, Huazhe Xu.

<p align="left"> <img width="99%" src="docs/diffusion_reward_overview.png"> </p>

🛠️ Installation Instructions

Clone this repository.

git clone https://github.com/TaoHuang13/diffusion_reward.git
cd diffusion_reward

Create a virtual environment.

conda env create -f conda_env.yml 
conda activate diffusion_reward
pip install -e .

Install extra dependencies.

pip3 install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
cd env_dependencies
pip install -e mj_envs/.
pip install -e mjrl/.
cd ..
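
After the steps above, it can help to confirm that the pinned PyTorch build is the one Python actually imports. The following is a minimal sketch, not part of the repository: the variable names `expected_torch` and `installed_torch` are introduced here for illustration, and it assumes `python` on your `PATH` is the interpreter from the `diffusion_reward` conda environment.

```shell
# Version pinned by the pip3 install command above.
expected_torch="1.12.1+cu116"

# Ask Python for the installed version; fall back to a marker if the import fails.
installed_torch="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null || echo "not-installed")"

if [ "${installed_torch}" = "${expected_torch}" ]; then
    echo "PyTorch ${installed_torch} matches the pinned version."
else
    echo "Expected PyTorch ${expected_torch}, found: ${installed_torch}" >&2
fi
```

A mismatch usually means the command ran outside the activated environment or that another package replaced the pinned build.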

💻 Reproducing Experimental Results

Download Video Demonstrations


Download the datasets and place them in /video_dataset to reproduce the results in the paper.

Pretrain Reward Models

Train VQGAN encoder.

bash scripts/run/codec_model/vqgan_${domain}.sh    # [adroit, metaworld]

Train video models.

bash scripts/run/video_model/${video_model}_${domain}.sh    # [vqdiffusion, videogpt]_[adroit, metaworld]
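
As a concrete illustration of the two pretraining commands above, the placeholders can be filled in with shell variables. This is a minimal sketch, assuming it is run from the repository root and that the codec is trained before the video model, as the order of the steps suggests. The variable names `codec_script` and `video_script` are introduced here for illustration only.

```shell
domain=adroit            # or: metaworld
video_model=vqdiffusion  # or: videogpt

codec_script="scripts/run/codec_model/vqgan_${domain}.sh"
video_script="scripts/run/video_model/${video_model}_${domain}.sh"

# Run the two stages in order; stop at the first script that is missing or fails.
for script in "${codec_script}" "${video_script}"; do
    if [ ! -f "${script}" ]; then
        echo "Not found: ${script} (run this from the repository root)" >&2
        break
    fi
    bash "${script}" || break
done
```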

(Optional) Download Pre-trained Models

We also provide the pre-trained reward models (including Diffusion Reward and VIPER) used in this paper for result reproduction. You may download the models with configuration files here, and place the folders in /exp_local.

Train RL with Pre-trained Rewards

Train DrQv2 with different rewards.

bash scripts/run/rl/drqv2_${domain}_${reward}.sh ${task}    # [adroit, metaworld]_[diffusion_reward, viper, viper_std, amp, rnd, raw_sparse_reward]
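
For example, the command above could be filled in as follows. This is a sketch under stated assumptions: the task name `door` is a guess at a valid Adroit task and is not taken from this README, so check the scripts or configs for the task names the repository accepts. It also assumes the working directory is the repository root.

```shell
domain=adroit              # or: metaworld
reward=diffusion_reward    # or: viper, viper_std, amp, rnd, raw_sparse_reward
task=door                  # assumed task name, not confirmed by this README

script="scripts/run/rl/drqv2_${domain}_${reward}.sh"
if [ -f "${script}" ]; then
    # The task is passed to the script as its first positional argument.
    bash "${script}" "${task}"
else
    echo "Not found: ${script} (run this from the repository root)" >&2
fi
```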

Note that logging experiments online requires logging in to wandb. To log locally instead, turn wandb off in the configuration file here.
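
The two logging setups can also be prepared from the shell. This is a hedged sketch based on wandb's own CLI and its `WANDB_MODE` environment variable, not on anything specific to this repository; whether the training scripts respect the variable depends on how they initialize wandb, so the configuration file remains the documented switch.

```shell
# Option 1: authenticate once so runs can be uploaded (prompts for an API key).
# wandb login

# Option 2: keep runs on local disk only; wandb reads this variable at startup.
export WANDB_MODE=offline
```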

🧭 Code Navigation

  |- configs               # experiment configs 
  |    |- models           # configs of codec models and video models
  |    |- rl               # configs of rl 
  |- envs                  # environments, wrappers, env maker
  |    |- adroit.py        # Adroit env
  |    |- metaworld.py     # MetaWorld env
  |    |- wrapper.py       # env wrapper and utils
  |- models                # implements core codec models and video models
  |    |- codec_models     # image encoder, e.g., VQGAN
  |    |- video_models     # video prediction models, e.g., VQDiffusion and VideoGPT
  |    |- reward_models    # reward models, e.g., Diffusion Reward and VIPER
  |- rl                    # implements core rl algorithms

✉️ Contact

For any questions, please feel free to email taou.cs13@gmail.com or luccachiang@gmail.com.

🙏 Acknowledgement

Our code is built upon VQGAN, VQ-Diffusion, VIPER, AMP, RND, and DrQv2. We thank the authors for open-sourcing their code and for their contributions to the community.

🏷️ License

This repository is released under the MIT license. See LICENSE for additional details.

📝 Citation

If you find our work useful, please consider citing:

  title={Diffusion Reward: Learning Rewards via Conditional Video Diffusion},
  author={Tao Huang and Guangqi Jiang and Yanjie Ze and Huazhe Xu},