# Imitation Learning via Differentiable Physics
Existing imitation learning (IL) methods such as inverse reinforcement learning (IRL) usually have a double-loop training process, alternating between learning a reward function and a policy, and tend to suffer from long training times and high variance. In this work, we identify the benefits of differentiable physics simulators and propose a new IL method, Imitation Learning via Differentiable Physics (ILD), which eliminates the double-loop design and achieves significant improvements in final performance, convergence speed, and stability.
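The core idea can be sketched in a few lines of JAX. Because the simulator step is differentiable, an imitation loss over a whole rollout can be minimized by plain gradient descent on the policy parameters, with no inner reward-learning loop. The sketch below is illustrative only: `policy_fn`, `step_fn`, `s0`, and `expert_states` are hypothetical placeholders, and the per-step squared error stands in for the deviation-tolerant trajectory distance used in the actual method.

```python
import jax
import jax.numpy as jnp

def rollout_loss(policy_params, policy_fn, step_fn, s0, expert_states):
    """Unroll the policy through a differentiable simulator and penalize
    deviation from the expert state trajectory at every step."""
    def body(state, expert_state):
        action = policy_fn(policy_params, state)
        next_state = step_fn(state, action)  # differentiable physics step
        return next_state, jnp.sum((next_state - expert_state) ** 2)

    _, per_step_losses = jax.lax.scan(body, s0, expert_states)
    return jnp.mean(per_step_losses)

# Gradients flow through the simulator dynamics across the whole horizon,
# so a single gradient-descent loop on the policy parameters replaces
# IRL's alternation between reward learning and policy optimization.
grad_fn = jax.grad(rollout_loss)
```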
## Brax MuJoCo Tasks
Our ILD agent learns from a single expert demonstration, with much lower variance and higher performance.
| Expert Demo | Learned Policy |
| --- | --- |
## Cloth Manipulation Task
We collect a single expert demonstration in a noise-free environment. Despite severe control noise in the test environment, our method still completes the task and recovers the expert behavior.
| Expert Demo (Noise-free) | Learned Policy (Heavy Noise in Control) |
| --- | --- |
## Installation
```bash
conda create -n ILD python==3.8
conda activate ILD
pip install --upgrade pip
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install brax
pip install streamlit
pip install tensorflow
pip install open3d
```
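To confirm that the CUDA build of JAX was picked up (a generic check, not specific to this repo), the standard JAX device queries are enough:

```python
import jax

# Should report "gpu" and list the visible CUDA devices if the CUDA
# wheel was installed correctly; otherwise JAX falls back to "cpu".
print(jax.default_backend())
print(jax.devices())
```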
## Start training
```bash
# train with a single demonstration
cd policy/brax_task
CUDA_VISIBLE_DEVICES=0 python train_on_policy.py --env="ant" --seed=1

# train with multiple demonstrations
cd policy/brax_task
CUDA_VISIBLE_DEVICES=0 python train_multi_traj.py --env="ant" --seed=1

# train on the cloth manipulation task
cd policy/cloth_task
CUDA_VISIBLE_DEVICES=0 python train_on_policy.py --seed=1
```
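As a quick smoke test that the Brax environment targeted by the training scripts loads correctly, a minimal random-policy rollout can be run. This sketch assumes the Brax v1 API (`envs.create`, `env.reset`, `env.step`); the module layout may differ on newer Brax releases.

```python
import jax
from brax import envs

# Create the same environment the training script targets.
env = envs.create(env_name="ant")

rng = jax.random.PRNGKey(0)
state = env.reset(rng=rng)

# Step with random actions for a few frames as a sanity check.
for _ in range(10):
    rng, key = jax.random.split(rng)
    action = jax.random.uniform(key, (env.action_size,), minval=-1.0, maxval=1.0)
    state = env.step(state, action)

print(state.reward)
```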