Rewards from Human Videos

Learn agent- and domain-agnostic reward functions from human videos that can be adapted to various robots and environments.

Setup for DVD reproduction

conda env create -f conda_env_setup.yml
conda activate dvd_t2t
cd metaworld
pip install -e .

cd ../tensor2tensor
pip install -e .

cd ../dvd/sim_envs
pip install -e .
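
Optional sanity check that the editable installs resolved (import names assumed to match the package directories):

python -c "import metaworld, tensor2tensor"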

Reproducing DVD

For details on what these commands do and what each argument means, refer to the original repo.

cd dvd
python train.py --num_tasks 6 --traj_length 0 --log_dir path/to/train/model/output --similarity --batch_size 24 --im_size 120 --seed 0 --lr 0.01 --pretrained --human_data_dir path/to/smthsmth/sm/20bn-something-something-v2 --sim_dir demos/ --human_tasks 5 41 44 46 93 94 --robot_tasks 5 41 93 --add_demos 60 --gpus 0
python collect_data.py --xml env1 --task_num 94

Testing the learned reward function

Using the collect_data.py script above, we can generate sample trajectories in the environment directory. We can then evaluate the reward for each of these trajectories against a demo video and report the average reward.

python reward_inference.py --eval_path data/file/from/collect_data/script --demo_path path/to/demo
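
For intuition, here is a minimal Python sketch of the quantity being computed (names are illustrative, not reward_inference.py's actual API): the DVD discriminator scores how likely the trajectory clip and the demo clip show the same task, and that score serves as the reward.

import torch

def dvd_reward(discriminator, traj_clip, demo_clip):
    # traj_clip, demo_clip: (T, C, H, W) video tensors, preprocessed identically
    with torch.no_grad():
        logits = discriminator(traj_clip.unsqueeze(0), demo_clip.unsqueeze(0))
    return torch.softmax(logits, dim=-1)[0, 1].item()  # P(same task)

Averaging this score over every trajectory in --eval_path gives the average reward the script reports.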

Run inference with human demos on DVD tasks:

python cem_plan_open_loop.py --num_tasks 2 --task_id 5 --dvd --demo_path demos/task5 --checkpoint /path/to/discriminator/model

Run inference using ground-truth (hand-engineered) rewards:

python cem_plan_open_loop.py --num_tasks 2 --task_id 5 --engineered_rewards
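
Both commands run the same open-loop planner; below is a minimal CEM sketch (hypothetical names, see cem_plan_open_loop.py for the real loop): sample action sequences from a Gaussian, score each with the reward (DVD or engineered), and refit the Gaussian to the elites.

import numpy as np

def cem_plan(reward_fn, horizon, act_dim, iters=5, pop=100, n_elite=10):
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # sample candidate action sequences around the current mean
        samples = mean + std * np.random.randn(pop, horizon, act_dim)
        scores = np.array([reward_fn(seq) for seq in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # best action sequence, executed open loop without replanning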

Adding state and visual dynamics models

State dynamics model using PETS:

conda activate dvd_pets
cd dvd
git checkout state_history
python cem_plan_learned_dynamics.py --task_id 5 --engineered_rewards --learn_dynamics_model

Or, from the visual_dynamics branch:

git checkout visual_dynamics
python cem_plan_state_dynamics.py --task_id 5 --engineered_rewards --learn_dynamics_model
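
For reference, a rough sketch of the kind of PETS-style dynamics model being learned here (illustrative names and placeholder dimensions, not the branches' actual code): an ensemble of networks, each predicting a Gaussian over the next-state delta.

import torch
import torch.nn as nn

class ProbDynamics(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * s_dim))  # predicts mean and log-variance

    def forward(self, s, a):
        mu, logvar = self.net(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        # sample a plausible next state; planning propagates particles through the ensemble
        return s + mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

ensemble = [ProbDynamics(s_dim=13, a_dim=4) for _ in range(5)]  # placeholder dims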

Training visual dynamics model using pydreamer

conda activate dvd_pydreamer
cd pydreamer
git checkout visual_dynamics

CUDA_VISIBLE_DEVICES=0,1 python train.py --configs defaults tabletop --run_name tabletop
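
Conceptually (illustrative names and sizes, not pydreamer's API), a visual dynamics model encodes frames into a compact latent state and rolls forward entirely in latent space, so the planner never has to simulate pixels:

import torch
import torch.nn as nn

act_dim, z_dim = 4, 128  # placeholder sizes
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(z_dim))
dynamics = nn.GRUCell(z_dim + act_dim, z_dim)

def imagine(frame, actions):
    z = encoder(frame)  # encode the current observation once
    latents = []
    for a in actions:  # roll forward purely in latent space
        z = dynamics(torch.cat([z, a], dim=-1), z)
        latents.append(z)
    return torch.stack(latents)  # score these with a reward head during planning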

Closed-loop CEM inference using the visual dynamics model

cd dvd
[ADD HERE]

Preparing inpainted data

One can inpaint using the data_inpaint.py script as follows:

conda activate e2fgvi
cd dvd

python data_inpaint.py --human_data_dir /path/to/smthsmth/sm --human_tasks 5 41 94 --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
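
Roughly what the --opts weights do (a sketch of the assumed segmentation step, not verified against data_inpaint.py): Detectron2's COCO Mask R-CNN produces per-frame person masks like this, and the masked regions are then video-inpainted (the e2fgvi env suggests E2FGVI is the inpainting model).

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
frame = cv2.imread("frame.png")  # one extracted video frame
instances = predictor(frame)["instances"].to("cpu")
person_masks = instances.pred_masks[instances.pred_classes == 0]  # COCO class 0 = person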

To do this with EgoHOS segmentations instead of Detectron2 segmentations:

conda activate dvd_e2fgvi_detectron_egohos
cd dvd

python data_inpaint_egohos.py --human_data_dir /path/to/smthsmth/sm --human_tasks 5 41 94

Training and inference on human-only inpainted data

Train the reward model on the inpainted human data:

conda activate dvd_t2t
cd dvd

python train.py --num_tasks 6 --traj_length 0 --log_dir path/to/train/model/output --similarity --batch_size 24 --im_size 120 --seed 0 --lr 0.01 --pretrained --human_data_dir path/to/smthsmth/sm/20bn-something-something-v2 --human_tasks 5 41 44 46 93 94 --add_demos 0 --inpaint --gpus 0

Run inference with Detectron2-based inpainting:

conda activate dvd_e2fgvi_detectron
cd dvd

python cem_plan_inpaint.py --task_id 5 --dvd --demo_path demos/task5 --checkpoint /path/to/trained/reward/model

Run inference with EgoHOS-based inpainting:

conda activate dvd_e2fgvi_detectron_egohos
cd dvd

python cem_plan_inpaint_egohos.py --task_id 5 --dvd --demo_path demos/task5 --checkpoint /path/to/trained/reward/model

Troubleshooting

For training, make sure the following versions of protobuf and Pillow are installed:

pip install protobuf==3.9.2 pillow==6.1.0