# VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
<p align="center"> <img src="./imgs/pull_figure.png" height="100%"> </p>

Yifeng Zhu, Abhishek Joshi, Peter Stone, Yuke Zhu
Project | Paper | Simulation Datasets | Real-Robot Datasets | Real Robot Control
## Introduction
We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. It then uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary materials and on the project website: https://ut-austin-rpl.github.io/VIOLA.
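As a rough illustration of the idea (this is not the actual implementation in `viola_bc/`; all module names and sizes below are made up for the sketch), an object-centric policy can pool one feature per object proposal with RoIAlign and let a transformer attend over those per-object tokens before predicting an action:

```python
# Minimal sketch of an object-centric transformer policy (illustrative only,
# NOT the VIOLA implementation). It pools a feature per proposal with RoIAlign
# and reasons over the resulting tokens with a transformer encoder.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class ObjectCentricPolicySketch(nn.Module):
    def __init__(self, feat_dim=64, token_dim=128, action_dim=7):
        super().__init__()
        # Stand-in for a real visual backbone; it keeps spatial resolution so
        # proposal boxes in image coordinates can be used directly as RoIs.
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=3, padding=1)
        self.proj = nn.Linear(feat_dim * 4 * 4, token_dim)
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(token_dim, action_dim)

    def forward(self, image, proposal_boxes):
        # image: (1, 3, H, W); proposal_boxes: (K, 4) boxes in image coordinates.
        # A batch size of one keeps the token bookkeeping trivial in this sketch.
        feats = self.backbone(image)
        rois = roi_align(feats, [proposal_boxes], output_size=(4, 4), spatial_scale=1.0)
        tokens = self.proj(rois.flatten(1)).unsqueeze(0)  # (1, K, token_dim)
        out = self.transformer(tokens)                    # attend over per-object tokens
        return self.action_head(out.mean(dim=1))          # (1, action_dim)


# Example with random inputs:
policy = ObjectCentricPolicySketch()
img = torch.randn(1, 3, 128, 128)
boxes = torch.tensor([[10., 10., 40., 40.], [60., 20., 100., 90.]])
print(policy(img, boxes).shape)  # torch.Size([1, 7])
```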
## Real Robot Usage
This codebase does not include the real robot experiment setup. If you are interested in the real-robot control infrastructure we use, please check out Deoxys! It comes with detailed documentation for getting started.
## Installation
Clone the repo with:
```
git clone --recurse-submodules git@github.com:UT-Austin-RPL/VIOLA.git
```
Then go into `VIOLA/third_party` and install each dependency according to its instructions: detectron2, Detic.
Then install all the other dependencies. The most important packages are `torch`, `robosuite`, and `robomimic`.
```
pip install -r requirements.txt
```
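As an optional sanity check, the main dependencies should all be importable from Python:

```python
# Quick check that the key dependencies installed correctly.
import torch
import robosuite
import robomimic
import detectron2

print("torch", torch.__version__)
print("robosuite", robosuite.__version__)
```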
## Usage
### Collect demonstrations and create datasets
By default, we assume demonstrations are collected through SpaceMouse teleoperation.
```
python data_generation/collect_demo.py --controller OSC_POSITION --num-demonstration 100 --environment stack-two-types --pos-sensitivity 1.5 --rot-sensitivity 1.5
```
Then create a dataset from the collected demonstration hdf5 file:
```
python data_generation/create_dataset.py --use-actions --use-camera-obs --dataset-name training_set --demo-file PATH_TO_DEMONSTRATION_DATA/demo.hdf5 --domain-name stack-two-types
```
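If you want to inspect the generated dataset, a small h5py script can list its contents. The file path below is only a placeholder for the hdf5 file produced by `create_dataset.py`:

```python
# List every array stored in the generated hdf5 file; no key names are
# hard-coded, h5py discovers them.
import h5py

# Replace with the actual output path of create_dataset.py.
with h5py.File("path/to/training_set.hdf5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)
```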
### Augment datasets with color augmentations and object proposals
Add color augmentation to the original dataset:
```
python data_generation/aug_post_processing.py --dataset-folder DATASET_FOLDER_NAME
```
Then we generate general object proposals using Detic models:
```
python data_generation/process_data_w_proposals.py --nms 0.05
```
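For intuition on the `--nms 0.05` flag: such a low IoU threshold suppresses nearly any pair of overlapping proposals, keeping a small, spatially distinct set of boxes. The toy example below is not the project's processing code; it only demonstrates the effect with torchvision's NMS:

```python
# Toy illustration of a very low NMS IoU threshold (0.05): heavily overlapping
# proposals collapse to the single highest-scoring box.
import torch
from torchvision.ops import nms

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],      # overlaps heavily with the first box
                      [50., 50., 60., 60.]])   # far away from the others
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.05)
print(keep)  # tensor([0, 2]): the redundant second box is suppressed
```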
### Training and evaluation
To train a policy model with our generated dataset, run
```
python viola_bc/exp.py experiment=stack_viola ++hdf5_cache_mode="low_dim"
```
For evaluation, run
```
python viola_bc/final_eval_script.py --state-dir checkpoints/stack --eval-horizon 1000 --hostname ./ --topk 20 --task-name normal
```
## Datasets and trained checkpoints
We also make the datasets we used in our paper publicly available. You can download them:
Datasets: download the datasets, unzip them under the root folder of the repo, and rename the folder to `datasets`. Note that our simulation datasets were collected with robosuite v1.3.0, so the textures of robots and floors in the datasets will not match robosuite v1.4.0.
Checkpoints: download the best-performing checkpoints, unzip them under the root folder of the repo, and rename the folder to `results`.