COG

This repository accompanies the following paper:

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning <br/> Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine <br/> Conference on Robot Learning (CoRL), 2020 <br/> Website | arXiv | Video

Task animations: Open drawer, take object out | Close top drawer, take object out | Remove obstacle, take object out

In this paper, we propose an approach to incorporate a large amount of prior data, either from previously solved tasks or from unsupervised or undirected environment interaction, to extend and generalize learned behavior. This prior data is not specific to any one task, and can be used to extend a variety of downstream skills. We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.

This code is based on the original CQL implementation.

Usage

By default, all logs will be stored in cog/data/. If you would like to save to a different directory, update CUSTOM_LOG_DIR in the relevant launch script. An example command for using our method in offline mode:

```bash
python examples/cog.py --env=Widow250PickTray-v0 --max-path-length=40 --prior-buffer=pickplace_prior.npy --task-buffer=pickplace_task.npy
```
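If you prefer a different log location, the CUSTOM_LOG_DIR variable mentioned above lives in the launch script itself. A minimal sketch of changing it from the shell is below; the exact assignment format inside examples/cog.py is an assumption, so check the script (or simply edit it in a text editor) rather than relying on this pattern verbatim.

```bash
# Hypothetical sketch: rewrite the CUSTOM_LOG_DIR assignment in the launch script.
# The assignment format matched here is assumed, not taken from the repo.
sed -i "s|^CUSTOM_LOG_DIR = .*|CUSTOM_LOG_DIR = '/path/to/my/logs'|" examples/cog.py
```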

An example command for online finetuning from a saved checkpoint:

```bash
python examples/cog_finetune.py --checkpoint-dir=LOG_DIR --online-data-only --checkpoint-epoch=1000
```

An example command for running the behavior cloning baseline:

```bash
python examples/chaining_bc.py --env=Widow250PickTray-v0 --max-path-length=40 --prior-buffer=pickplace_prior.npy --task-buffer=pickplace_task.npy
```

The datasets mentioned above can be downloaded from this Google Drive link.

Here are the exact commands to reproduce all results for our method in the paper:

```bash
python cog.py --env=Widow250DoubleDrawerOpenGraspNeutral-v0 --max-path-length=50 --prior-buffer=closed_drawer_prior.npy --task-buffer=drawer_task.npy
python cog.py --env=Widow250DoubleDrawerCloseOpenGraspNeutral-v0 --max-path-length=80 --prior-buffer=blocked_drawer_1_prior.npy --task-buffer=drawer_task.npy
python cog.py --env=Widow250DoubleDrawerPickPlaceOpenGraspNeutral-v0 --max-path-length=80 --prior-buffer=blocked_drawer_2_prior.npy --task-buffer=drawer_task.npy
```

Replacing `cog.py` with `chaining_bc.py` in the commands above reproduces the corresponding experiments for the BC baseline.

Setup

Our code is based on CQL, which is in turn based on rlkit. The setup instructions are similar to rlkit, but we repeat them here for convenience:

```bash
conda env create -f environment/linux-gpu-env.yml
source activate cql-env
pip install -e .
```

After the above, please install roboverse and its dependencies in the same conda env.
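For reference, an editable install of roboverse typically looks like the sketch below. The repository URL is an assumption; follow the roboverse README for the authoritative steps and any additional dependencies.

```bash
# Sketch of installing roboverse into the active conda env (repo URL assumed; see its README).
git clone https://github.com/avisingh599/roboverse.git
cd roboverse
pip install -e .
```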

Datasets

The datasets used in this project can be downloaded using this Google Drive link.

If you would like to download the datasets on a remote machine via the command line, consider using gdown.
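For example, a common pattern is to install gdown and fetch the shared folder by its URL. The FOLDER_URL placeholder below stands for the Google Drive link above and is not an actual value.

```bash
# Install gdown and download the shared Google Drive folder containing the .npy buffers.
pip install gdown
gdown --folder "FOLDER_URL"   # replace FOLDER_URL with the Drive link above
```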

Known issues

TODO

High priority

Soon