RoboDesk

A Multi-Task Reinforcement Learning Benchmark

If you find this open source release useful, please cite it in your paper:

@misc{kannan2021robodesk,
  author = {Harini Kannan and Danijar Hafner and Chelsea Finn and Dumitru Erhan},
  title = {RoboDesk: A Multi-Task Reinforcement Learning Benchmark},
  year = {2021},
  howpublished = {\url{https://github.com/google-research/robodesk}},
}

Highlights

Training Agents

Installation: pip3 install -U robodesk

The environment follows the OpenAI Gym interface:

import robodesk

env = robodesk.RoboDesk(seed=0)
obs = env.reset()
assert obs['image'].shape == (64, 64, 3)  # observations are dictionaries, see below

done = False
while not done:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)

Tasks

[Animations of the RoboDesk tasks]

The behaviors shown above were learned with the Dreamer agent. The policies were trained from scratch, purely from pixels, without access to proprioceptive state.

Task                       Description
open_slide                 Push the sliding door all the way to the right, navigating around the other objects.
open_drawer                Pull the dark brown drawer all the way open.
push_green                 Push the green button to turn the green light on.
stack_blocks               Stack the upright blue block on top of the flat green block.
upright_block_off_table    Push the blue upright block off the table.
flat_block_in_bin          Push the green flat block into the blue bin.
flat_block_in_shelf        Push the green flat block into the shelf, navigating around the other blocks.
lift_upright_block         Grasp the blue upright block and lift it above the table.
lift_ball                  Grasp the magenta ball and lift it above the table.

Environment Details

Constructor

robodesk.RoboDesk(task='open_slide', reward='dense', action_repeat=1, episode_length=500, image_size=64)
Parameter        Description
task             Available tasks are open_slide, open_drawer, push_green, stack, upright_block_off_table, flat_block_in_bin, flat_block_in_shelf, lift_upright_block, lift_ball.
reward           Available reward types are dense, sparse, success. Success gives only the first sparse reward during the episode, useful for computing success rates during evaluation.
action_repeat    Reduces the control frequency by applying each action multiple times. This is faster than using an environment wrapper because only the needed images are rendered.
episode_length   Time limit for the episode; can be None.
image_size       Size of the image observations in pixels, used for both height and width.
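
As a minimal sketch, the constructor can be configured away from its defaults as shown below. The parameter values are only illustrative; any task name from the table above can be substituted.

import robodesk

env = robodesk.RoboDesk(
    task='flat_block_in_bin',  # any of the task names listed above
    reward='sparse',           # 'dense', 'sparse', or 'success'
    action_repeat=2,           # apply each action for two control steps
    episode_length=250,        # time limit in steps, or None for no limit
    image_size=96)             # image observations become (96, 96, 3)
obs = env.reset()
assert obs['image'].shape == (96, 96, 3)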

Reward

All rewards are bounded between 0 and 1. There are three reward types available: dense, sparse, and success (selected via the reward parameter above). The success reward gives only the first sparse reward during the episode, which is useful for computing success rates during evaluation.
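
As a sketch of how the success reward can be used for evaluation, the loop below runs a random policy as a stand-in for a trained agent and averages episode returns. With reward='success' each episode yields at most one unit of reward, so the mean episode return is the success rate. The task name is only an example.

import robodesk

env = robodesk.RoboDesk(task='open_drawer', reward='success')
num_episodes = 10
total_successes = 0.0
for _ in range(num_episodes):
  obs = env.reset()
  done = False
  while not done:
    action = env.action_space.sample()  # stand-in for a trained policy
    obs, reward, done, info = env.step(action)
    total_successes += reward  # at most one nonzero reward per episode
print('Success rate:', total_successes / num_episodes)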

Termination

Episodes end after 500 time steps by default. There are no early terminations.

Observation Space

Each observation is a dictionary that contains the current image, as well as additional information. For the standard benchmark, only the image should be used for learning. The observation dictionary contains the following keys:

Key             Space
image           Box(0, 255, (64, 64, 3), np.uint8)
qpos_robot      Box(-np.inf, np.inf, (9,), np.float32)
qvel_robot      Box(-np.inf, np.inf, (9,), np.float32)
qpos_objects    Box(-np.inf, np.inf, (26,), np.float32)
qvel_objects    Box(-np.inf, np.inf, (26,), np.float32)
end_effector    Box(-np.inf, np.inf, (3,), np.float32)
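
As a quick illustration, and assuming the dictionary values are NumPy arrays as the Box spaces above suggest, the snippet below prints the shape and dtype of each entry. For the standard benchmark only the image key should be passed to the agent; the remaining keys expose low-level state that can be useful for debugging or analysis.

import robodesk

env = robodesk.RoboDesk()
obs = env.reset()
for key, value in obs.items():
  print(key, value.shape, value.dtype)
image = obs['image']  # (64, 64, 3) uint8 pixels, the standard benchmark input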

Action Space

RoboDesk uses end effector control with a simple bounded action space:

Box(-1, 1, (5,), np.float32)
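
A common pattern is to clip an unbounded policy output into this space before stepping the environment. The sketch below uses NumPy and a random vector as a stand-in for a policy output.

import numpy as np
import robodesk

env = robodesk.RoboDesk()
assert env.action_space.shape == (5,)
obs = env.reset()
raw_action = np.random.randn(5)  # stand-in for an unbounded policy output
action = np.clip(raw_action, env.action_space.low, env.action_space.high)
obs, reward, done, info = env.step(action)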

Acknowledgements

We thank Ben Eysenbach and Debidatta Dwibedi for their helpful feedback.

Our benchmark builds on previously open-sourced work: the desk XMLs first introduced in [1], the Franka XMLs open-sourced in [2], and the Franka meshes open-sourced in [3].

Questions

Please open an issue on GitHub.

Disclaimer: This is not an official Google product.