# VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
[Project Page] [Paper] [Video]
Wenlong Huang<sup>1</sup>, Chen Wang<sup>1</sup>, Ruohan Zhang<sup>1</sup>, Yunzhu Li<sup>1,2</sup>, Jiajun Wu<sup>1</sup>, Li Fei-Fei<sup>1</sup>
<sup>1</sup>Stanford University, <sup>2</sup>University of Illinois Urbana-Champaign
<img src="media/teaser.gif" width="550">This is the official demo code for VoxPoser, a method that uses large language models and vision-language models to zero-shot synthesize trajectories for manipulation tasks.
In this repo, we provide the implementation of VoxPoser in RLBench as its task diversity best resembles our real-world setup. Note that VoxPoser is a zero-shot method that does not require any training data. Therefore, the main purpose of this repo is to provide a demo implementation rather than an evaluation benchmark.
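To give a flavor of the core idea, here is a toy illustration (not the repo's API; all names and grid sizes below are made up for this sketch): per-voxel value maps, such as an affordance map marking where the task wants the end-effector and an avoidance map marking regions to stay clear of, can be composed into a single cost map that a planner then descends.

```python
import numpy as np

# Two per-voxel maps over a 50x50x50 workspace grid (sizes are arbitrary).
affordance = np.zeros((50, 50, 50))   # high where the task wants the end-effector
avoidance = np.zeros((50, 50, 50))    # high in regions to stay away from

affordance[25, 25, 10] = 1.0          # e.g., a target location from a perception model
avoidance[20:30, 20:30, :5] = 1.0     # e.g., a region around an obstacle

# Compose into one cost map: low cost at the target, high cost near obstacles.
cost = 1.0 * avoidance - 2.0 * affordance
target_voxel = np.unravel_index(np.argmin(cost), cost.shape)
print(target_voxel)  # -> (25, 25, 10)
```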
If you find this work useful in your research, please cite using the following BibTeX:
```bibtex
@article{huang2023voxposer,
  title={VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models},
  author={Huang, Wenlong and Wang, Chen and Zhang, Ruohan and Li, Yunzhu and Wu, Jiajun and Fei-Fei, Li},
  journal={arXiv preprint arXiv:2307.05973},
  year={2023}
}
```
## Setup Instructions
Note that this codebase is best run with a display. For running in headless mode, refer to the instructions in RLBench.
- Create a conda environment:
  ```bash
  conda create -n voxposer-env python=3.9
  conda activate voxposer-env
  ```
- See Instructions to install PyRep and RLBench (note: install these inside the created conda environment).
- Install other dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Obtain an OpenAI API key and put it inside the first cell of the demo notebook (see the sketch below).
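For reference, here is a minimal sketch of what that first cell might look like; the exact variable names and cell layout in `src/playground.ipynb` may differ, and the key is read from an environment variable here only to avoid hard-coding secrets:

```python
# Hedged sketch: supply an OpenAI API key to the openai client.
# Reading from an environment variable keeps secrets out of the notebook.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # or paste your key directly
```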
## Running Demo
Demo code is at `src/playground.ipynb`. Instructions can be found in the notebook.
## Code Structure

Core to VoxPoser:

- `playground.ipynb`: Playground for VoxPoser.
- `LMP.py`: Implementation of Language Model Programs (LMPs) that recursively generate code to decompose instructions and compose value maps for each sub-task.
- `interfaces.py`: Interface that provides the necessary APIs for language models (i.e., LMPs) to operate in voxel space and to invoke the motion planner.
- `planners.py`: Implementation of a greedy planner that plans a trajectory (represented as a series of waypoints) for an entity/movable given a value map (a minimal sketch of this idea follows the list).
- `controllers.py`: Given a waypoint for an entity/movable, the controller applies (a series of) robot actions to achieve it.
- `dynamics_models.py`: Environment dynamics model for the case where the entity/movable is an object or object part. This is used in `controllers.py` to perform MPC.
- `prompts/rlbench`: Prompts used by the different Language Model Programs (LMPs) in VoxPoser.
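For intuition, below is a minimal sketch of the greedy-descent idea behind `planners.py`. It is illustrative only; the function name and the 26-connected neighborhood are assumptions of this sketch, not the repo's actual implementation:

```python
import numpy as np

def greedy_waypoints(value_map: np.ndarray, start: tuple, max_steps: int = 200):
    """Greedily step to the lowest-value neighboring voxel to build a waypoint path."""
    pos = np.array(start)
    waypoints = [tuple(pos)]
    for _ in range(max_steps):
        best_pos, best_val = None, value_map[tuple(pos)]
        # Examine the 26-connected neighborhood of the current voxel.
        for offset in np.ndindex(3, 3, 3):
            step = np.array(offset) - 1
            if not step.any():          # skip the zero offset (staying put)
                continue
            nxt = pos + step
            if ((nxt < 0) | (nxt >= np.array(value_map.shape))).any():
                continue                # out of bounds
            val = value_map[tuple(nxt)]
            if val < best_val:
                best_pos, best_val = nxt, val
        if best_pos is None:            # local minimum: no neighbor improves
            break
        pos = best_pos
        waypoints.append(tuple(pos))
    return waypoints
```

Stopping at a local minimum keeps the planner simple; in this framing, the value map itself is responsible for encoding both task progress and obstacle avoidance.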
Environment and utilities:

- `envs/rlbench_env.py`: Wrapper of the RLBench env that exposes useful functions for VoxPoser.
- `envs/task_object_names.json`: Mapping from object names exposed to VoxPoser to their corresponding scene object names for each individual task.
- `configs/rlbench_config.yaml`: Config file for all the involved modules in the RLBench environment.
- `arguments.py`: Argument parser for the config file.
- `LLM_cache.py`: Caching of language model outputs that writes to disk to save cost and time (a hedged sketch of the idea follows the list).
- `utils.py`: Utility functions.
- `visualizers.py`: A Plotly-based visualizer for value maps and planned trajectories.
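To illustrate the caching idea behind `LLM_cache.py` (a hedged sketch under the assumption that responses are keyed by a hash of the request payload; the class and method names here are hypothetical):

```python
# Minimal sketch of a disk-backed LLM cache (illustrative only; the actual
# interface lives in LLM_cache.py). Repeated identical requests hit the disk
# cache instead of the API, saving cost and time.
import hashlib
import json
from pathlib import Path

class DiskCache:
    def __init__(self, cache_dir: str = ".llm_cache"):
        self.dir = Path(cache_dir)
        self.dir.mkdir(exist_ok=True)

    def _path(self, request: dict) -> Path:
        # Deterministic key: hash of the canonicalized request payload.
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        return self.dir / f"{key}.json"

    def get(self, request: dict):
        path = self._path(request)
        return json.loads(path.read_text()) if path.exists() else None

    def put(self, request: dict, response) -> None:
        self._path(request).write_text(json.dumps(response))
```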
## Acknowledgments
- Environment is based on RLBench.
- Implementation of Language Model Programs (LMPs) is based on Code as Policies.
- Some code snippets are from Where2Act.
- Additional acknowledgement to GitHub Copilot and GPT-4 for collaboratively writing a significant portion of the code in this codebase.