Inverse Preference Learning: Preference-based RL without a Reward Function
Code for Inverse Preference Learning: Preference-based RL without a Reward Function by Joey Hejna and Dorsa Sadigh.
This repository is based on a frozen version of research-lightning. For detailed information about how to use the repository, see the research-lightning repository.
Installation
Complete the following steps:
- Clone the repository to your desired location using `git clone`.
- Create the conda environment using `conda env create -f environment_<cpu or gpu>.yaml`.
- Install the repository's `research` package via `pip install -e research`.
- Modify the `setup_shell.sh` script by updating the appropriate values as needed. The `setup_shell.sh` script should load the environment, move the shell to the repository directory, and additionally set up any external dependencies. All the required flags should be at the top of the file. This is necessary for support with the SLURM launcher, which we use for all experiments.

When using the repository, you should be able to set up the environment by running `. path/to/setup_shell.sh`.
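For reference, a typical end-to-end install looks roughly like the following. This is a sketch: `<repo-url>`, the clone directory, and the conda environment name are placeholders, and you should pick the environment file that matches your hardware.

```bash
# Illustrative install sequence; <repo-url>, the clone directory, and <env name> are placeholders.
git clone <repo-url> ipl && cd ipl
conda env create -f environment_gpu.yaml   # or environment_cpu.yaml on CPU-only machines
conda activate <env name>                  # the name defined inside the environment yaml
pip install -e research
# After editing setup_shell.sh, activate the full development environment with:
. setup_shell.sh
```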
Some experiments require extra setup steps, which we detail below.
Robomimic Tasks
To run the robomimic experiments, you need to install both the robomimic and robosuite packages. We install these dependencies as follows:
Robosuite:
- Git clone the robosuite repository, found here.
- Check out the `offline_study` branch.
- Install the package into the conda environment without dependencies via `pip install -e . --no-deps`.
- Relevant as of 2/14/2023: robosuite has not updated their package to be compatible with Python 3.10. Change `from collections import Iterable` to `from collections.abc import Iterable` in `robosuite/models/arenas/multi_table_arena.py` and `robosuite/utils/placement_samplers.py`.
- In the Python interpreter, run `import robosuite` until it completes. If any errors show up, install the missing packages via pip.
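Concretely, the robosuite steps look roughly like the following. The GitHub URL is an assumption (it is not spelled out in this README), so defer to the link above.

```bash
# Illustrative robosuite setup; the URL is assumed, see the link above for the official repository.
git clone https://github.com/ARISE-Initiative/robosuite.git
cd robosuite
git checkout offline_study
pip install -e . --no-deps
# Verify the install; add any missing packages reported here with pip.
python -c "import robosuite"
```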
Robomimic:
- Git clone the robomimic repository, found here.
- Install the package into the conda environment without dependencies via `pip install -e . --no-deps`.
- In the Python interpreter, run `import robomimic` until it completes. If any errors show up, install the missing packages via pip.
- Download the datasets per the instructions here.

Finally, make sure to edit the files in `configs/robomimic` to correctly point to the download locations of the datasets.
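The robomimic steps follow the same pattern (again, the URL is an assumption rather than taken from this README):

```bash
# Illustrative robomimic setup; the URL is assumed, see the link above for the official repository.
git clone https://github.com/ARISE-Initiative/robomimic.git
cd robomimic
pip install -e . --no-deps
# Verify the install; add any missing packages reported here with pip.
python -c "import robomimic"
# After downloading the datasets, point the files in configs/robomimic at their locations.
```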
MetaWorld Setup
By default, the MetaWorld benchmark has a discrepancy in environment naming. For dataset creation, we modify this to better support automatically getting oracle policies from the environment name. Specifically, we modify `miniconda3/envs/<env name>/lib/python3.8/site-packages/metaworld/envs/mujoco/env_dict.py` to have `('button-press-topdown-v2', SawyerButtonPressTopdownEnvV2)`.
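As a quick sanity check after the edit, an import like the one below should show the renamed entry. This is a sketch: `ALL_V2_ENVIRONMENTS` is the dictionary name in recent MetaWorld releases and may differ in your installed version.

```bash
# Illustrative check; the dictionary name is assumed and may differ by MetaWorld version.
python -c "from metaworld.envs.mujoco.env_dict import ALL_V2_ENVIRONMENTS; print('button-press-topdown-v2' in ALL_V2_ENVIRONMENTS)"
```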
Usage
You should be able to activate the development environment by running `. path/to/setup_shell.sh`. This is the same environment that will be activated when running jobs locally or on SLURM.

To train a model, simply run `python scripts/train.py --config path/to/config --path path/to/save/folder`.

All experiments in the paper can be replicated using the `.json` sweep files. To run a sweep (files ending in `.json`), use the scripts in the `tools` folder for running locally or on SLURM. Our experiments were run on SLURM.

To launch any job via a script in the `tools` folder, specify the config and path by adding the argument `--arguments config=path/to/config path=path/to/save/folder`.
Local Jobs
Launching jobs locally can easily be done by specifying `--valid-cpus`, the system CPUs to use (in the same manner you would for `taskset`), and `--valid-gpus`, the NVIDIA devices to use (in the same way you would set `CUDA_VISIBLE_DEVICES`). Then, specify the number of CPUs and GPUs per job with `--cpus` and `--gpus`. Note that multi-GPU training is not yet supported. The script will automatically balance jobs on the provided hardware. Here is an example:

`python tools/run_local.py scripts/my_custom_script.py --valid-cpus 0-8 --valid-gpus 0 1 --cpus 2 --gpus 1 --arguments config=path/to/config path=path/to/save/folder`

This will run one job on CPU core range 0-2 with GPU 0 and one job on CPU core range 2-4 with GPU 1.
SLURM
Launching jobs on SLURM is done via the `tools/run_slurm.py` script. In addition to the base arguments for job launching, the SLURM script takes several additional arguments for SLURM. Here is an example command that includes all of the required arguments and launches training jobs from `scripts/train.py`. Additional optional arguments can be found in the `tools/run_slurm.py` file.

`python tools/run_slurm.py --partition <partition> --cpus 2 --gpus 1080ti:1 --mem 8G --job-name example --arguments config=path/to/config path=path/to/save/`

The `--gpus` argument takes the GRES specification of the GPU resource. One unfortunate problem with GRES is that it doesn't allow sharing GPUs between SLURM jobs. The `--jobs-per-instance` argument allows you to train multiple models in parallel on the same GPU within a single SLURM allocation. You just need to make sure to specify enough CPU and memory resources to run both at once.
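For example, a command along these lines would pack two training runs onto one GPU in a single allocation. This is a sketch built only from the flags described above; the resource numbers are illustrative, not a recommendation.

```bash
# Illustrative: two jobs sharing one GPU in a single SLURM allocation.
# CPU and memory are doubled relative to the single-job example above.
python tools/run_slurm.py --partition <partition> --cpus 4 --gpus 1080ti:1 --mem 16G \
    --job-name example --jobs-per-instance 2 \
    --arguments config=path/to/config path=path/to/save/folder
```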