<h1 align="left"> 2023 AI2-THOR Rearrangement Challenge </h1> <p align="left"> <a href="//github.com/allenai/ai2thor-rearrangement/blob/main/LICENSE"> <!-- ai2thor-rearrangement wasn't identifiable by GitHub (on the day this was added), so using the same one as ai2thor --> <img alt="License" src="https://img.shields.io/github/license/allenai/ai2thor.svg?color=blue"> </a> <a href="//ai2thor.allenai.org/rearrangement/" target="_blank"> <img alt="Documentation" src="https://img.shields.io/website/https/ai2thor.allenai.org?down_color=red&down_message=offline&up_message=online"> </a> <a href="//github.com/allenai/ai2thor-rearrangement/releases"> <img alt="GitHub release" src="https://img.shields.io/github/release/allenai/ai2thor-rearrangement.svg"> </a> <a href="//arxiv.org/abs/2103.16544" target="_blank"> <img src="https://img.shields.io/badge/arXiv-2103.16544-<COLOR>"> </a> <a href="//arxiv.org/abs/2103.16544" target="_blank"> <img src="https://img.shields.io/badge/venue-CVPR 2021-blue"> </a> <a href="//www.youtube.com/watch?v=1APxaOC9U-A" target="_blank"> <img src="https://img.shields.io/badge/video-YouTube-red"> </a> <a href="https://join.slack.com/t/ask-prior/shared_invite/zt-oq4z9u4i-QR3kgpeeTAymEDkNpZmCcg" target="_blank"> <img src="https://img.shields.io/badge/questions-Ask PRIOR Slack-blue"> </a> </p> <img src="https://ai2thor.allenai.org/static/4844ccdba50de95a4feff30cf2978ce5/3ba25/rearrangement-cover1.png" />

Welcome to the 2023 AI2-THOR Rearrangement Challenge hosted at the CVPR'23 Embodied-AI Workshop. The goal of this challenge is to build a model/agent that moves objects in a room so as to restore them to a given initial configuration. Please follow the instructions below to get started.

If you have any questions please file an issue or post in the #rearrangement-challenge channel on our Ask PRIOR slack.

Contents

<!-- # To create the table of contents, move the [TOC] line outside of this comment # and then run the below Python block. [TOC] import markdown with open("README.md", "r") as f: a = markdown.markdown(f.read(), extensions=["toc"]) print(a[:a.index("</div>") + 6]) -->
<div class="toc">
<ul>
<li><a href="#-whats-new-in-the-2023-challenge">πŸ”₯πŸ†•πŸ”₯ What's New in the 2023 Challenge?</a></li>
<li><a href="#-2022-challenge-winners-and-current-sota">βŒ›πŸ₯‡ 2022 Challenge Winners and Current SoTA</a></li>
<li><a href="#-installation">πŸ’» Installation</a></li>
<li><a href="#-rearrangement-task-description">πŸ“ Rearrangement Task Description</a></li>
<li><a href="#-challenge-tracks-and-datasets">πŸ›€οΈ Challenge Tracks and Datasets</a><ul>
<li><a href="#%EF%B8%8F%EF%B8%8F-the-1--and-2-phase-tracks">☝️+✌️ The 1- and 2-Phase Tracks</a></li>
<li><a href="#-datasets">πŸ“Š Datasets</a></li>
</ul>
</li>
<li><a href="#%EF%B8%8F-submitting-to-the-leaderboard">πŸ›€οΈ Submitting to the Leaderboard</a></li>
<li><a href="#-allowed-observations">πŸ–ΌοΈ Allowed Observations</a></li>
<li><a href="#-allowed-actions">πŸƒ Allowed Actions</a></li>
<li><a href="#%EF%B8%8F-setting-up-rearrangement">🍽️ Setting up Rearrangement</a><ul>
<li><a href="#-learning-by-example">✨ Learning by example</a></li>
<li><a href="#-the-rearrange-thor-environment-class">🌎 The Rearrange THOR Environment class</a></li>
<li><a href="#-the-rearrange-task-sampler-class">πŸ’ The Rearrange Task Sampler class</a></li>
<li><a href="#-the-walkthrough-task-and-unshuffle-task-classes">πŸšΆπŸ”€ The Walkthrough Task and Unshuffle Task classes</a></li>
</ul>
</li>
<li><a href="#-object-poses">πŸ—ΊοΈ Object Poses</a></li>
<li><a href="#-evaluation">πŸ† Evaluation</a><ul>
<li><a href="#-when-are-poses-approximately-equal">πŸ“ When are poses (approximately) equal?</a></li>
<li><a href="#-computing-metrics">πŸ’― Computing metrics</a></li>
</ul>
</li>
<li><a href="#-training-baseline-models-with-allenact">πŸ‹ Training Baseline Models with AllenAct</a><ul>
<li><a href="#-pretrained-models">πŸ’ͺ Pretrained Models</a></li>
</ul>
</li>
<li><a href="#-citation">πŸ“„ Citation</a></li>
</ul>
</div>

πŸ”₯πŸ†•πŸ”₯ What's New in the 2023 Challenge?

Our 2023 AI2-THOR Rearrangement Challenge has several upgrades distinguishing it from the 2022 version:

  1. New AI2-THOR version. We've upgraded the version of AI2-THOR we're using to 5.0.0; this brings performance improvements and bug fixes.
  2. New dataset. We've released a new rearrangement dataset to match the new AI2-THOR version. This new dataset has a more uniform balance of easy/hard episodes and requires interaction with more objects.
  3. Improved object-opening logic. In previous versions of the challenge there was no downside to attempting to open every object the agent came across, as the open action would only execute when the targeted object was in a state different from its state during the walkthrough phase. In this version of the challenge, all openable objects have open and closed states that are toggled whenever the agent issues the open action on them.
  4. Misc. improvements. We've fixed a number of minor bugs and performance issues from the 2022 challenge, improving consistency.

βŒ›πŸ₯‡ 2022 Challenge Winners and Current SoTA

The winners of the 2022 AI2-THOR Rearrangement Challenge and current state-of-the-art include:

1-Phase Challenge Winner (Current SoTA)

Submission name: ProcTHOR + Fine-Tuning <br> % Fixed Strict (Test): 24.47% <br> Paper link: ProcTHOR: Large-Scale Embodied AI Using Procedural Generation (@NeurIPS'22) <br> Team: Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, and Roozbeh Mottaghi <br>

2-Phase Challenge Winner

Submission name: MaSS: 3D Mapping and Semantic Search <br> % Fixed Strict (Test): 16.56% <br> Paper link: A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search (@ICLR'23) <br> Codebase: https://github.com/brandontrabucco/mass <br> Team: Brandon Trabucco, Gunnar A Sigurdsson, Robinson Piramuthu, Gaurav S. Sukhatme, and Ruslan Salakhutdinov <br>

Current 2-Phase SoTA

Submission name: TIDEE + open everything <br> % Fixed Strict (Test): 28.94% <br> Paper link: TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors (@ECCV'22) <br> Codebase: https://github.com/Gabesarch/TIDEE <br> Team: Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael Tarr, Saurabh Gupta, and Katerina Fragkiadaki <br>

πŸ’» Installation

To begin, clone this repository locally

git clone git@github.com:allenai/ai2thor-rearrangement.git
<details> <summary><b>See here for a summary of the most important files/directories in this repository</b> </summary> <p>

Here's a quick summary of the most important files/directories in this repository:

  1. example.py - examples showing how to instantiate the 1- and 2-phase variants of the rearrangement task.
  2. baseline_configs/ - AllenAct experiment configurations for the baseline models (e.g. baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py).
  3. rearrange/ - the core package, containing the RearrangeTHOREnvironment wrapper (rearrange/environment.py) and the task and task-sampler classes (rearrange/tasks.py).
  4. data/ - the compressed, pickle-serialized dataset splits (e.g. data/2023/*.pkl.gz) used for training and evaluation.

</p> </details>

You can now either install the requirements into a local python environment or use the provided Dockerfile.

Local installation

First create a python virtual environment and then install requirements by running

pip install -r requirements.txt

Or, if you prefer using conda, you can create a thor-rearrange environment with our requirements by running

export MY_ENV_NAME=thor-rearrange
export CONDA_BASE="$(dirname $(dirname "${CONDA_EXE}"))"
export PIP_SRC="${CONDA_BASE}/envs/${MY_ENV_NAME}/pipsrc"
conda env create --file environment.yml --name $MY_ENV_NAME
<details> <summary> <b> Why not just run <code>conda env create --file environment.yml --name thor-rearrange</code> by itself? </b></summary> <p>

If you were to run conda env create --file environment.yml --name thor-rearrange nothing would break, but we have some pip requirements in our environment.yml file and, by default, these are saved in a local ./src directory. By explicitly specifying the PIP_SRC variable we can have these pip-installed packages placed in a nicer (more hidden) location.

</p> </details>

Docker installation

This assumes some familiarity with Docker. If you are new to Docker, we recommend reading through this tutorial.

You first need to make sure you have nvidia-docker installed on your machine. If you don't, you can install it (assuming you are running on Ubuntu) by running:

# Installing nvidia-container-toolkit
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit-base nvidia-container-toolkit
sudo systemctl restart docker

Now cd into the ai2thor-rearrangement repository and then build the docker image by running

DOCKER_BUILDKIT=1 docker build -t rearrangement:latest .

to create a docker image with name rearrangement. Note that the Dockerfile will automatically copy the contents of the ai2thor-rearrangement repository into the docker image. You can, of course, modify the Dockerfile to copy in additional files/directories as needed or mount the ai2thor-rearrangement repository directory on the docker container (see the Docker documentation for more information) so that any changes you make to your local copy of the repository are reflected in the docker container (and vice versa).

Now to start the docker container, you can run:

docker run \
    --gpus all \
    --device /dev/dri \
    --mount type=bind,source=/usr/share/vulkan/icd.d/nvidia_icd.json,target=/etc/vulkan/icd.d/nvidia_icd.json \
    --mount type=bind,source=/usr/share/vulkan/icd.d/nvidia_layers.json,target=/etc/vulkan/implicit_layer.d/nvidia_layers.json \
    --mount type=bind,source=/usr/share/glvnd/egl_vendor.d/10_nvidia.json,target=/usr/share/glvnd/egl_vendor.d/10_nvidia.json \
    --shm-size 50G \
    -it rearrangement:latest

Please set the shared memory size (--shm-size) to something that your machine can support. Setting this too small can cause problems in multi-GPU training. Note that, importantly, we are mounting the nvidia_icd.json, nvidia_layers.json, and 10_nvidia.json files from the host machine into the docker container. This is necessary to ensure that the docker container can use the Vulkan API (which is used by AI2-THOR). The above assumes that your machine has a working Vulkan installation (modern versions of Ubuntu come with this pre-installed) and that the above files are present at the

/usr/share/vulkan/icd.d/nvidia_icd.json
/usr/share/vulkan/icd.d/nvidia_layers.json
/usr/share/glvnd/egl_vendor.d/10_nvidia.json

paths. On some machines these files may be located at different paths. If you are running into issues with the above, you can try checking to see if the above files exist instead at the paths:

/etc/vulkan/icd.d/nvidia_icd.json
/etc/vulkan/implicit_layer.d/nvidia_layers.json
/usr/share/glvnd/egl_vendor.d/10_nvidia.json

or use the find command to search for these files:

find / -name nvidia_icd.json
find / -name nvidia_layers.json
find / -name 10_nvidia.json

Once you find the correct paths, you'll need to then modify the above docker run command accordingly.

Now that you've successfully run the docker container, you can run the following to test that everything is working:

conda activate rearrange
export PYTHONPATH=$PYTHONPATH:$PWD
allenact -o rearrange_out -b . baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py

You should see output that looks like the following

[05/23 15:47:34 INFO:] Running with args Namespace(experiment='baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py', eval=False, config_kwargs=None, extra_tag='', output_dir='rearrange_out', save_dir_fmt=<SaveDirFormat.FLAT: 'FLAT'>, seed=None, experiment_base='.', checkpoint=None, infer_output_dir=False, approx_ckpt_step_interval=None, restart_pipeline=False, deterministic_cudnn=False, max_sampler_processes_per_worker=None, deterministic_agents=False, log_level='info', disable_tensorboard=False, disable_config_saving=False, collect_valid_results=False, valid_on_initial_weights=False, test_expert=False, distributed_ip_and_port='127.0.0.1:0', machine_id=0, callbacks='', enable_crash_recovery=False, test_date=None, approx_ckpt_steps_count=None, skip_checkpoints=0)	[main.py: 452]
[05/23 15:47:35 INFO:] Config files saved to rearrange_out/used_configs/OnePhaseRGBResNetDagger_40proc/2023-05-23_15-47-35	[runner.py: 865]
[05/23 15:47:35 INFO:] Using 8 train workers on devices (device(type='cuda', index=0), device(type='cuda', index=1), device(type='cuda', index=2), device(type='cuda', index=3), device(type='cuda', index=4), device(type='cuda', index=5), device(type='cuda', index=6), device(type='cuda', index=7))	[runner.py: 274]
[05/23 15:47:35 INFO:] Engines on machine_id == 0 using port 53495 and seed 137964697	[runner.py: 444]
[05/23 15:47:35 INFO:] Using local worker ids [0, 1, 2, 3, 4, 5, 6, 7] (total 8 workers in machine 0)	[runner.py: 283]
[05/23 15:47:36 INFO:] Started 8 train processes	[runner.py: 545]
[05/23 15:47:36 INFO:] Using 1 valid workers on devices (device(type='cuda', index=7),)	[runner.py: 274]
[05/23 15:47:36 INFO:] Started 1 valid processes	[runner.py: 572]
[05/23 15:47:39 INFO:] train 1 args {'experiment_name': 'OnePhaseRGBResNetDagger_40proc', 'config': <baseline_configs.one_phase.one_phase_rgb_resnet_dagger.OnePhaseRGBResNetDaggerExperimentConfig object at 0x7efd483fac70>, 'callback_sensors': [], 'results_queue': <multiprocessing.queues.Queue object at 0x7efd483facd0>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7efd268d5be0>, 'checkpoints_dir': 'rearrange_out/checkpoints/OnePhaseRGBResNetDagger_40proc/2023-05-23_15-47-35', 'seed': 137964697, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7efd268dd430>, 'num_workers': 8, 'device': device(type='cuda', index=1), 'distributed_ip': '127.0.0.1', 'distributed_port': 53495, 'max_sampler_processes_per_worker': None, 'save_ckpt_after_every_pipeline_stage': True, 'initial_model_state_dict': '[SUPPRESSED]', 'first_local_worker_id': 0, 'distributed_preemption_threshold': 0.7, 'try_restart_after_task_error': False, 'mode': 'train', 'worker_id': 1[runner.py: 373]
(...)
[05/23 15:53:14 INFO:] TRAIN: 22295 rollout steps ({'onpolicy': 22295}) total_loss 3.17 global_batch_size 2.48e+03 lr 0.0003 rollout_epochs 3 rollout_num_mini_batch 1 worker_batch_size 312 unshuffle/change_energy 2.49 unshuffle/end_energy 0.153 unshuffle/energy_prop 0.058 unshuffle/ep_length 34.8 unshuffle/num_broken 0 unshuffle/num_changed 2.88 unshuffle/num_fixed 2.78 unshuffle/num_initially_misplaced 2.94 unshuffle/num_misplaced 0.166 unshuffle/num_newly_misplaced 0 unshuffle/prop_fixed 0.944 unshuffle/prop_fixed_strict 0.944 unshuffle/prop_misplaced 0.0557 unshuffle/reward 2.26 unshuffle/start_energy 2.56 unshuffle/success 0.861 teacher_ratio/enforced 1 teacher_ratio/sampled 1 imitation_loss/expert_cross_entropy 3.17 elapsed_time 338s	[runner.py: 1089]
[05/23 15:56:07 INFO:] TRAIN: 44475 rollout steps ({'onpolicy': 44475}) total_loss 2.48 global_batch_size 2.47e+03 lr 0.0003 rollout_epochs 3 rollout_num_mini_batch 1 worker_batch_size 311 unshuffle/change_energy 2.48 unshuffle/end_energy 0.183 unshuffle/energy_prop 0.0617 unshuffle/ep_length 44.8 unshuffle/num_broken 0 unshuffle/num_changed 2.88 unshuffle/num_fixed 2.79 unshuffle/num_initially_misplaced 2.99 unshuffle/num_misplaced 0.197 unshuffle/num_newly_misplaced 0 unshuffle/prop_fixed 0.94 unshuffle/prop_fixed_strict 0.94 unshuffle/prop_misplaced 0.0605 unshuffle/reward 2.3 unshuffle/start_energy 2.6 unshuffle/success 0.865 teacher_ratio/enforced 1 teacher_ratio/sampled 1 imitation_loss/expert_cross_entropy 2.48 elapsed_time 173s approx_fps 128 onpolicy/approx_eps 128	[runner.py: 1089]

Note that it may take several minutes before lines starting with TRAIN: appear; this is because they only print after many thousands of steps have been taken. This can be annoying when debugging. If you'd like these logs to print more frequently, change the metric_accumulate_interval argument to the TrainingPipeline in the baseline_configs/rearrange_base.py file to some small integer value (e.g. metric_accumulate_interval=1).

Python 3.6+ 🐍. Each of the actions supports typing within <span class="chillMono">Python</span>.

AI2-THOR 5.0.0 🧞. To ensure reproducible results, we're restricting all users to use the exact same version of <span class="chillMono">AI2-THOR</span>.

AllenAct πŸ‹πŸ’ͺ. We ues the <span class="chillMono">AllenAct</span> reinforcement learning framework for generating baseline models, baseline training pipelines, and for several of their helpful abstractions/utilities.

πŸ“ Rearrangement Task Description

<img src="https://ai2thor.allenai.org/static/0f682c0103df1060810ad214c4668718/06655/rearrange-cover2.jpg" alt="Object Rearrangement Example" width="100%">

Overview πŸ€–. Our rearrangement task involves moving and modifying (i.e. opening/closing) randomly placed objects within a room to obtain a goal configuration. There are 2 phases:

  1. Walkthrough πŸ‘€. The agent walks around the room and observes the objects in their ideal goal state.
  2. Unshuffle πŸ‹. After the walkthrough phase, we randomly change between 1 and 5 objects in the room. The agent's goal is to identify which objects have changed and reset those objects to their state from the walkthrough phase. Changes to an object's state may include changes to its position, orientation, or openness.

πŸ›€οΈ Challenge Tracks and Datasets

☝️+✌️ The 1- and 2-Phase Tracks

As in prior years, for this 2023 challenge we have two distinct tracks:

  1. 1-Phase Track ☝️. The walkthrough and unshuffle phases are combined into a single phase: at every step the agent receives observations from both the goal configuration and the current (shuffled) configuration, and it must rearrange the room within that single episode.
  2. 2-Phase Track ✌️. The agent first completes the walkthrough phase, in which it observes the goal configuration, and is only then placed into the shuffled room for the unshuffle phase (see the Walkthrough Task and Unshuffle Task classes below).

πŸ“Š Datasets

For this challenge we have three dataset splits: "train", "val", and "test". The "train" split uses floor plans 1-20, 200-220, 300-320, and 400-420 within AI2-THOR, the "val" split uses floor plans 21-25, 221-225, 321-325, and 421-425, and finally the "test" split uses floor plans 26-30, 226-230, 326-330, and 426-430. These dataset splits are stored as the compressed pickle-serialized files data/*.pkl.gz. While you are free (and encouraged) to enhance the training set as you see fit, you should never train your agent within any of the test scenes.

For evaluation, your model will need to be evaluated on each of the above splits and the results submitted to our leaderboard link (see the section below). As the "train" set is quite large, we do not expect you to evaluate on its entirety; instead, we select a subset of its episodes (800, see the table below) for use in evaluation. For convenience, we provide the data/combined.pkl.gz file which contains the "train", "val", and "test" datapoints that should be used for evaluation.

| Split | # Total Episodes | # Episodes for Eval | Path |
|---|---|---|---|
| train | 4000 | 800 | data/2023/train.pkl.gz |
| val | 1000 | 1000 | data/2023/val.pkl.gz |
| test | 1000 | 1000 | data/2023/test.pkl.gz |
| combined | 2800 | 2800 | data/2023/combined.pkl.gz |
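
Since these splits are just compressed, pickle-serialized files, you can also peek at them directly. Below is a minimal sketch (it assumes the stored objects are plain Python data structures; in practice the RearrangeTaskSampler described later handles loading these files for you):

import gzip
import pickle

# Load the combined evaluation split directly (path taken from the table above).
with gzip.open("data/2023/combined.pkl.gz", "rb") as f:
    episodes = pickle.load(f)

# The exact structure of the stored episodes isn't documented here, so just inspect it.
print(type(episodes))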

πŸ›€οΈ Submitting to the Leaderboard

We are tracking challenge participant entries using the AI2 Leaderboard. The team with the best submission made to either of the below leaderboards by June 12th (midnight, anywhere on earth) will be announced at the CVPR'23 Embodied-AI Workshop and invited to produce a video describing their approach. Note that a winning submission must be materially different from the baseline models we provide and from submissions made to prior years' challenges.

In particular, our 2023 leaderboard links can be found at

Our older (2021/2022) leaderboards are also available indefinitely (2021 1-phase, 2021 2-phase, 2022 1-phase, 2022 2-phase). Note that our 2021/2022 challenges used different datasets and older versions of AI2-THOR, so results are not directly comparable.

Submissions should include your agent's trajectories for all tasks contained within the combined.pkl.gz dataset; this "combined" dataset includes tasks from the train, val, and test sets. For an example of how to iterate through all the datapoints in this dataset and save the resulting metrics in our expected submission format, see here.

A (full) example of the expected submission format for the 1-phase task can be found here and, for the 2-phase task, can be found here. Note that this submission format is a gzip'ed json file where the json file has the form

{
  "UNIQUE_ID_OF_TASK_0": YOUR_AGENTS_METRICS_AND_TRAJECTORY_FOR_TASK_0,
  "UNIQUE_ID_OF_TASK_1": YOUR_AGENTS_METRICS_AND_TRAJECTORY_FOR_TASK_1,
  ...
}

These metrics and unique IDs can be easily obtained when iterating over the dataset (see the above example).

Alternatively: if you run inference on the combined dataset using AllenAct (see below for more details) then you can simply (1) gzip the metrics*.json file saved when running inference, (2) rename this file submission.json.gz, and (3) submit this file to the leaderboard directly.
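
As a concrete (if trivial) sketch of steps (1) and (2) in Python, where the metrics file name is a placeholder for whichever metrics*.json your inference run produced:

import gzip
import shutil

metrics_path = "path/to/your/metrics.json"  # placeholder; use the metrics*.json from your inference run
# gzip the metrics file and save it under the name expected by the leaderboard.
with open(metrics_path, "rb") as f_in, gzip.open("submission.json.gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)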

πŸ–ΌοΈ Allowed Observations

In both of these tracks, agents should make decisions based on egocentric sensor readings. The types of sensors allowed/provided for this challenge include:

<p float="left"> <img src="https://ai2thor.allenai.org/static/3b1dea7228ed5c3fab03fb5f960173eb/bc8e0/rgb-frame.png" alt="POV Agent Image" width="45%"> <img src="https://ai2thor.allenai.org/static/73f2a583b1636712a7a7d165ed6d768d/d79bd/depth-frame.jpg" alt="Depth Agent Image" width="54%"> </p>
  1. RGB images - having shape 224x224x3 and an FOV of 90 degrees.
  2. Depth maps - having shape 224x224 and an FOV of 90 degrees.
  3. Perfect egomotion - We allow for agents to know precisely how far (and in which direction) they have moved as well as how many degrees they have rotated.

While you are absolutely free to use any sensor information you would like during training (e.g. pretraining your CNN using semantic segmentations from AI2-THOR or using a scene graph to compute expert actions for imitation learning) such additional sensor information should not be used at inference time.

πŸƒ Allowed Actions

A total of 82 actions are available to our agents; these include:

Navigation

Object Interaction

Done action

🍽️ Setting up Rearrangement

✨ Learning by example

See the example.py file for an example of how you can instantiate the 1- and 2-phase variants of our rearrangement task.

🌎 The Rearrange THOR Environment class

The rearrange.environment.RearrangeTHOREnvironment class provides a wrapper around the AI2-THOR environment and is designed to

  1. Make it easy to set up an AI2-THOR scene in a particular state, ready for rearrangement.
  2. Provide utilities (see e.g. the poses and compare_poses methods) that make it easy to evaluate how close the current state of the environment is to the goal state.
  3. Provide an API with which the agent may interact with the environment.

πŸ’ The Rearrange Task Sampler class

You'll notice that the above RearrangeTHOREnvironment is not explicitly instantiated by the example.py script; instead, we create rearrange.tasks.RearrangeTaskSampler objects using the TwoPhaseRGBBaseExperimentConfig.make_sampler_fn and OnePhaseRGBBaseExperimentConfig.make_sampler_fn methods. This is because the RearrangeTHOREnvironment is very flexible and doesn't know anything about training/validation/test datasets, the types of actions we want our agent to be restricted to, or precisely which types of sensor observations we want to give our agents (e.g. RGB images, depth maps, etc.). All of these extra details are managed by the RearrangeTaskSampler, which iteratively creates new tasks for our agent to complete whenever its next_task method is called. During training, these new tasks can be sampled indefinitely while, during validation or testing, tasks will only be sampled until the validation/test datasets are exhausted. This sampling is best understood by example, so please go over the example.py file.
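
As a quick illustration, here is a minimal sketch of that sampling loop for the 1-phase track. The import path and the keyword arguments passed to make_sampler_fn are assumptions on our part and may differ slightly; example.py remains the authoritative reference.

from baseline_configs.one_phase.one_phase_rgb_base import OnePhaseRGBBaseExperimentConfig

# Construct a task sampler over the training set (keyword arguments are illustrative).
task_sampler = OnePhaseRGBBaseExperimentConfig.make_sampler_fn(
    stage="train",      # which dataset split to sample tasks from
    process_ind=0,      # index of this sampler among all parallel samplers
    total_processes=1,
    devices=[0],
    seed=0,
)

for _ in range(3):  # during training, tasks can be sampled indefinitely
    task = task_sampler.next_task()  # an UnshuffleTask in the 1-phase setting
    while not task.is_done():
        task.step(task.action_space.sample())  # random actions; substitute your agent's policy
    print(task.metrics())

task_sampler.close()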

πŸšΆπŸ”€ The Walkthrough Task and Unshuffle Task classes

As described above, the RearrangeTaskSampler samples tasks for our agent to complete; these tasks correspond to instantiations of the rearrange.tasks.WalkthroughTask and rearrange.tasks.UnshuffleTask classes. For the 2-phase challenge track, the RearrangeTaskSampler will first sample a new WalkthroughTask, after which it will sample a corresponding UnshuffleTask in which the agent must return the objects to their poses at the start of the WalkthroughTask.
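
To make this ordering concrete, here is a hedged sketch of consuming tasks from a 2-phase sampler; two_phase_task_sampler is assumed to have been built with TwoPhaseRGBBaseExperimentConfig.make_sampler_fn (analogously to the previous sketch), and the random-action rollouts are purely illustrative.

from rearrange.tasks import UnshuffleTask, WalkthroughTask

for _ in range(2):  # each iteration consumes one walkthrough/unshuffle pair
    walkthrough_task = two_phase_task_sampler.next_task()
    assert isinstance(walkthrough_task, WalkthroughTask)
    while not walkthrough_task.is_done():
        walkthrough_task.step(walkthrough_task.action_space.sample())
    unshuffle_task = two_phase_task_sampler.next_task()
    assert isinstance(unshuffle_task, UnshuffleTask)
    while not unshuffle_task.is_done():
        unshuffle_task.step(unshuffle_task.action_space.sample())
    print(unshuffle_task.metrics())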

πŸ—ΊοΈ Object Poses

Accessing object poses 🧘. The poses of all objects in the environment can be accessed using the RearrangeTHOREnvironment.poses property, i.e.

unshuffle_start_poses, walkthrough_start_poses, current_poses = env.poses # where env is an RearrangeTHOREnvironment instance  

Reading an object's pose πŸ“–. Here, unshuffle_start_poses, walkthrough_start_poses, and current_poses each evaluate to a list of dictionaries and are defined as follows:

  1. unshuffle_start_poses - the poses of all objects at the start of the unshuffle phase, i.e. after the objects have been shuffled.
  2. walkthrough_start_poses - the goal poses of all objects, i.e. their poses at the start of the walkthrough phase.
  3. current_poses - the poses of all objects after the agent's most recent action.

Each dictionary contains the object's pose in a form similar to:

{
    "type": "Candle",
    "position": {
        "x": -0.3012670874595642,
        "y": 0.7431036233901978,
        "z": -2.040205240249634
    },
    "rotation": {
        "x": 2.958569288253784,
        "y": 0.027708930894732475,
        "z": 0.6745457053184509
    },
    "openness": None,
    "pickupable": True,
    "broken": False,
    "objectId": "Candle|-00.30|+00.74|-02.04",
    "name": "Candle_977f7f43",
    "parentReceptacles": [
        "Bathtub|-01.28|+00.28|-02.53"
    ],
    "bounding_box": [
        [-0.27043721079826355, 0.6975823640823364, -2.0129783153533936],
        [-0.3310248851776123, 0.696869969367981, -2.012985944747925],
        [-0.3310534358024597, 0.6999208927154541, -2.072017192840576],
        [-0.27046576142311096, 0.7006332278251648, -2.072009563446045],
        [-0.272365003824234, 0.8614493608474731, -2.0045082569122314],
        [-0.3329526484012604, 0.8607369661331177, -2.0045158863067627],
        [-0.3329811990261078, 0.8637878894805908, -2.063547134399414],
        [-0.27239352464675903, 0.8645002245903015, -2.063539505004883]
    ]
}

Matching objects across poses 🀝. Across unshuffle_start_poses, walkthrough_start_poses, and current_poses, the ith entry in each list will always correspond to the same object across each pose list. So, unshuffle_start_poses[5] will refer to the same object as walkthrough_start_poses[5] and current_poses[5]. Most scenes have around 70 objects, among which 10 to 20 are pickupable by the agent.

Pose keys πŸ”‘. As shown in the example above, each pose dictionary records the object's type, name, and objectId, its 3D position and rotation, its openness (a value in [0,1] for openable objects and None otherwise), whether it is pickupable or broken, its parentReceptacles, and the eight corners of its 3D bounding_box.

πŸ† Evaluation

To evaluate the quality of a rearrangement agent we compute several metrics measuring how well the agent has managed to move objects so that their final poses are (approximately) equal to their goal poses.

πŸ“ When are poses (approximately) equal?

Recall that we represent the pose of an object as a combination of its:

  1. Openness πŸ“–. - A value in [0,1] which measures how far the object has been opened.
  2. Position πŸ“, Rotation πŸ™ƒ, and bounding box πŸ“¦ - The 3D position, rotation, and bounding box of each object.
  3. Broken - A boolean indicating if the object has been broken (all goal object poses are unbroken).

Intuitively, two poses are approximately equal when the object's openness in its goal state and in its final (predicted) state differ by less than 20 percent (this check is only applied to objects that can open) and the object's 3D bounding boxes in its goal pose and predicted pose have an IoU over 0.5 (this check is only relevant to objects that can move).

More precisely, to measure whether two object poses are approximately equal we use the following criteria (an illustrative implementation is sketched after this list):

  1. ❌ If either object pose is broken.
  2. ❌ If the object is openable but not pickupable (e.g. a cabinet) and the openness values of the two poses differ by more than 0.2.
  3. ❌ If the object is pickupable and the two 3D bounding boxes have an IoU under 0.5.
  4. βœ”οΈ Otherwise, none of the above criteria are met: the poses are not broken, are close in openness values, and have sufficiently high IoU, so they are considered (approximately) equal.

πŸ’― Computing metrics

Suppose that task is an instance of an UnshuffleTask on which your agent has taken actions until reaching a terminal state (e.g. either the agent has taken the maximum number of steps or it has taken the "done" action). Then metrics regarding the agent's performance can be computed by calling the task.metrics() method. This will return a dictionary of the form

{
    "task_info": {
        "scene": "FloorPlan420",
        "index": 7,
        "stage": "train"
    },
    "ep_length": 176,
    "unshuffle/ep_length": 7,
    "unshuffle/reward": 0.5058389582634852,
    "unshuffle/start_energy": 0.5058389582634852,
    "unshuffle/end_energy": 0.0,
    "unshuffle/prop_fixed": 1.0,
    "unshuffle/prop_fixed_strict": 1.0,
    "unshuffle/num_misplaced": 0,
    "unshuffle/num_newly_misplaced": 0,
    "unshuffle/num_initially_misplaced": 1,
    "unshuffle/num_fixed": 1,
    "unshuffle/num_broken": 0,
    "unshuffle/change_energy": 0.5058464936498058,
    "unshuffle/num_changed": 1,
    "unshuffle/prop_misplaced": 0.0,
    "unshuffle/energy_prop": 0.0,
    "unshuffle/success": 0.0,
    "walkthrough/ep_length": 169,
    "walkthrough/reward": 1.82,
    "walkthrough/num_explored_xz": 17,
    "walkthrough/num_explored_xzr": 46,
    "walkthrough/prop_visited_xz": 0.5151515151515151,
    "walkthrough/prop_visited_xzr": 0.3484848484848485,
    "walkthrough/num_obj_seen": 11,
    "walkthrough/prop_obj_seen": 0.9166666666666666
}

Of the above metrics, the most important (those used for comparing models) are the unshuffle-phase metrics; in particular, unshuffle/prop_fixed_strict corresponds to the % Fixed Strict numbers reported on the leaderboard and in the results above.

πŸ‹ Training Baseline Models with AllenAct

We use the AllenAct framework for training our baseline rearrangement models; this framework is automatically installed when installing the requirements for this project.

Before running training or inference you'll first have to add the ai2thor-rearrangement directory to your PYTHONPATH (so that Python and AllenAct know where to look for the various modules). To do this you can run the following:

cd YOUR/PATH/TO/ai2thor-rearrangement
export PYTHONPATH=$PYTHONPATH:$PWD

Let's say you want to train a model for the 1-phase challenge. This can be easily done by running the command

allenact -o rearrange_out -b . baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py 

This will train (using DAgger, a form of imitation learning) a model which uses a pretrained (with frozen weights) ResNet18 as the visual backbone that feeds into a recurrent neural network (a GRU) before producing action probabilities and a value estimate. Results from this training are then saved to rearrange_out where you can find model checkpoints, tensorboard plots, and configuration files that can be used if you, in the future, forget precisely what the details of your experiment were.

A similar model can be trained for the 2-phase challenge by running

allenact -o rearrange_out -b . baseline_configs/two_phase/two_phase_rgb_resnet_ppowalkthrough_ilunshuffle.py

πŸ’ͺ Pretrained Models

In the below table we provide a collection of pretrained models from:

  1. Our CVPR'21 paper introducing this challenge, and
  2. Our CVPR'22 paper which showed that using CLIP visual encodings can dramatically improve model performance across embodied tasks.

We have only evaluated a subset of these models on our newer (2022/2023) datasets.

| Model | % Fixed Strict (2023 dataset, test) | % Fixed Strict (2022 dataset, test) | % Fixed Strict (2021 dataset, test) | Pretrained Model |
|---|---|---|---|---|
| 1-Phase Embodied CLIP ResNet50 IL | 13.6% | 19.1% | 17.3% | (link) |
| 1-Phase ResNet18+ANM IL | - | - | 8.9% | (link) |
| 1-Phase ResNet50 IL | - | - | 7.0% | (link) |
| 1-Phase ResNet18 IL | - | - | 6.3% | (link) |
| 1-Phase ResNet18 PPO | - | - | 5.3% | (link) |
| 1-Phase Simple IL | - | - | 4.8% | (link) |
| 1-Phase Simple PPO | - | - | 4.6% | (link) |
| 2-Phase ResNet18+ANM IL+PPO | - | 0.53% | 1.44% | (link) |
| 2-Phase ResNet18 IL+PPO | - | - | 0.66% | (link) |

These models can be downloaded from the above links and should be placed into the pretrained_model_ckpts directory. You can then, for example, run inference for the 1-Phase ResNet18 IL model using AllenAct by running:

export CURRENT_TIME=$(date '+%Y-%m-%d_%H-%M-%S') # This is just to record when you ran this inference
allenact baseline_configs/one_phase/one_phase_rgb_resnet_dagger.py \
-c pretrained_model_ckpts/exp_OnePhaseRGBResNetDagger_40proc__stage_00__steps_000050058550.pt \
--extra_tag $CURRENT_TIME \
--eval

This will evaluate the model across all datapoints in the data/combined.pkl.gz dataset, which contains data from the train, val, and test sets, so that evaluation doesn't have to be run on each set separately.

πŸ“„ Citation

If you use this work, please cite our CVPR'21 paper:

@InProceedings{RoomR,
  author = {Luca Weihs and Matt Deitke and Aniruddha Kembhavi and Roozbeh Mottaghi},
  title = {Visual Room Rearrangement},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2021}
}