# Visual Cortex and CortexBench
<p align="center">
  <img src="res/img/vc1_teaser.gif" alt="Visual Cortex and CortexBench" width="600">
  <br />
  <br />
  <a href="https://opensource.fb.com/support-ukraine"><img alt="Support Ukraine" src="https://img.shields.io/badge/Support-Ukraine-FFD500?style=flat&labelColor=005BBB" /></a>
  <a href="./MODEL_CARD.md"><img alt="Model Card" src="https://img.shields.io/badge/model--card-VC--1-green.svg" /></a>
  <a href="./LICENSE"><img alt="CC-BY-NC License" src="https://img.shields.io/badge/license-CC--BY--NC-blue.svg" /></a>
  <a href="https://www.python.org/downloads/release/python-380/"><img alt="Python 3.8" src="https://img.shields.io/badge/python-3.8-blue.svg" /></a>
  <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg" /></a>
  <a href="https://app.circleci.com/pipelines/github/facebookresearch/eai-vc/"><img alt="CircleCI Status" src="https://dl.circleci.com/status-badge/img/gh/facebookresearch/eai-vc/tree/main.svg?style=shield&circle-token=dbbc3a068a155612bcafee8483cac9bf0dda1231" /></a>
</p>

We're releasing CortexBench and our first Visual Cortex model: VC-1. CortexBench is a collection of 17 different EAI tasks spanning locomotion, navigation, dexterous manipulation, and mobile manipulation. We performed the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) for Embodied AI (EAI) and found that none of the existing PVRs perform well across all tasks. Next, we trained VC-1 on a combination of over 4,000 hours of egocentric videos from 7 different sources and ImageNet, totaling over 5.6 million images. We show that when VC-1 is adapted (through task-specific losses or a small amount of in-domain data), it is competitive with or outperforms the state of the art on all benchmark tasks.
## Open-Sourced Models
We're open-sourcing two visual cortex models (see the [model card](./MODEL_CARD.md)):

| Model | Description | Download |
| --- | --- | --- |
| VC-1 (ViT-L) | Our best model; uses a ViT-L backbone. Also known simply as VC-1. | Download |
| VC-1-base (ViT-B) | Pre-trained on the same data as VC-1, but with a smaller ViT-B backbone. | Download |
## Installation
To install our visual cortex models and CortexBench, please follow the instructions in INSTALLATION.md.
## Directory structure
- `vc_models`: contains config files for the visual cortex models and the model-loading code, as well as some project utilities. See its README for more details.
- `cortexbench`: embodied AI downstream tasks to evaluate pre-trained representations.
- `third_party`: third-party submodules, which aren't expected to change often.
- `data`: gitignored directory that needs to be created by the user. It is used by some downstream tasks to find (symlinks to) datasets, models, etc.; see the sketch after this list for one way to set it up.
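
For example, here is a minimal sketch of creating the `data` directory and symlinking an existing dataset location into it. The `/path/to/your/datasets` path and the `datasets` link name are placeholders; the exact layout each benchmark expects under `data` is described in that benchmark's own README.

```python
import os
from pathlib import Path

# Create the gitignored data/ directory (run this from your repository checkout).
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

# Symlink an existing dataset location into data/ so downstream tasks can find it.
# "/path/to/your/datasets" is a placeholder; point it at wherever your data actually lives.
dataset_src = Path("/path/to/your/datasets")
dataset_link = data_dir / "datasets"
if not dataset_link.exists():
    os.symlink(dataset_src, dataset_link, target_is_directory=True)
```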
## Load VC-1
To use the VC-1 model, you can install the `vc_models` module with pip. Then, you can load the model with code such as the following, or follow our tutorial:
```python
import vc_models
from vc_models.models.vit import model_utils

# To use the smaller VC-1-base model, pass model_utils.VC1_BASE_NAME instead.
model, embd_size, model_transforms, model_info = model_utils.load_model(model_utils.VC1_LARGE_NAME)

# The loaded img should be Bx3x250x250
img = your_function_here ...

# Output will be of size Bx3x224x224
transformed_img = model_transforms(img)

# Embedding will be B x embd_size (1024 for VC-1 ViT-L, 768 for VC-1-base ViT-B)
embedding = model(transformed_img)
```
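
For reference, here is a minimal end-to-end sketch that mirrors the snippet above, using a random tensor as a stand-in for a real observation. It assumes only that `torch` is installed (it is required by `vc_models`):

```python
import torch
from vc_models.models.vit import model_utils

# Load VC-1 (ViT-L); use model_utils.VC1_BASE_NAME for the smaller model.
model, embd_size, model_transforms, model_info = model_utils.load_model(model_utils.VC1_LARGE_NAME)

# Stand-in for a real observation: a batch of one 250x250 RGB image with values in [0, 1].
img = torch.rand(1, 3, 250, 250)

with torch.no_grad():                         # feature extraction only, no gradients needed
    transformed_img = model_transforms(img)   # resized/normalized to 1x3x224x224
    embedding = model(transformed_img)        # shape: 1 x embd_size

print(embedding.shape, embd_size)
```

In a real pipeline, `img` would come from your environment's RGB observations instead of `torch.rand`.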
## Reproducing Results with the VC-1 Model
To reproduce the results with the VC-1 model, please follow the README instructions for each of the benchmarks in `cortexbench`.
## Load Your Own Encoder Model and Run Across All Benchmarks
To load your own encoder model and run it across all benchmarks, follow these steps:

- Create a configuration for your model, `<your_model>.yaml`, in the model configs folder of the `vc_models` module.
- In the config, specify the custom method for loading your encoder model (as the `_target_` field); see the sketch after this list for what such a loader might look like.
- Then, you can load the model as follows:

  ```python
  import vc_models
  from vc_models.models.vit import model_utils

  model, embd_size, model_transforms, model_info = model_utils.load_model(<your_model>)
  ```

- To run the CortexBench evaluation for your model, specify your model config as a parameter (`embedding=<your_model>`) for each of the benchmarks in `cortexbench`.
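
For illustration, below is a minimal sketch of what a custom loading method referenced by `_target_` might look like. The module path, function name, and return convention here are assumptions made for the example; check the existing model configs in `vc_models` for the exact schema that `load_model` expects.

```python
# my_models/resnet_encoder.py -- hypothetical module; point your config's _target_ at this function.
import torch
import torch.nn as nn
import torchvision


def load_my_resnet50_encoder(checkpoint_path=None):
    """Build a frozen ResNet-50 feature extractor (illustrative only)."""
    backbone = torchvision.models.resnet50(weights=None)
    backbone.fc = nn.Identity()  # drop the classification head; forward() now returns 2048-dim features
    if checkpoint_path is not None:
        state_dict = torch.load(checkpoint_path, map_location="cpu")
        backbone.load_state_dict(state_dict, strict=False)
    backbone.eval()
    return backbone
```

Your `<your_model>.yaml` would then set `_target_: my_models.resnet_encoder.load_my_resnet50_encoder` (plus whatever transform and embedding-size fields the existing configs define), and keyword arguments such as `checkpoint_path` are typically supplied from the config as well.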
## Contributing
If you would like to contribute to Visual Cortex and CortexBench, please see CONTRIBUTING.md.
## Citing Visual Cortex
If you use Visual Cortex in your research, please cite the following paper:
```bibtex
@inproceedings{vc2023,
  title={Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?},
  author={Arjun Majumdar and Karmesh Yadav and Sergio Arnaud and Yecheng Jason Ma and Claire Chen and Sneha Silwal and Aryan Jain and Vincent-Pierre Berges and Pieter Abbeel and Jitendra Malik and Dhruv Batra and Yixin Lin and Oleksandr Maksymets and Aravind Rajeswaran and Franziska Meier},
  year={2023},
  eprint={2303.18240},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## License
The majority of Visual Cortex and CortexBench code is licensed under CC-BY-NC (see the LICENSE file for details); however, portions of the project are available under separate license terms: trifinger_simulation is licensed under the BSD 3-Clause license; mj_envs and mjrl are licensed under the Apache 2.0 license; and Habitat Lab, dmc2gym, and mujoco-py are licensed under the MIT license.
The trained policy models and the task datasets are considered data derived from the corresponding scene datasets.
- Matterport3D-based task datasets and trained models are distributed with the Matterport3D Terms of Use and under the CC BY-NC-SA 3.0 US license.
- Gibson-based task datasets, the code for generating such datasets, and trained models are distributed with the Gibson Terms of Use and under the CC BY-NC-SA 3.0 US license.