<div align="center"> <img src="doc/crtnet.png" alt="BigPictureNet"> <h4>The Context-aware Recognition Transformer Network</h4><a href="#about">About</a> • <a href="#crtnet-model">CRTNet Model</a> • <a href="#code-architecture">Code Architecture</a> • <a href="#datasets">Datasets</a> • <a href="#mturk-experiments">Mturk Experiments</a> • <a href="#citation">Citation</a> • <a href="#notes">Notes</a> • <a href="#license">License</a>
</div>

## About
This repository contains an implementation of our ICCV 2021 paper "When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes".
Free access to our manuscript HERE, supplementary material HERE, and video presentation HERE.
Conventional object recognition models are designed for images that are focused on a single object. While it is of course always possible to crop a large image to an object of interest, a lot of potentially valuable contextual information is sacrificed in that process. As our experiments show, humans are able to make use of additional context to reason about the object of interest and achieve considerably higher recognition performance.
Our Context-aware Recognition Transformer Network (CRTNet) is designed to shrink this gap between human and computer vision capabilities by looking at the big picture and leveraging contextual information.
## CRTNet Model
<div align="center"> <img src="doc/model_architecture.png" alt="model_architecture"> </div>CRTNet is presented with an image containing multiple objects and a bounding box to indicate the target object location. Inspired by the eccentricity dependence of human vision, CRTNet has one stream that processes only the target object (I<sub>t</sub> , 224 × 224), and a second stream devoted to the periphery (I<sub>c</sub> , 224 × 224). I<sub>t</sub> is obtained by cropping the input image to the bounding box whereas I<sub>c</sub> covers the entire contextual area of the image. I<sub>c</sub> and I<sub>t</sub> are then resized to the same dimensions. Thus, the target object’s resolution is higher in I<sub>t</sub> . The two streams are encoded through two separate 2D-CNNs. After the encoding stage, CRTNet tokenizes the feature maps of I<sub>t</sub> and I<sub>c</sub> , integrates object and context information via hierarchical reasoning through a stack of transformer decoder layers, and predicts class label probabilities y<sub>t,c</sub> within C classes.
A model that always relies on context can make mistakes under unusual context. To increase robustness, CRTNet makes a second prediction y<sub>t</sub>, based on target object information alone, estimates the confidence p of this prediction, and computes a confidence-weighted average of y<sub>t</sub> and y<sub>t,c</sub> to get the final prediction y<sub>p</sub>. If the model makes a confident prediction with the object only, it can overrule the context reasoning stage.
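As a rough illustration of this confidence-weighted fusion (a minimal sketch, not the exact implementation in `core/`; the tensor names and the use of the maximum object-only softmax probability as the confidence p are assumptions here, since CRTNet predicts p with its own confidence estimate):

```python
import torch
import torch.nn.functional as F

def fuse_predictions(logits_t, logits_tc):
    """Confidence-weighted fusion of object-only and object+context predictions.

    logits_t:  [B, C] logits of the object-only prediction (y_t before softmax)
    logits_tc: [B, C] logits of the context-reasoning prediction (y_t,c before softmax)
    """
    y_t = F.softmax(logits_t, dim=-1)    # object-only class probabilities
    y_tc = F.softmax(logits_tc, dim=-1)  # object+context class probabilities
    # Assumed confidence estimate: the maximum object-only probability
    # (the actual model estimates p from the object-only stream itself).
    p = y_t.max(dim=-1, keepdim=True).values
    # Confidence-weighted average: a confident object-only prediction
    # can overrule the context reasoning stage.
    y_p = p * y_t + (1.0 - p) * y_tc
    return y_p
```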
## Code Architecture
- All relevant components are implemented in `core/`.
- We use COCO-style annotations for the train and test sets. An example can be found in the `debug/` folder (a minimal sketch also follows below).
- Training and testing can be performed with `train.py` and `test.py` respectively. Annotations, the image directory, and other relevant parameters are set via command line arguments. The available command line arguments can be displayed by running `python train.py --help` and `python test.py --help` respectively.
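For orientation, the sketch below writes out a minimal COCO-style annotation file in Python. It is an illustration only: the image, category, and bounding-box values are made up, and the exact fields expected by `train.py` should be checked against the example in `debug/annotations.json`.

```python
import json

# Hypothetical minimal COCO-style annotation file; the field names follow the
# standard COCO layout, but verify against debug/annotations.json.
annotations = {
    "images": [
        {"id": 1, "file_name": "kitchen_0001.jpg", "width": 1280, "height": 1024}
    ],
    "annotations": [
        # bbox in COCO convention: [x, y, width, height] of the target object
        {"id": 1, "image_id": 1, "category_id": 3, "bbox": [412, 256, 180, 140]}
    ],
    "categories": [
        {"id": 3, "name": "microwave"}
    ],
}

with open("my_annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```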
### Examples
Train a model with default settings using the specified annotations and images. Outputs, including a config file and model checkpoints, are saved to the directory specified via `--outdir`.

```bash
python train.py --annotations debug/annotations.json --imagedir debug/images --outdir output
```
Test a trained model on a dataset with the given annotations and images.

```bash
python test.py --checkpoint output/checkpoint_1.tar --config output/config.yaml \
    --annotations testset/annotations.json --imagedir testset/images --weighted_prediction
```
### Our pre-trained models
One can download our pre-trained models:
### Annotation files for training and testing the models
All annotation files for training our models on the three datasets above can be found HERE.
All annotation files for testing our models on the three datasets above can be found HERE.
## Datasets
Download all the folders in `human` from HERE and place them in the `human` folder of the current repository.
### Existing datasets
Existing datasets can be downloaded from UnRel, Cocostuff and Cut-and-Paste.
### Our Out-of-Context Dataset (OCD)
Our OCD dataset is built on the VirtualHome simulation environment. Download the VirtualHome Python GitHub repository HERE and the original Unity repository HERE.
(Skip this step.) If one wants to build the Unity VirtualHome environment from scratch, replace the old `virtualhome_unity/Assets/Story Generator/Scripts/TestDriver.cs` in the original Unity repository with `unity/TestDriver.cs` from the current repository, then re-export the Unity executable files.
If one wants to directly run the VirtualHome environment with our pre-defined out-of-context conditions to generate our OCD dataset, it is NOT necessary to download and install Unity. Directly download the pre-compiled Unity executable file from HERE. It runs on 64-bit Linux platforms (such as Ubuntu 18.04). Make sure to double-click the executable `linux/MMVHU.x86_64` and verify that it is running.
The macOS version can be downloaded HERE and the Windows version HERE.
Copy all the files in the `unity` folder of the current repository to the `virtualhome/demo/` folder of the downloaded Python GitHub repository HERE. Replace `virtualhome/simulation/unity_simulator/comm_unity.py` in the downloaded Python repository with our latest `unity/comm_unity.py`.
Then `cd` into the `virtualhome/demo/` folder and launch any of the following Python scripts from the command line:
```bash
# Generate environment graphs (compulsory before running any of the following conditions)
python exp_HumanGraph.py
python exp_HumanGraph_anomaly.py

# Different contextual conditions

# Generate images for Gravity
python exp_graivty.py
python exp_gravity_ori.py

# Size
python exp_size.py
python exp_size_2.py
python exp_size_ori.py
python exp_size_ori_2.py

# Normal conditions
python exp_GT.py
python exp_GT_ori.py

# Co-occurrence (C) and Gravity + C
python exp_anomaly.py
python exp_anomaly_wall.py

# NoContext
python exp_GT_seg.py
python exp_GT_ori_seg.py

# Training images from VirtualHome, tested on COCOstuff
python exp_train_5.py
python exp_train_6.py
```
These scripts generate the image stimuli and save the corresponding 3D object configurations in the path and directory specified in each Python script, e.g.:

```python
stimulusdirname = 'stimulus_gravity'
jasondirname = 'jason_gravity'
```
You can skip all the steps above if you want to directly use the images from our dataset without any modifications. Download links for the dataset:
- Normal raw images HERE
- Gravity raw images HERE
- Size raw images HERE
- Co-occurrence raw images HERE
- G+C raw images (naming convention with _wall) HERE
- NoContext raw images (naming convention with _seg) HERE
- Training images from VirtualHome HERE and HERE
- Jason files HERE

For each image in the conditions above, there exists a corresponding jason (JSON) file storing the target object class name, class ID, apartment ID, room ID, surface ID, and the bounding box (left, right, bottom, and top coordinates with respect to the (1024, 1280) image size).
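As a rough sketch of how such a jason file and its bounding box can be turned into the target crop I<sub>t</sub> and context image I<sub>c</sub> described in the model section (the key names `bbox` and `classname` below are hypothetical; check an actual jason file for the real schema):

```python
import json
from PIL import Image

def load_target_and_context(image_path, jason_path, size=224):
    """Crop the target object (I_t) and keep the full context image (I_c).

    Assumes the jason file stores the bounding box as left/right/bottom/top
    pixel coordinates w.r.t. the (1024, 1280) image, with the origin at the
    top-left corner (so top < bottom numerically). The key names used here
    ("bbox", "classname") are hypothetical.
    """
    with open(jason_path) as f:
        meta = json.load(f)

    img = Image.open(image_path).convert("RGB")
    left, right, bottom, top = meta["bbox"]          # hypothetical key
    target = img.crop((left, top, right, bottom))    # I_t: target object only
    context = img                                    # I_c: full contextual image

    # Both streams are resized to the same resolution, so the target
    # appears at higher resolution in I_t than in I_c.
    target = target.resize((size, size))
    context = context.resize((size, size))
    return target, context, meta["classname"]        # hypothetical key
```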
NOTE: NOT all images are used for testing. Within each condition, we manually filtered and selected the good-quality images for human and model testing. The `human/Mat/VHhumanStats_*.mat` files store the SELECTED test images. The raw filtered image lists for each dataset are in `human/filtered/`. For example, `filtered_gravity`, `filtered_gravity_ori`, and `human/Mat/VHhumanStats_gravity.mat` contain the selected image information for the gravity condition.
(Not recommended) If one wants to filter the images again, use the scripts `unity/filterImages_gravity.ipynb` and `human/ProcessFilteredTextFiles_gravity.m` to re-generate `human/Mat/VHhumanStats_gravity.mat`.
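If one prefers to inspect these selected-image lists from Python rather than MATLAB, a file such as `human/Mat/VHhumanStats_gravity.mat` can be read with scipy. This is only a sketch; the variable names stored inside the .mat file are not documented here, so the snippet simply lists them.

```python
from scipy.io import loadmat

# Load the MATLAB file with the SELECTED test images for the gravity condition.
stats = loadmat("human/Mat/VHhumanStats_gravity.mat")

# Print the variable names stored in the file (keys starting with "__" are
# metadata added by the MATLAB file format, not experiment data).
for key, value in stats.items():
    if not key.startswith("__"):
        print(key, getattr(value, "shape", type(value)))
```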
## Mturk Experiments
We conducted Amazon Mechanical Turk (Mturk) experiments using the selected images from the OCD dataset above. The following GIF illustrates how an individual trial looks. Additional examples for all context conditions can be found in the `human/mturk_examples` folder.
The materials for all Mturk experiments can be downloaded HERE. We use `human/expGravity` as an example below.
We designed a series of Mturk experiments using psiTurk, which requires JavaScript, HTML, and Python 2.7. The source code has been successfully tested on macOS and Ubuntu 18.04. See the sections below for installation, running the experiments locally, and launching the experiments online.
### Installation of Psiturk
Refer to the link for Anaconda installation. Alternatively, execute the following commands:

```bash
curl -O https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh
bash Anaconda3-2019.03-Linux-x86_64.sh
```
After Anaconda installation, create a conda environment:

```bash
conda create -n mturkenv python=2.7
```
Activate the conda environment:

```bash
conda activate mturkenv
```
Install psiturk using pip. NOTE: psiTurk has been upgraded to Python 3. Please use the following to install the Python 2 version of psiturk (the Mturk experiment source code in this repository only works with Python 2):

```bash
pip install --upgrade psiturk==2.3.12
```
Refer to HERE for detailed instructions on setting up the psiturk keys and pasting them into `.psiturkconfig`.
### Running the experiment locally
Navigate to any of the experiment folders, e.g. `human/expGravity`. In the following we take gravity as an example; one can replace it with any other experiment. Open a command window, navigate to `human/expGravity`, and run the experiment in debug mode:

```bash
cd human/expGravity
psiturk
```

Then, inside the psiturk shell:

```
server on
debug
```
NOTE: You can run the source code directly. The complete stimulus set (all GIF files) is hosted on our lab server: http://kreimanlab.com/mengmiMturkHost/VirtualHome/keyframe_VH_gravity_gif/. One can freely view any stimulus (.gif) over the Internet, e.g. http://kreimanlab.com/mengmiMturkHost/VirtualHome/keyframe_VH_gravity_gif/gif_712_7_5_3.gif. In case the links are unavailable, one can generate the whole stimulus set for each experiment by running `human/PreprocessVH_gravity.m` to generate the GIFs and `human/GenerateMturkSets_expGravity.m` to generate a randomly shuffled sequence of GIF presentations. The pre-generated randomly shuffled sequence is stored in the `human/expGravity/static/ImageSet/` folder.
Below is a description of the important source files:
- `human/db/expGravity.db`: an SQL database storing online subjects' response data.
- `human/expGravity/template/instructions/instruct-1.html`: shows the instructions to the human subjects.
- `human/expGravity/static/js/task.js`: the main file that loads the stimuli and runs the experiment.
Re-processing these .db files is optional, since all the pre-processed results are already stored in `human/Mat/`. If one wants to re-convert the .db files to .mat files, one can run `human/ProcessDBfile_expGravity.m` and `mturk/CompileAllExpGravity.m` for each experiment.
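For reference, the raw psiturk .db files can also be inspected directly from Python before (or instead of) running the MATLAB conversion scripts. The sketch below assumes the standard psiturk SQLite layout, in which each participant row stores its responses as JSON in a `datastring` column; the table name used here is hypothetical and should be taken from the experiment's config.txt.

```python
import json
import sqlite3

# Hypothetical table name; check the experiment's config.txt for the real one.
TABLE = "turkdemo"

conn = sqlite3.connect("human/db/expGravity.db")
# Column names below assume the default psiturk participant schema.
rows = conn.execute(f"SELECT uniqueid, datastring FROM {TABLE}").fetchall()
conn.close()

for uniqueid, datastring in rows:
    if not datastring:          # skip participants who never submitted data
        continue
    data = json.loads(datastring)
    # psiturk typically stores individual trials under the "data" key.
    print(uniqueid, len(data.get("data", [])))
```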
To plot the results in the paper, run the following scripts:
- `PlotAblationOverall_humanoverlap.m`: ablation plots
- `PlotBar_unrel.m`: bar plots for the UnRel experiment
- `PlotCorrelation_table_cvpr.m`: for the cut-and-paste dataset
- `PlotModelOverall_humanoverlap.m`: models on the OCD dataset
- `PlotHumanOverall_modeloverlap.m`: humans on the OCD dataset
### Launching the experiment online using Amazon Elastic Compute Cloud (EC2) in Amazon Web Services (AWS)
Copy the downloaded source code to an EC2 server and run the psiturk experiment online. Refer to HERE for detailed instructions.
## Citation
"When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes" (Manuscript)
Authors: Philipp Bomatter*, Mengmi Zhang*, Dimitar Karev, Spandan Madan, Claire Tseng, Gabriel Kreiman (* equal contribution)
## Notes
The source code is for illustration purposes only. Path reconfiguration may be needed to run some MATLAB scripts. We do not provide technical support, but we would be happy to discuss SCIENCE!
## License
See the Kreiman lab for license agreements before downloading and using our source code and datasets.