GALS: Guiding Visual Attention with Language Specification

This is the official implementation for the CVPR 2022 paper On Guiding Visual Attention with Language Specification by Suzanne Petryk*, Lisa Dunlap*, Keyan Nasseri, Joseph Gonzalez, Trevor Darrell, and Anna Rohrbach.

If you find our code or paper useful, please cite:

@article{petryk2022gals,
  title={On Guiding Visual Attention with Language Specification},
  author={Petryk, Suzanne and Dunlap, Lisa and Nasseri, Keyan and Gonzalez, Joseph and Darrell, Trevor and Rohrbach, Anna},
  journal={Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Setting up Environment

Conda

conda env create -f env.yaml
conda activate gals

Pip

pip install -r requirements.txt
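
To sanity-check the install (this assumes the environment provides PyTorch, which the training commands below rely on for GPU selection), you can verify that a GPU is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"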

Download Datasets

Please see the original dataset pages for further detail. The paper uses Waterbirds, Food-101 (red meat classes), and MSCOCO.

The data is expected to be under the folder ./data; an illustrative layout is sketched below.
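
Purely as an illustration (the subfolder names below are placeholders rather than paths taken from this repository's configs, so match them to whatever your dataset configs expect):

# Placeholder layout only -- adjust the names to match the dataset paths in the configs.
mkdir -p data/waterbirds   # Waterbirds images and metadata
mkdir -p data/food101      # Food-101 (red meat classes)
mkdir -p data/coco         # MSCOCO images and annotations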

Repo Organization

This repo also expects the following additional folders:

We use Weights & Biases (W&B) to log experiments. This requires being logged in to a (free) W&B account; see the W&B documentation for details on setting up an account.
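
If you have not logged in on the current machine, the standard W&B CLI login flow is (assuming the wandb package was installed by the environment files above):

wandb login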

Training & Evaluation

Training models using GALS is a two-stage process:

  1. Generate and store attention per image
  2. Train model using attention

Example commands for training networks with GALS, as well as the baselines from the paper, are given below.

NOTE: To override .yaml configuration values from the command line, append arguments of the form ATTRIBUTE.NESTED=new_value to the command. For example:

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/waterbirds_100_gals.yaml DATA.BATCH_SIZE=96

Stage 1: Generate Attention

Important files: extract_attention.py and the attention configs under configs/ (e.g., configs/coco_attention.yaml).

Sample command:

CUDA_VISIBLE_DEVICES=0 python extract_attention.py --config configs/coco_attention.yaml

Stage 2: Train model

Important files: main.py and the model configs under configs/.

The model configs include the hyperparameters and attention settings used to reproduce results in our paper.

An example command to train a model with GALS on Waterbirds-100%:

CUDA_VISIBLE_DEVICES=0,1,2 python main.py --name waterbirds100_gals --config configs/waterbirds_100_gals.yaml

The --name flag is used for Weights & Biases logging. You can add --dryrun to the command to run locally without uploading to the W&B server. This can be useful for debugging.
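
For instance, a quick local debugging run (using only the flags documented above; the run name is arbitrary) could look like:

CUDA_VISIBLE_DEVICES=0 python main.py --name debug_waterbirds100 --config configs/waterbirds_100_gals.yaml --dryrun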

Model evaluation

To evaluate a model on the test split for a given dataset, simply use the --test_checkpoint flag and provide a path to a trained checkpoint. For example, to evaluate a Waterbirds-95% GALS model with weights under a trained_weights directory:

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/waterbirds_95_gals.yaml --test_checkpoint trained_weights/waterbirds_95_gals.ckpt

Note: For MSCOCO-ApparentGender, the Ratio Delta reported in our paper corresponds to 1 - test_ratio in the output results. For example, an output of test_ratio = 0.840 corresponds to a Ratio Delta of 0.160 (the GALS row in the table below).

Checkpoints/Results

In our paper, we report the mean and standard deviation over 10 trials. Below, we include a checkpoint from a single trial per experiment.

Waterbirds 100%

| Method   | Per Group Acc (%) | Worst Group Acc (%) |
|----------|-------------------|---------------------|
| GALS     | 80.67             | 57.00               |
| Vanilla  | 72.36             | 32.20               |
| UpWeight | 72.22             | 37.29               |
| ABN      | 71.96             | 44.39               |

Waterbirds 95%

| Method   | Per Group Acc (%) | Worst Group Acc (%) |
|----------|-------------------|---------------------|
| GALS     | 89.03             | 79.91               |
| Vanilla  | 86.91             | 73.21               |
| UpWeight | 87.51             | 76.48               |
| ABN      | 86.85             | 69.31               |

Red Meat (Food101)

| Method  | Acc (%) | Worst Group Acc (%) |
|---------|---------|---------------------|
| GALS    | 72.24   | 58.00               |
| Vanilla | 69.20   | 48.80               |
| ABN     | 69.28   | 52.80               |

MSCOCO-ApparentGender

| Method   | Ratio Delta | Outcome Divergence |
|----------|-------------|--------------------|
| GALS     | 0.160       | 0.022              |
| Vanilla  | 0.349       | 0.071              |
| UpWeight | 0.272       | 0.040              |
| ABN      | 0.334       | 0.068              |

Acknowledgements

We are very grateful to the following people, whose work the code in this repository is taken from or based on: