REVISE: REvealing VIsual biaSEs

A tool that automatically detects possible forms of bias in a visual dataset along three axes (object-based, attribute-based, and geography-based patterns) and suggests next steps for mitigation.

Demo Video

In the sample_summary_pdfs folder there are examples of the kinds of auto-generated summaries our tool outputs along each axis for a dataset. These samples are annotated in orange with some notes on how to interpret them.

Setup:

conda env create -f environments/[environment].yml
bash download.sh

Steps to perform analysis:

Note that all scripts are expected to be run from the home directory.

(0.5 optional) To experiment with the tool on the COCO dataset for Object-Based and Attribute-Based metrics (using gender annotations) without having to run all the measurements on a dataset first, follow these steps and then skip to Step 3:

(1) Make a dataloader structured like the 'Template Dataset' in datasets.py (and add it to main_measure.py as well), then fill it in with the dataset you would like to analyze. Test that you have implemented the dataset properly by running:

python3 tester_script.py NewDataset

(2) Run main_measure.py to make a pass through the data and collect the metrics for analysis. For example, to get the measurements att_siz, att_cnt, att_dis, att_clu, obj_scn, and att_scn (details in the Measurements section below) on COCO and save the results in coco_example:

python3 main_measure.py --measurements 'att_siz' 'att_cnt' 'att_dis' 'att_clu' 'obj_scn' 'att_scn' --dataset 'coco' --folder 'coco_example'
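When several runs are scripted, it can help to assemble this command programmatically and catch misspelled measurement names before a long pass over the data starts. The sketch below is only an illustration: `build_measure_command` and `KNOWN_MEASUREMENTS` are hypothetical names that are not part of the tool, and the set of names is simply copied from the Measurements section of this README.

```python
# Measurement names as listed in the Measurements section of this README.
KNOWN_MEASUREMENTS = {
    "obj_cnt", "obj_siz", "obj_ppl", "obj_scn",
    "att_siz", "att_cnt", "att_dis", "att_clu", "att_scn",
    "geo_ctr", "geo_tag", "geo_lng",
}


def build_measure_command(measurements, dataset, folder):
    """Hypothetical helper: build the argument list for a main_measure.py run."""
    unknown = sorted(set(measurements) - KNOWN_MEASUREMENTS)
    if unknown:
        raise ValueError(f"Unknown measurement name(s): {unknown}")
    return [
        "python3", "main_measure.py",
        "--measurements", *measurements,
        "--dataset", dataset,
        "--folder", folder,
    ]


cmd = build_measure_command(
    ["att_siz", "att_cnt", "att_dis", "att_clu", "obj_scn", "att_scn"],
    dataset="coco",
    folder="coco_example",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the home directory of the repository.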

(2.5 optional) To do some of the processing ahead of time so that interacting with the notebooks is faster, for the Attribute notebook (att_clu) run

python3 measurements/prerun_analyzeattr.py --dataset 'coco' --folder 'coco_example'

and for the Geography notebook (geo_tag and geo_lng) run

python3 measurements/prerun_analyzegeo.py --dataset 'yfcc' --folder 'yfcc_example'

(3) Still in the home directory, open the jupyter notebook from within the analysis_notebooks folder corresponding to the axis of bias you would like to explore: object, attribute, or geography. Instructions for running each notebook are at the top of that notebook.

Measurements

Measurements that can be run, along with the file and name of the function they are associated with:

Object-Based

(Note: obj_cnt, obj_siz, obj_ppl actually all run the same function, so for main_measure.py it's only necessary to run one of these to get all the measurements)

obj_cnt: Counts the number of times each instance occurs, each cooccurrence of instances occurs, and each supercategory occurs.

obj_siz: Counts the size and distance from center at the supercategory level.

obj_ppl: Counts how often each supercategory is represented with or without people.

obj_scn: Counts overall scenes, scene-supercategory cooccurrences, scene-instance cooccurrences, and gets features per scene per supercategory.
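To make the kind of counting described for obj_cnt concrete, here is a minimal sketch of instance and cooccurrence counting. It is not the tool's implementation: the input format (one list of instance labels per image) and the choice to count each instance at most once per image are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_instances_and_pairs(images):
    """Count instance occurrences and unordered instance pairs across images.

    `images` is an iterable of label lists, one list per image
    (a hypothetical format chosen for this example).
    """
    instance_counts = Counter()
    pair_counts = Counter()
    for labels in images:
        # Assumption: each instance is counted at most once per image.
        present = sorted(set(labels))
        instance_counts.update(present)
        # Sorting first gives every pair a single canonical ordering.
        pair_counts.update(combinations(present, 2))
    return instance_counts, pair_counts


images = [
    ["person", "dog"],
    ["person", "dog", "frisbee"],
    ["cat"],
]
instance_counts, pair_counts = count_instances_and_pairs(images)
print(instance_counts["person"])        # 2
print(pair_counts[("dog", "person")])   # 2
```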

Attribute-Based

att_siz: Gets the size of the person and distance from center, as well as if a face is detected. Performs pairwise comparisons to find the largest/furthest person instances.

att_cnt: Counts how often each attribute occurs with an instance and instance pair. Performs pairwise comparisons to test significance of count differences.

att_dis: Calculates the distance each attribute is from each object. Runs OvR (One-vs-Rest) analysis to find the attribute that is furthest/closest from an object.

att_clu: Gets scene-level and cropped object-level features per object class for each attribute. Runs OvR analysis to find the most linearly separable attribute.

att_scn: Counts the types of scenes each attribute occurs with.

(Note: To analyze an attribute along an ordinal axis, define boolean "self.ordinal" and array "self.axis" in the dataset class)
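The size and distance-from-center quantities mentioned for att_siz (and for obj_siz in the object-based list) can be illustrated with a small sketch. The bounding-box format `(x_min, y_min, x_max, y_max)` in pixels and the normalization by image dimensions are assumptions made for this example; the tool itself may define these quantities differently.

```python
import math


def relative_size_and_center_distance(box, image_width, image_height):
    """Return (fraction of image area covered, distance of box center from image center).

    `box` is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    The distance is measured in coordinates normalized to [0, 1] on each axis.
    """
    x_min, y_min, x_max, y_max = box
    area_fraction = ((x_max - x_min) * (y_max - y_min)) / (image_width * image_height)
    # Box center in normalized image coordinates.
    center_x = (x_min + x_max) / 2 / image_width
    center_y = (y_min + y_max) / 2 / image_height
    distance = math.hypot(center_x - 0.5, center_y - 0.5)
    return area_fraction, distance


# A person box covering the left half of a 100x100 image.
size, dist = relative_size_and_center_distance((0, 0, 50, 100), 100, 100)
print(size, dist)  # 0.5 0.25
```

Normalizing by image dimensions keeps values comparable across images of different resolutions, which matters when instances are later compared pairwise.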

Geography-Based

Note: Geography-based analyses require a mapping from images to locations. Two formats of geography annotations are supported: string-formatted locations (e.g. 'Manhattan') and GPS labels (latitude and longitude coordinate pairs). The user should specify the matching geography_info_type in their dataset class.

geo_ctr: Counts the number of images from each region

geo_tag: Counts the number of tags from each region, as well as extracts AlexNet features pretrained on ImageNet for each tag, grouping by subregion

geo_lng: Counts the languages that make up the image tags, and whether or not they are local to the country the image is from. Also extracts image-level features to compare whether locals and tourists portray a country differently
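As a rough illustration of the per-region counting behind geo_ctr and geo_tag, the sketch below tallies images and tags by region for string-formatted locations. The record format (a `(region, tags)` pair per image) is a hypothetical stand-in for whatever the dataset class provides, and feature extraction is left out entirely.

```python
from collections import Counter, defaultdict


def count_by_region(records):
    """Count images per region and tag occurrences per region.

    `records` is an iterable of (region, tags) pairs, one per image
    (a hypothetical format chosen for this example).
    """
    image_counts = Counter()
    tag_counts = defaultdict(Counter)
    for region, tags in records:
        image_counts[region] += 1
        tag_counts[region].update(tags)
    return image_counts, tag_counts


records = [
    ("Manhattan", ["taxi", "bridge"]),
    ("Manhattan", ["taxi"]),
    ("Kyoto", ["temple"]),
]
image_counts, tag_counts = count_by_region(records)
print(image_counts["Manhattan"])        # 2
print(tag_counts["Manhattan"]["taxi"])  # 2
```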

Potential Environment Issues

import os
# Point PROJ_LIB at the folder that contains the epsg file
os.environ['PROJ_LIB'] = '/new/folder/location/of/epsg'

If the epsg file is still not found, it can be downloaded manually from here, with the path location set as shown above.
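A slightly more defensive variant checks that the folder really contains an epsg file before setting the variable, so a wrong path is reported immediately rather than at a later import. This is a sketch under assumptions: `set_proj_lib` is a hypothetical helper, and it assumes the file is named exactly `epsg` and sits directly inside the given folder.

```python
import os


def set_proj_lib(folder):
    """Set PROJ_LIB to `folder` only if it contains a file named 'epsg'.

    Hypothetical helper; returns True when the variable was set.
    """
    epsg_path = os.path.join(folder, "epsg")
    if not os.path.isfile(epsg_path):
        return False
    os.environ["PROJ_LIB"] = folder
    return True


# Placeholder path from the snippet above; replace with the real location.
if not set_proj_lib("/new/folder/location/of/epsg"):
    print("epsg file not found; download it manually and retry")
```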

conda config --set allow_conda_downgrades true
conda install conda=4.6.14

Glossary

Paper and Citation

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets. If you find this useful, please cite one or both.

Original ECCV 2020 publication

@inproceedings{revisetool_eccv,
author = {Angelina Wang and Arvind Narayanan and Olga Russakovsky},
title = {{REVISE}: A Tool for Measuring and Mitigating Bias in Visual Datasets},
year = {2020},
booktitle = {European Conference on Computer Vision (ECCV)},
}

Extended IJCV 2022 publication

@article{revisetool_extended,
author = {Angelina Wang and Alexander Liu and Ryan Zhang and Anat Kleiman and Leslie Kim and Dora Zhao and Iroha Shirai and Arvind Narayanan and Olga Russakovsky},
title = {{REVISE}: A Tool for Measuring and Mitigating Bias in Visual Datasets},
year = {2022},
journal = {International Journal of Computer Vision (IJCV)},
}

Funding

This work is partially supported by the National Science Foundation under Grant No. 1763642 and No. 1704444.