SwapMix
Implementation of the SwapMix approach to measuring visual bias in visual question answering (SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering, Gupta et al., CVPR 2022)
Introduction
We provide a new way to benchmark visual bias in a VQA model by perturbing the visual context, i.e. the objects in the image that are irrelevant to the question.
The model looks at an image and a question. We then change the visual context (objects irrelevant to the question) in the image. For each question we make multiple copies of the image by changing the context. Ideally, the model's prediction should remain consistent under these context switches.
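The sketch below illustrates this idea in Python. It is not code from this repository; the names (`swapmix_perturb`, `feature_bank`, `relevant_ids`) are placeholders for whatever object-feature format and question-relevance annotations your pipeline provides.

```python
# Minimal sketch of a SwapMix-style context perturbation (illustrative only).
# Assumptions: `obj_feats` is a (num_objects, feat_dim) numpy array, `relevant_ids`
# lists the objects the question actually refers to, and `feature_bank` holds
# features of objects from other images that can be swapped in.
import numpy as np

def swapmix_perturb(obj_feats, relevant_ids, feature_bank, num_copies=4, rng=None):
    """Return `num_copies` perturbed copies of `obj_feats` where the features of
    irrelevant objects are replaced by randomly drawn features from `feature_bank`.
    Relevant objects are left untouched, so the answer should not change."""
    rng = rng or np.random.default_rng()
    irrelevant = [i for i in range(obj_feats.shape[0]) if i not in set(relevant_ids)]
    copies = []
    for _ in range(num_copies):
        perturbed = obj_feats.copy()
        for i in irrelevant:
            perturbed[i] = feature_bank[rng.integers(len(feature_bank))]
        copies.append(perturbed)
    return copies
```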
This repository contains code for measuring bias using SwapMix and for training VQA models with SwapMix as data augmentation, as described in the paper. Specifically, we apply SwapMix to MCAN and LXMERT, and we use the GQA dataset for our analysis.
Implementation Details
The code is divided into MCAN and LXMERT folders. Inside each folder we provide implementations for:
- Measuring visual bias using SwapMix
- Finetuning models using SwapMix as a data augmentation technique
- Training models with perfect sight
Download Dataset
We slightly restructured the format of the question, answer, and scene graph files provided by GQA. You can download these files, along with the other files needed for the SwapMix implementation, from here and place them in the `data/gqa` folder.
We recommend using the object features provided by GQA. Download the features from GQA.
Download pretrained models
We provide (1) a finetuned model, (2) a model finetuned using SwapMix as data augmentation, (3) a model trained with perfect sight, and (4) a model trained with perfect sight using SwapMix as data augmentation. Please download the models here: MCAN trained models, LXMERT trained models.
Evaluation
We measure the visual bias of the model for irrelevant object changes and attribute changes separately.
Before benchmarking visual bias for these models, we finetune them on the GQA train set for better performance. Models are evaluated on the GQA val set.
To measure visual bias for MCAN, download the dependencies and dataset from here and then run:
cd mcan
python3 run_files/run_evaluate.py --CKPT_PATH=<path to ckpt file>
To measure context reliance after calculating the object and attribute results:
cd scripts
python benchmark_frcnn.py --obj <SwapMix object json file> --attr <SwapMix attribute json file>
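As a rough illustration, context reliance can be read off these result files as the fraction of questions whose answer flips under at least one perturbation. The snippet below is only a sketch: the actual JSON layout produced by the scripts may differ, and the file names are hypothetical.

```python
# Hypothetical sketch of the kind of consistency computation benchmark_frcnn.py performs.
# Assumption: each result file maps a question id to the model's answer on the original
# image ("original") and its answers on the perturbed copies ("perturbed").
import json

def context_reliance(result_file):
    with open(result_file) as f:
        results = json.load(f)
    flipped = 0
    for qid, preds in results.items():
        original, perturbed = preds["original"], preds["perturbed"]
        if any(p != original for p in perturbed):
            flipped += 1
    return flipped / max(len(results), 1)

print("object reliance:", context_reliance("swapmix_obj.json"))       # file names are placeholders
print("attribute reliance:", context_reliance("swapmix_attr.json"))
```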
Evaluating a new model for visual bias
SwapMix can be used to measure visual bias in any VQA model.
Changes are needed in the data loading and testing parts. The current code iterates over each question individually to get predictions for all SwapMix perturbations.
Details on measuring visual bias for a new model can be found here.
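As a starting point, the outline below shows how such an evaluation loop might look for a generic model. `model.predict` and `load_perturbations` are placeholders, not interfaces from this repository.

```python
# Rough outline of a SwapMix evaluation loop for a new VQA model (not the repo's API).
# `questions` is assumed to be an iterable of dicts with the original image features
# and question text; `load_perturbations(q)` yields feature sets with irrelevant
# objects swapped while question-relevant objects stay intact.
def evaluate_visual_bias(model, questions, load_perturbations):
    consistent = 0
    for q in questions:
        base_answer = model.predict(q["image_features"], q["question"])
        answers = [model.predict(feats, q["question"])
                   for feats in load_perturbations(q)]
        if all(a == base_answer for a in answers):
            consistent += 1
    # Fraction of questions whose prediction survives all context switches.
    return consistent / max(len(questions), 1)
```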
Citation
If you find our work and this code useful, please consider citing:
@inproceedings{gupta2022swapmix,
  title={SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering},
  author={Gupta, Vipul and Li, Zhuowan and Kortylewski, Adam and Zhang, Chenyu and Li, Yingwei and Yuille, Alan},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
References
- Deep Modular Co-Attention Networks for Visual Question Answering, Yu et al., CVPR 2019
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers, Tan et al., EMNLP 2019
- VQA: Visual Question Answering, Antol et al., ICCV 2015