
Learnable Adaptive Margin Loss To Overcome Language Bias in Visual Question Answering

This repository contains the implementation of our model AdaArc-LM. This repository is built upon https://github.com/guoyang9/AdaVQA.

Almost all flags, including the dataset paths and the hyperparameters, can be set in utils/config.py.

GPU used:

* One NVIDIA GeForce RTX 2080 Ti

Memory required:

* 4GB approximately

Prerequisites

* python==3.7.11
* nltk==3.7
* bcolz==1.2.1
* tqdm==4.62.3
* numpy==1.21.4  
* pytorch==1.10.2
* tensorboardX==2.4
* torchvision==0.11.3
* h5py==3.5.0

Dataset

Download the VQA-CP datasets from the link provided in the supplementary material.
The image features can be downloaded by following the instructions at https://github.com/hengyuan-hu/bottom-up-attention-vqa.
The pre-trained GloVe features are available at https://nlp.stanford.edu/projects/glove/.
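
As a rough illustration of the GloVe text format (one token per line, followed by its space-separated vector components), the sketch below parses a few toy lines into a dictionary. The function name `parse_glove_lines` and the toy data are hypothetical and are not part of this repository.

```python
def parse_glove_lines(lines):
    """Parse GloVe-style text lines into {token: [float, ...]}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            # Skip blank or malformed lines.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Toy 3-dimensional vectors; real GloVe files have 50 to 300 dimensions.
toy_lines = ["the 0.1 -0.2 0.3\n", "cat 0.4 0.5 -0.6\n"]
embeddings = parse_glove_lines(toy_lines)
```

For a real file, the same function can be applied to an open file handle, since iterating over a handle yields lines.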

After downloading the datasets, place them in the folders set in utils/config.py.

Preprocessing

The preprocessing steps are as follows:

Process the questions and dump the dictionary:
```
python tools/create_dictionary.py
```
Process the answers and question types, and generate the frequency-based margins:
```
python tools/compute_softscore.py
```
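To make the idea of frequency-based margins concrete, the sketch below computes, for each question type, the relative frequency of every answer. This is only one plausible reading of "frequency-based" and is an assumption: the exact formula used by tools/compute_softscore.py may differ, and the function name `frequency_margins` and the toy samples are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_margins(samples):
    """Map each question type to {answer: relative frequency}.

    samples: iterable of (question_type, answer) pairs.
    """
    counts = defaultdict(Counter)
    for qtype, answer in samples:
        counts[qtype][answer] += 1

    margins = {}
    for qtype, counter in counts.items():
        total = sum(counter.values())
        margins[qtype] = {a: n / total for a, n in counter.items()}
    return margins


# Toy training annotations (hypothetical).
toy = [
    ("what color", "red"),
    ("what color", "red"),
    ("what color", "blue"),
    ("is there", "yes"),
]
margins = frequency_margins(toy)
```

Frequent answers within a question type receive larger values, which is the kind of per-type statistic a margin can then be derived from.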

Convert the image features to h5:

    python tools/detection_features_converter.py

Model training instruction

    python main_arcface.py --name test-VQA --gpu 0

Model evaluation instruction

    python main_arcface.py --name test-VQA --eval-only

Running this command creates a new JSON file (e.g. abc.json) that contains the test question ids and the answers predicted by the model.
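
The sketch below shows one way to inspect such a predictions file. It assumes the file is a JSON list of objects with `question_id` and `answer` keys, as in the standard VQA results format; the actual layout written by main_arcface.py is not documented here, so treat the keys and the helper `load_predictions` as assumptions.

```python
import json
import os
import tempfile


def load_predictions(path):
    """Load a predictions file into {question_id: answer}.

    Assumes a JSON list of {"question_id": ..., "answer": ...} objects.
    """
    with open(path) as f:
        entries = json.load(f)
    return {e["question_id"]: e["answer"] for e in entries}


# Write a toy file so the example is self-contained.
toy = [
    {"question_id": 1, "answer": "yes"},
    {"question_id": 2, "answer": "red"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc.json")
    with open(path, "w") as f:
        json.dump(toy, f)
    predictions = load_predictions(path)
```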

Category-wise evaluation instruction

    python acc_per_type.py abc.json

The --name argument is the name of the file in which the final model weights are stored.
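
For the category-wise evaluation, the following sketch illustrates how accuracy can be broken down by answer type. It uses plain exact-match accuracy and hypothetical data structures, so it is not the logic of acc_per_type.py, which may use the soft VQA accuracy instead.

```python
from collections import defaultdict


def accuracy_per_type(predictions, ground_truth):
    """Return {answer_type: accuracy in %} using exact match.

    predictions:  {question_id: predicted answer}
    ground_truth: {question_id: (answer_type, correct answer)}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, (atype, answer) in ground_truth.items():
        total[atype] += 1
        if predictions.get(qid) == answer:
            correct[atype] += 1
    return {t: 100.0 * correct[t] / total[t] for t in total}


# Toy data (hypothetical).
preds = {1: "yes", 2: "no", 3: "2"}
truth = {1: ("yes/no", "yes"), 2: ("yes/no", "yes"), 3: ("number", "2")}
scores = accuracy_per_type(preds, truth)
```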

Results on AdaArc and AdaArc-LM evaluated on VQA-CP v2

| Model | Accuracy (%) |
| --- | --- |
| AdaArc | 57.24 |
| + Randomization | 57.97 |
| + Bias-injection | 59.44 |
| + Learnable margins | 59.87 |
| + Supervised Contrastive Loss | 60.41 |