Transformers-VQA

An implementation that adapts popular pre-trained vision-and-language (V+L) models to downstream VQA tasks.

Currently supported: VisualBERT, LXMERT, and UNITER, on Linux and Google Colab.

Notes:

@inproceedings{li2020comparison,
  title={A comparison of pre-trained vision-and-language models for multimodal representation learning across medical images and reports},
  author={Li, Yikuan and Wang, Hanyin and Luo, Yuan},
  booktitle={2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={1999--2004},
  year={2020},
  organization={IEEE}
}

USAGE

We provide an interactive example of fine-tuning on your own dataset using Google Colab.

Colab Notebook

The following is another example: fine-tuning on the VQA 2.0 dataset on a Linux server.

0. Clone our repo.

1. Install all Python dependencies (a virtual environment is highly recommended):

pip install -r requirements.txt

2. Download the pre-trained models and place them in models/pretrained/

You can download these models from their respective GitHub repos. We also provide commands for this:

VisualBERT:

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1kuPr187zWxSJbtCbVW87XzInXltM-i9Y" -O models/pretrained/visualbert.th && rm -rf /tmp/cookies.txt

UNITER:

wget https://convaisharables.blob.core.windows.net/uniter/pretrained/uniter-base.pt -P models/pretrained/

LXMERT:

wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P models/pretrained/
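Interrupted downloads can leave a checkpoint missing or empty, so a quick check before moving on may save time. The following is a minimal sketch, not part of the repository: `check_files` is a hypothetical helper name, and the file names are taken from the wget commands above.

```shell
# Hypothetical helper: report which of the given files are missing or empty.
# Returns 0 when every file exists and is non-empty, 1 otherwise.
check_files() {
    missing=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}

# File names as produced by the download commands above.
check_files \
    models/pretrained/visualbert.th \
    models/pretrained/uniter-base.pt \
    models/pretrained/model_LXRT.pth \
    || echo "Re-run the download for the files marked MISSING."
```

This only checks that each file exists and is non-empty; it does not verify that a checkpoint is complete or loadable.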

3. Download the redistributed JSON files for VQA 2.0 (copied from airsplay/lxmert):

wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/train.json -P data/
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/nominival.json -P data/
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/minival.json -P data/
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/test.json -P data/

4. Download the Faster R-CNN features for the MS COCO train2014 (17 GB), val2014 (8 GB), and test2015 images (copied from airsplay/lxmert). This will take a while.

wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/train2014_obj36.zip -P data/img
unzip data/img/train2014_obj36.zip -d data/img && rm data/img/train2014_obj36.zip
wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/val2014_obj36.zip -P data/img
unzip data/img/val2014_obj36.zip -d data && rm data/img/val2014_obj36.zip
wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/test2015_obj36.zip -P data/img
unzip data/img/test2015_obj36.zip -d data && rm data/img/test2015_obj36.zip
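Given the archive sizes stated in this step, it may be worth confirming free disk space before starting the downloads. The sketch below is an illustration, not part of the repository: `has_free_gb` is a hypothetical helper, and `REQUIRED_GB=25` is an assumption that covers only the two archives whose sizes are given (17 GB + 8 GB). The extracted files and the test2015 archive need additional space.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB gigabytes available. Uses POSIX df output (-P) in 1024-byte units (-k).
has_free_gb() {
    dir=$1
    need_gb=$2
    # Line 2, column 4 of `df -Pk` is the available space in KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Assumption: 25 GB covers only the train2014 and val2014 archives.
REQUIRED_GB=25
if has_free_gb . "$REQUIRED_GB"; then
    echo "At least ${REQUIRED_GB} GB available."
else
    echo "Less than ${REQUIRED_GB} GB available; free up space before downloading."
fi
```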

5. Now that all requirements and dependencies are in place, run this command as a quick check before fine-tuning on the entire training dataset:

python vqa.py --tiny

6. If no errors appear, you are good to go. Please refer to param.py for all settings. Here is an example of fine-tuning UNITER:

python vqa.py --model uniter --epochs 6 --max_seq_length 20 --load_pretrained models/pretrained/uniter-base.pt --output models/trained/
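Since fine-tuning can run for a long time, it can be convenient to keep a log of each run and to start the full run only if the quick check from step 5 succeeds. The following is a sketch under assumptions: `run_logged` is a hypothetical helper, not something the repository provides, and the log file names are arbitrary. The vqa.py flags in the commented example are the ones shown in steps 5 and 6.

```shell
# Hypothetical helper: run a command, save its combined stdout and stderr
# to a log file, print the log afterwards, and return the command's exit status.
run_logged() {
    log_file=$1
    shift
    status=0
    "$@" > "$log_file" 2>&1 || status=$?
    cat "$log_file"
    return "$status"
}

# Example, to be run from the repository root:
# run_logged tiny.log python vqa.py --tiny &&
#     run_logged train.log python vqa.py --model uniter --epochs 6 --max_seq_length 20 \
#         --load_pretrained models/pretrained/uniter-base.pt --output models/trained/
```

The output is printed only after the command finishes; to follow a long run while it is in progress, `tail -f train.log` in a second terminal is one option.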

7.a Local validation with the BEST config from step 6:

python vqa.py --test minival --load_trained models/trained/BEST

7.b Inference on the VQA test split; results will be saved in models/trained/test_predict.json:

python vqa.py --test test --load_trained models/trained/BEST