MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

Yash Khare*, Viraj Bagal*, Minesh Mathew, Adithi Devi, U Deva Priyakumar, CV Jawahar

Abstract: Images in the medical domain are fundamentally different from general domain images. Consequently, it is infeasible to directly employ general domain Visual Question Answering (VQA) models for the medical domain. Additionally, medical image annotation is a costly and time-consuming process. To overcome these limitations, we propose a solution inspired by self-supervised pretraining of Transformer-style architectures for NLP, Vision, and Language tasks. Our method involves learning richer medical image and text semantic representations using Masked Language Modeling (MLM) with image features as the pretext task on a large medical image+caption dataset. The proposed solution achieves new state-of-the-art performance on two VQA datasets for radiology images, VQA-Med 2019 and VQA-RAD, outperforming even the ensemble models of previous best solutions. Moreover, our solution provides attention maps which help in model interpretability.
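
To make the pretext task concrete, below is a minimal PyTorch sketch of masked language modeling over a joint image+text sequence. Everything in it (class name, dimensions, dummy inputs, the absence of positional embeddings and of a real visual backbone) is illustrative and simplified; the actual models and training logic live in the scripts in this repo.

```python
# Illustrative sketch only (not the repo's code): MLM with image features as the pretext task.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedCaptionPretrainer(nn.Module):
    """Toy version of masked caption modeling conditioned on image features."""

    def __init__(self, vocab_size=30522, hidden_size=768, img_feat_dim=2048,
                 num_layers=4, num_heads=8):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_size)
        # Project CNN region features into the same space as the text embeddings.
        self.img_proj = nn.Linear(img_feat_dim, hidden_size)
        layer = nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Classify each text position over the vocabulary to recover masked tokens.
        self.mlm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, img_feats, token_ids, mlm_labels):
        # img_feats:  (B, num_regions, img_feat_dim) visual features
        # token_ids:  (B, seq_len) caption tokens, some replaced by [MASK]
        # mlm_labels: (B, seq_len) original ids at masked positions, -100 elsewhere
        fused = self.encoder(torch.cat([self.img_proj(img_feats),
                                        self.token_emb(token_ids)], dim=1))
        text_states = fused[:, img_feats.size(1):]  # keep only the text positions
        logits = self.mlm_head(text_states)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               mlm_labels.reshape(-1), ignore_index=-100)


if __name__ == "__main__":
    model = MaskedCaptionPretrainer()
    loss = model(torch.randn(2, 5, 2048),            # 5 image region features per sample
                 torch.randint(0, 30522, (2, 16)),   # masked caption tokens
                 torch.randint(0, 30522, (2, 16)))   # demo labels for every position
    print(loss.item())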

Train on VQARAD

python train_vqarad.py --run_name give_name --mixed_precision --use_pretrained --lr set_lr  --epochs set_epochs
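
For example (the run name, learning rate, and epoch count below are illustrative placeholders, not the settings reported in the paper):

python train_vqarad.py --run_name mmbert_vqarad --mixed_precision --use_pretrained --lr 1e-4 --epochs 100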

Train on VQA-Med 2019

python train.py --run_name  give_name --mixed_precision --lr set_lr --category cat_name --batch_size 16 --num_vis set_visual_feats --hidden_size hidden_dim_size
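
For example, to train a model for a single question category (the category string and numeric values are illustrative placeholders; check the script for the accepted category names):

python train.py --run_name mmbert_vqamed_plane --mixed_precision --lr 1e-4 --category plane --batch_size 16 --num_vis 5 --hidden_size 768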

Evaluate

python eval.py --run_name give_name --mixed_precision --category cat_name --hidden_size hidden_dim_size --use_pretrained
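
For example, evaluating the illustrative run from above (placeholder values again; presumably these should match the ones used for the corresponding training run):

python eval.py --run_name mmbert_vqamed_plane --mixed_precision --category plane --hidden_size 768 --use_pretrained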

VQARAD Results

MMBERT General, which is a single model for both question types in the dataset, outperforms the existing approaches, including those that use a dedicated model for each question type.

| Method | Dedicated Models | Open Acc. | Closed Acc. | Overall Acc. |
| --- | --- | --- | --- | --- |
| MEVF + SAN | - | 40.7 | 74.1 | 60.8 |
| MEVF + BAN | - | 43.9 | 75.1 | 62.7 |
| Conditional Reasoning | :heavy_check_mark: | 60.0 | 79.3 | 71.6 |
| MMBERT General | :x: | 63.1 | 77.9 | 72.0 |

VQA-Med 2019 Results

Our MMBERT Exclusive achieves state-of-the-art results on overall accuracy and BLEU score, surpassing even CGMVQA Ens., which is an ensemble of three models, one dedicated to each category. Even our MMBERT General performs better than CGMVQA Ens. on the abnormality and yes/no categories. Additionally, our MMBERT General outperforms the single dedicated CGMVQA models in all categories but modality.

| Method | Dedicated Models | Modality Acc. | Modality BLEU | Plane Acc. | Plane BLEU | Organ Acc. | Organ BLEU | Abnormality Acc. | Abnormality BLEU | Yes/No Acc. | Yes/No BLEU | Overall Acc. | Overall BLEU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16 + BERT | - | - | - | - | - | - | - | - | - | - | - | 62.4 | 64.4 |
| CGMVQA | :heavy_check_mark: | 80.5 | 85.6 | 80.8 | 81.3 | 72.8 | 76.9 | 1.7 | 1.7 | 75.0 | 75.0 | 62.4 | 64.4 |
| CGMVQA Ens. | :heavy_check_mark: | 81.9 | 88.0 | 86.4 | 86.4 | 78.4 | 79.7 | 4.4 | 7.6 | 78.1 | 78.1 | 64.0 | 65.9 |
| MMBERT General | :x: | 77.7 | 81.8 | 82.4 | 82.9 | 73.6 | 76.6 | 5.2 | 6.7 | 85.9 | 85.9 | 62.4 | 64.2 |
| MMBERT NP | :heavy_check_mark: | 80.6 | 85.6 | 81.6 | 82.1 | 71.2 | 74.4 | 4.3 | 5.7 | 78.1 | 78.1 | 60.2 | 62.7 |
| MMBERT Exclusive | :heavy_check_mark: | 83.3 | 86.2 | 86.4 | 86.4 | 76.8 | 80.7 | 14.0 | 16.0 | 87.5 | 87.5 | 67.2 | 69.0 |