A Question-Centric Model for Visual Question Answering in Medical Imaging

This repo was made by Minh H. Vu and is based on the excellent code released for MUTAN. We developed this code as part of the research paper A Question-Centric Model for Visual Question Answering in Medical Imaging, which reports state-of-the-art results on medical images.

The goal of this repo is twofold:

If you have any questions about our code or model, don't hesitate to contact us or open an issue. Pull requests are welcome!

News:

Summary:

Introduction

What is the task about?

The task is about training models in an end-to-end fashion on a multimodal dataset made of triplets:

As you can see in the illustration below, two different triplets of the VQA dataset that share the same image are represented. The models need to learn rich multimodal representations to give the right answers.
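To make the data layout concrete, here is a minimal sketch of how such triplets could be represented in Python. It assumes each triplet holds an image, a question and an answer. The class name `VQATriplet`, the field names, the file path and the question/answer strings are all hypothetical placeholders and are not taken from this repo or its datasets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VQATriplet:
    """One training example (hypothetical layout, not the repo's own class)."""
    image_path: str
    question: str
    answer: str


# Two triplets that share one image but ask different questions.
# The path and the strings are made-up placeholders.
triplets = [
    VQATriplet("images/example.png", "placeholder question 1", "placeholder answer 1"),
    VQATriplet("images/example.png", "placeholder question 2", "placeholder answer 2"),
]

# Group the triplets by image to see which questions refer to the same picture.
by_image = defaultdict(list)
for triplet in triplets:
    by_image[triplet.image_path].append(triplet)

for image_path, group in by_image.items():
    print(image_path, [t.question for t in group])
```

Grouping by image is just one convenient view of the data. A real loader would also have to read the image and tokenize the question.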

<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/examples.PNG"/> </p>

VQA is still an active area of research. Once solved, it could greatly improve human-to-machine interfaces (especially for blind and visually impaired users).

Quick insight about our method

The VQA community developed an approach based on four learnable components:

<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/TMI-VQA-19-comparison.png"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/grad-cam-natural.PNG"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/grad-cam.PNG"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/MICCAI-VQA-19-new.png"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/post-hoc-test.PNG"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/qa-list.PNG"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/result-acc.PNG"/> </p>
<p align="center"> <img src="https://raw.githubusercontent.com/vuhoangminh/vqa_medical/master/images/result-precision-macro.PNG"/> </p>

One of our claims is that the multimodal fusion between the image and the question representations is a critical component. Our proposed model therefore uses a Tucker decomposition of the correlation tensor to model richer multimodal interactions and provide proper answers. Our best model is based on:
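As a rough illustration of the fusion idea mentioned above, the following is a minimal sketch of a Tucker-style bilinear fusion. It is not this repo's implementation: plain Python lists stand in for tensors so the sketch has no dependencies, the function names `matvec` and `tucker_fusion` and all the toy weights are hypothetical, and details a trained model would need (learned weights, nonlinearities, rank constraints on the core tensor) are left out.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def tucker_fusion(question, image, w_q, w_v, core, w_o):
    """Fuse a question vector and an image vector (illustrative sketch).

    w_q, w_v : factor matrices projecting each modality to a small dimension
    core     : core tensor, indexed as core[i][j][k]
    w_o      : factor matrix projecting the fused vector to output scores
    """
    q_t = matvec(w_q, question)  # projected question
    v_t = matvec(w_v, image)     # projected image
    t_o = len(core[0][0])
    # Bilinear interaction: z[k] = sum_i sum_j core[i][j][k] * q_t[i] * v_t[j]
    z = [
        sum(core[i][j][k] * q_t[i] * v_t[j]
            for i in range(len(q_t))
            for j in range(len(v_t)))
        for k in range(t_o)
    ]
    return matvec(w_o, z)


# Toy inputs and weights, chosen by hand so the result is easy to check.
question = [1.0, 2.0]
image = [3.0, 4.0, 5.0]
w_q = [[1, 0], [0, 1]]          # keeps both question features
w_v = [[1, 0, 0], [0, 1, 0]]    # keeps the first two image features
core = [
    [[1, 0], [0, 1]],           # core[0][j][k]
    [[0, 0], [1, 1]],           # core[1][j][k]
]
w_o = [[1, 1], [1, -1]]

scores = tucker_fusion(question, image, w_q, w_v, core, w_o)
print(scores)
```

The point of the factorization is that the three small factor matrices and the core tensor together replace one very large tensor linking every question feature, image feature and output, which keeps the number of parameters manageable.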

Installation

Requirements

First install Python 3 (we don't provide support for Python 2). We advise you to install Python 3 and PyTorch with Anaconda:


    conda create --name vqa python=3
    source activate vqa
    conda install pytorch torchvision cuda80 -c soumith

Then clone the repo (with the --recursive flag for submodules) and install the complementary requirements:


    cd $HOME
    git clone --recursive https://github.com/Cadene/vqa.pytorch.git
    cd vqa.pytorch
    pip install -r requirements.txt

Submodules

Our code has two external dependencies:

Data

Data will be automatically downloaded and preprocessed when needed. Links to data are stored in vqa/datasets/vqa.py, vqa/datasets/coco.py and vqa/datasets/vgenome.py.

Documentation

Architecture


    .
    ├── options         # default options dir containing yaml files
    ├── logs            # experiments dir containing directories of logs (one per experiment)
    ├── data            # datasets directories
    |   ├── coco        # images and features
    |   ├── vqa         # raw, interim and processed data
    |   ├── vgenome     # raw, interim, processed data + images and features
    |   └── ...
    ├── vqa             # vqa package dir
    |   ├── datasets    # datasets classes & functions dir (vqa, coco, vgenome, images, features, etc.)
    |   ├── external    # submodules dir (VQA, skip-thoughts.torch, pretrained-models.pytorch)
    |   ├── lib         # misc classes & func dir (engine, logger, dataloader, etc.)
    |   └── models      # models classes & func dir (att, fusion, notatt, seq2vec, convnets)
    |
    ├── train.py        # train & eval models
    ├── eval_res.py     # eval results files with OpenEnded metric
    ├── extract.py      # extract features from coco with CNNs
    └── visu.py         # visualize logs and monitor training

Options

There are three kinds of options:

You can easily add new options in your custom yaml file if needed. Also, if you want to grid search a parameter, you can add an ArgumentParser option and modify the dictionary in train.py:L80.
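The override pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the option keys (`optim`, `lr`, `batch_size`), the flag names and the helper `override_options` are hypothetical, a plain dictionary stands in for the options loaded from a yaml file, and the actual code in train.py may be organised differently.

```python
import argparse
import copy


def override_options(options, args):
    """Return a copy of `options` with the CLI values that were set applied."""
    merged = copy.deepcopy(options)  # leave the defaults untouched
    if args.learning_rate is not None:
        merged["optim"]["lr"] = args.learning_rate
    if args.batch_size is not None:
        merged["optim"]["batch_size"] = args.batch_size
    return merged


parser = argparse.ArgumentParser()
# A default of None means "keep the value from the options file".
parser.add_argument("--learning_rate", type=float, default=None)
parser.add_argument("--batch_size", type=int, default=None)

# Stand-in for the dictionary that would be loaded from a yaml file.
defaults = {"optim": {"lr": 0.0001, "batch_size": 512}}

# A tiny grid search: one options dictionary per candidate learning rate.
runs = []
for lr in [0.001, 0.0001]:
    args = parser.parse_args(["--learning_rate", str(lr)])
    runs.append(override_options(defaults, args))

for run in runs:
    print(run)
```

Using `None` as the argparse default makes it easy to tell "flag not given" apart from an explicit value, so the yaml file remains the single source of defaults.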

Datasets

We currently provide four datasets:

Citation

Please cite this paper if you use our work:

@ARTICLE{9024133,
  author={M. H. {Vu} and T. {Löfstedt} and T. {Nyholm} and R. {Sznitman}},
  journal={IEEE Transactions on Medical Imaging}, 
  title={{A Question-Centric Model for Visual Question Answering in Medical Imaging}}, 
  year={2020},
  volume={39},
  number={9},
  pages={2856-2868},
  doi={10.1109/TMI.2020.2978284}}

Acknowledgment

This research was conducted using the resources of the High Performance Computing Center North (HPC2N) at Umeå University, Umeå, Sweden. We are grateful for the financial support obtained from the Cancer Research Fund in Northern Sweden, Karin and Krister Olsson, Umeå University, The Västerbotten regional county, and Vinnova, the Swedish innovation agency.