Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting

This is the official repository for the paper "Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting", which was published at MICCAI 2023. This repo includes our code as well as our structured reporting dataset.

Dataset Overview:

<img src="figures/structure_overview.png" alt="pipeline" width="100%"/>

Model Architecture:

<img src="figures/model_overview.png" alt="pipeline" width="100%"/>

We introduce Rad-ReStruct, a new benchmark dataset that provides fine-grained, hierarchically ordered annotations in the form of structured reports for X-ray images. Our dataset provides structured labels for the images of the IU-XRay dataset available on OpenI [1]. We model the structured reporting task as hierarchical visual question answering (VQA) and propose hi-VQA, a novel method that considers prior context in the form of previously asked questions and answers when populating a structured radiology report. In this repo, we provide the structured report as well as the list of questions needed to fill it for every image.

Authors: Chantal Pellegrini, Matthias Keicher, Ege Özsoy, Nassir Navab

[1] Demner-Fushman, D., Antani, S., Simpson, M., & Thoma, G. R. (2012). Design and development of a multimodal biomedical information retrieval system. Journal of Computing Science and Engineering, 6(2), 168-177.
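
To make the hierarchical VQA formulation concrete, the sketch below illustrates how previously asked questions and answers can be prepended as prior context when asking the next, more fine-grained question. Note that `build_model_input`, the example questions, and the concatenation format are illustrative assumptions, not the repository's actual input construction:

  # Illustrative sketch of hierarchical VQA with prior context.
  # NOTE: this function and the concatenation format are hypothetical;
  # see the repository code for the actual input construction.

  def build_model_input(history, question):
      # Prepend all previously answered question-answer pairs so the model
      # can condition on the partially filled report.
      context = " ".join(f"{q} {a}" for q, a in history)
      return (context + " " + question).strip()

  history = [
      ("Are there signs in the lung?", "yes"),
      ("Is there an opacity in the lung?", "yes"),
  ]
  print(build_model_input(history, "Where in the lung is the opacity?"))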

Updated Results:

Since the publication of our paper, we have updated our dataset to fix some inconsistencies in the structured reports. The updated evaluation results on Rad-ReStruct are shown below. Please use these updated results when comparing to our method.

<img src="figures/main_results.png" alt="pipeline" width="45%"/> <img src="figures/detailed_results.png" alt="pipeline" width="72%"/>

Setup:

  1. Clone this repository

    git clone https://github.com/ChantalMP/Rad-ReStruct.git
    
  2. Install requirements:

    conda create -n radrestruct_env python=3.9
    conda activate radrestruct_env
    pip install -r requirements.txt
    
  3. Download data

    Rad-ReStruct:

    VQARad:

    • download images as zip from https://osf.io/89kps/
    • unpack and copy image files into 'data/vqarad/images'
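
After unpacking, a quick sanity check such as the following throwaway snippet (based on the directory layout above) confirms the images landed in the expected folder:

    from pathlib import Path

    # Count the files copied into the VQARad image directory.
    image_dir = Path("data/vqarad/images")
    print(f"Found {len(list(image_dir.glob('*')))} files in {image_dir}")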

Use the Rad-ReStruct dataset:

Training with VQA Dataset:

  import argparse

  from data_utils.data_radrestruct import RadReStruct

  # required parameters
  parser = argparse.ArgumentParser(description="Finetune on RadReStruct")

  parser.add_argument('--bert_model', type=str, required=False, default="zzxslp/RadBERT-RoBERTa-4m", help="pretrained question encoder weights")
  parser.add_argument('--max_position_embeddings', type=int, required=False, default=458, help="max length of sequence")
  parser.add_argument('--img_feat_size', type=int, required=False, default=14, help="dimension of last pooling layer of img encoder")
  parser.add_argument('--num_question_tokens', type=int, required=False, default=20, help="number of tokens for question")

  args = parser.parse_args()
  # a 14x14 feature map yields 14**2 = 196 image tokens
  args.num_image_tokens = args.img_feat_size ** 2
  # question tokens fill the remaining budget: 458 total - 3 special tokens - image tokens
  args.num_question_tokens = 458 - 3 - args.num_image_tokens

  # load dataset
  train_tfm, val_tfm = ... # image transforms, our transforms can be found in 'train_radrestruct.py' (line 100)
  traindataset = RadReStruct(tfm=train_tfm, mode='train', args=args)
  valdataset = RadReStruct(tfm=val_tfm, mode='val', args=args)
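
The resulting datasets can then be batched as usual. Below is a minimal sketch, assuming `RadReStruct` implements the standard PyTorch `Dataset` interface; batch size and worker count are placeholder values:

  from torch.utils.data import DataLoader

  # Placeholder loader setup; adjust batch_size/num_workers to your hardware.
  train_loader = DataLoader(traindataset, batch_size=8, shuffle=True, num_workers=4)
  val_loader = DataLoader(valdataset, batch_size=8, shuffle=False, num_workers=4)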

Report-based Evaluation:

  from data_utils.data_radrestruct import RadReStructEval
  from evaluation.evaluator_radrestruct import AutoregressiveEvaluator

  # required parameters
  ...

  # load dataset
  val_tfm = ... # image transforms, our transforms can be found in 'evaluate_radrestruct.py' (line 100)
  valdataset = RadReStructEval(tfm=val_tfm, mode='test', args=args, limit_data=2)

  # evaluate predictions against the structured report targets of the test split
  preds = ...
  evaluator = AutoregressiveEvaluator()
  targets = get_targets_for_split('test')
  acc, acc_report, f1, precision, recall, detailed_metrics = evaluator.evaluate(preds, targets)

Reproduce our results:

Rad-ReStruct:

training:

    python -m train.train_radrestruct --run_name "train_radrestruct" --classifier_dropout 0.2 --acc_grad_batches 2 --lr 1e-5 --epochs 200 --batch_size 32 --progressive --mixed_precision

evaluation:

    python -m evaluation.evaluate_radrestruct --run_name "eval_radrestruct" --match_instances --batch_size 1 --use_pretrained --model_dir "<WEIGHT_PATH>" --progressive --mixed_precision

To use our pretrained model, download the weights from release v.1.0.0 and set WEIGHT_PATH to point to the weight file.

VQARad:

training:

    python -m train.train_vqarad --run_name "train_vqarad" --acc_grad_batches 2 --lr 5e-5 --epochs 200 --batch_size 32 --progressive --mixed_precision

evaluation:

    python -m evaluation.evaluate_autoregressive_vqarad --run_name "eval_vqarad" --batch_size 1 --use_pretrained --model_dir "<WEIGHT_PATH>" --progressive --mixed_precision

To use our pretrained model, download the weights from release v.1.0.0 and set WEIGHT_PATH to point to the weight file.

Intended Use

This model is intended to be used solely for (I) future research on vision-language processing and (II) reproducibility of the experimental results reported in the reference paper.

Reference

When using our dataset, please cite both our paper and the original IU-XRay paper:

@inproceedings{pellegrini2023radrestruct,
  title={Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting},
  author={Pellegrini, Chantal and Keicher, Matthias and {\"O}zsoy, Ege and Navab, Nassir},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  year={2023}
}
@article{demner2012design,
  title={Design and development of a multimodal biomedical information retrieval system},
  author={Demner-Fushman, Dina and Antani, Sameer and Simpson, Matthew and Thoma, George R},
  journal={Journal of Computing Science and Engineering},
  volume={6},
  number={2},
  pages={168--177},
  year={2012}
}