RaDialog-LLaVA

RaDialog-LLaVA is an improved version of the original RaDialog model, which can be found on GitHub and arXiv. It follows the same concepts, including the same image encoder, CheXbert classifier, prompt construction, and language model. However, we followed the LLaVA methodology for image-text alignment, leading to improved conversational assistance and making the model easier to use.

RaDialog-LLaVA main results:

<img align="center" src="figs/results-radialog-llava.png" alt="teaser" width="40%">

✨ News ✨


<img align="right" src="figs/example.png" alt="teaser" width="50%" style="margin-left: 20px">

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.
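As a rough illustration of the prompt-construction idea described above, structured pathology findings can be folded into the LLM prompt alongside the image features. The template, label names, and helper function below are our own assumptions for illustration, not the model's actual code:

```python
# Hypothetical sketch of RaDialog-style prompt construction: the real model
# combines visual image features, CheXbert pathology predictions, and an
# instruction. This template is illustrative only.
def build_report_prompt(findings: list) -> str:
    """Fold predicted pathology labels into a report-generation instruction.

    `findings` is a list of CheXbert-style pathology names predicted
    positive for the image.
    """
    findings_str = ", ".join(findings) if findings else "No Finding"
    return (
        "<image>\n"
        f"Predicted findings: {findings_str}. "
        "Write a radiology report for the given chest X-ray."
    )
```

In the actual model, the `<image>` placeholder would be replaced by projected visual tokens, following the LLaVA alignment scheme.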

Getting Started with RaDialog-LLaVA

To test RaDialog and use it for inference, follow the instructions in our Hugging Face repository here.

For more detailed instructions on how to train and evaluate RaDialog, please refer to the instructions below.

Repository Installation

Environment Setup:

1) RaDialog Environment

2) CheXbert Environment (only needed for CheXbert score evaluation)
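The two environments are typically created from conda spec files. The sketch below is a hypothetical `environment.yml`; the environment name, channels, and package list are assumptions, so check the repository for the real spec:

```yaml
# Hypothetical conda spec for the main RaDialog environment (illustrative only)
name: radialog
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - torch
      - transformers
      - peft  # parameter-efficient fine-tuning of the LLM
```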

Prepare Data

1) Download the RaDialog-Instruct dataset

2) Download MIMIC-CXR

3) Create sectioned report data
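Step 3 splits the free-text MIMIC-CXR reports into their sections. The official MIMIC-CXR code provides the actual sectioning scripts; the function below is only a simplified regex-based sketch of the idea, and the section names and behavior are assumptions:

```python
import re

# Simplified sketch of report sectioning: pull the FINDINGS and IMPRESSION
# sections out of a raw MIMIC-CXR report. The real pipeline uses the
# official MIMIC-CXR sectioning scripts, not this regex.
SECTION_RE = re.compile(
    r"(FINDINGS|IMPRESSION):\s*(.*?)(?=\n[A-Z ]+:|\Z)", re.DOTALL
)

def section_report(report: str) -> dict:
    """Return a {section_name: text} dict for the sections of interest."""
    return {
        name.lower(): text.strip()
        for name, text in SECTION_RE.findall(report)
    }
```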

Evaluate RaDialog on MIMIC-CXR test set:
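The CheXbert-score evaluation compares pathology labels extracted from generated and reference reports. As a rough sketch of how such a label-based clinical-correctness score is computed, here is a minimal micro-averaged F1 over per-report label sets; this is our own illustration, not the repository's evaluation code:

```python
# Illustrative micro-averaged F1 over per-report pathology label sets,
# as used in CheXbert-style clinical-efficacy evaluation (sketch only).
def micro_f1(pred_labels, ref_labels) -> float:
    """pred_labels / ref_labels: lists of sets of pathology names."""
    tp = fp = fn = 0
    for pred, ref in zip(pred_labels, ref_labels):
        tp += len(pred & ref)   # correctly predicted labels
        fp += len(pred - ref)   # spurious predictions
        fn += len(ref - pred)   # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```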

Train RaDialog:

1) CheXbert classifier Training

2) LLM Training
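The LLM is adapted with parameter-efficient fine-tuning. Conceptually, LoRA-style methods learn a low-rank update to a frozen weight matrix, W' = W + (alpha / r) * B @ A. The NumPy sketch below illustrates only this standard update rule; it is not code from this repository:

```python
import numpy as np

# Minimal illustration of a LoRA-style low-rank weight update: the frozen
# weight W (d_out x d_in) is adapted by trainable low-rank factors
# B (d_out x r) and A (r x d_in), scaled by alpha / r.
def lora_forward(x, W, A, B, alpha=16, r=4):
    """Compute W'x where W' = W + (alpha / r) * B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))
```

Because only A and B (with r much smaller than the weight dimensions) are trained, the number of updated parameters stays tiny compared to full fine-tuning.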

Demo

Demo Environment

Run Demo:

Reference

When using our model (original and LLaVA version) or dataset, please cite:

@article{pellegrini2023radialog,
  title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
  author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
  journal={arXiv preprint arXiv:2311.18681},
  year={2023}
}