
Introduction: X-Ray Report Generation

This repository contains the code for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". Our model takes X-ray images (together with the patient's clinical history, when available) as input. A CNN learns embedding features from the X-ray images, from which <B>disease-state-style information</B> is extracted and fed into the text generation network (a Transformer). Previously, almost all work fed only the embeddings of detected diseases into the text generation network, which can drop diseases the detector misses (false negatives). To ensure <B>consistency</B> between the detected diseases and the generated X-ray reports, we also introduce an <B>interpreter</B> that enforces the accuracy of the reports. For details, please refer to our paper.

<p align="center"> <img src="https://github.com/ginobilinie/xray_report_generation/blob/main/img/motivation.png" width="400" height="400"> </p>

Data we used for experiments

We use two datasets to validate our method: Open-I and MIMIC.

Performance on two datasets

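| Dataset | Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L |
|---|---|---|---|---|---|---|---|
| Open-I | Single-view | 0.463 | 0.310 | 0.215 | 0.151 | 0.186 | 0.377 |
| Open-I | Multi-view | 0.476 | 0.324 | 0.228 | 0.164 | 0.192 | 0.379 |
| Open-I | Multi-view w/ Clinical History | 0.485 | 0.355 | 0.273 | 0.217 | 0.205 | 0.422 |
| Open-I | Full Model (w/ Interpreter) | 0.515 | 0.378 | 0.293 | 0.235 | 0.219 | 0.436 |
| MIMIC | Single-view | 0.447 | 0.290 | 0.200 | 0.144 | 0.186 | 0.317 |
| MIMIC | Multi-view | 0.451 | 0.292 | 0.201 | 0.144 | 0.185 | 0.320 |
| MIMIC | Multi-view w/ Clinical History | 0.491 | 0.357 | 0.276 | 0.223 | 0.213 | 0.389 |
| MIMIC | Full Model (w/ Interpreter) | 0.495 | 0.360 | 0.278 | 0.224 | 0.222 | 0.390 |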

Environment for running the code

How to use our code for training/testing

<B>Step 0:</B> Build your vocabulary model with SentencePiece (tools/vocab_builder.py)
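A minimal sketch of Step 0, assuming the training reports are available as plain strings. The helper names `build_corpus_lines`, `write_corpus` and `train_vocab`, the normalization rule (lowercase, collapse whitespace) and `vocab_size=1000` are hypothetical and are not taken from tools/vocab_builder.py; only `SentencePieceTrainer.train` and `SentencePieceProcessor` come from the `sentencepiece` package.

```python
import re
from pathlib import Path


def build_corpus_lines(reports):
    """Normalize reports (hypothetical rule: lowercase, collapse whitespace), one per line."""
    lines = []
    for report in reports:
        text = re.sub(r"\s+", " ", (report or "").strip().lower())
        if text:  # skip empty or missing reports
            lines.append(text)
    return lines


def write_corpus(reports, path):
    """Write the normalized corpus to `path` and return the number of lines written."""
    lines = build_corpus_lines(reports)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def train_vocab(corpus_path, model_prefix="xray_vocab", vocab_size=1000):
    """Train a SentencePiece model on the corpus and load it (needs `pip install sentencepiece`)."""
    import sentencepiece as spm  # lazy import: the corpus helpers above work without it

    # vocab_size=1000 is an illustrative assumption, not the repository's setting.
    spm.SentencePieceTrainer.train(
        input=corpus_path, model_prefix=model_prefix, vocab_size=vocab_size
    )
    return spm.SentencePieceProcessor(model_file=f"{model_prefix}.model")
```

For example, `sp = train_vocab("reports.txt")` followed by `sp.encode("no acute cardiopulmonary abnormality", out_type=str)` returns the subword pieces of that sentence. SentencePiece raises an error when `vocab_size` is larger than the corpus can support, so a small corpus needs a smaller value.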

<B>Step 1:</B> Train the LSTM and/or Transformer models, which are plain text classifiers, to predict 14 common disease labels from a report. The trained classifiers are used to:

  1. Evaluate the generated reports by comparing the labels predicted from them with the ground-truth labels.
  2. Fine-tune the output of the report generator.
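For the first use above, the labels predicted from each generated report are compared with the ground-truth labels of the same study. The sketch below assumes both are multi-hot vectors of length 14 (1 = disease present) and computes micro-averaged precision/recall/F1 plus macro-averaged F1. The function name and the choice of averaging are illustrative assumptions, not the repository's evaluation code.

```python
from typing import Dict, List, Sequence


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when a label never occurs in pred or gold.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_metrics(pred: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compare multi-hot labels predicted from generated reports with ground-truth labels."""
    if not gold or len(pred) != len(gold):
        raise ValueError("pred and gold must contain the same, non-zero number of reports")
    num_labels = len(gold[0])
    tp: List[int] = [0] * num_labels
    fp: List[int] = [0] * num_labels
    fn: List[int] = [0] * num_labels
    for p_vec, g_vec in zip(pred, gold):
        for k, (p, g) in enumerate(zip(p_vec, g_vec)):
            if p and g:
                tp[k] += 1  # disease predicted and present
            elif p:
                fp[k] += 1  # predicted but absent
            elif g:
                fn[k] += 1  # present but missed
    total_tp, total_fp, total_fn = sum(tp), sum(fp), sum(fn)
    return {
        "micro_precision": total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0,
        "micro_recall": total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0,
        "micro_f1": _f1(total_tp, total_fp, total_fn),
        "macro_f1": sum(_f1(tp[k], fp[k], fn[k]) for k in range(num_labels)) / num_labels,
    }
```

Micro averaging weights every label occurrence equally, while macro averaging weights every disease equally, so rare diseases influence the macro score much more.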

<B>Step 2:</B> Test the text classifier models using train_text.py with:

<B>Step 3:</B> Transfer the trained model to obtain the 14 common disease labels for the Open-I dataset and for any other dataset that has no ground-truth labels.
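Step 3 amounts to pseudo-labeling: the classifier trained on labeled reports is run on reports that have no ground-truth labels. A minimal sketch, assuming a hypothetical `predict_proba(report)` callable that returns one probability per label and a 0.5 decision threshold (both are assumptions, not the repository's interface):

```python
from typing import Callable, List, Sequence

NUM_LABELS = 14  # the 14 common disease labels from Step 1


def pseudo_label(
    reports: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],  # hypothetical classifier interface
    threshold: float = 0.5,  # assumed decision threshold
) -> List[List[int]]:
    """Convert per-label probabilities from a trained classifier into multi-hot labels."""
    labels = []
    for report in reports:
        probs = predict_proba(report)
        if len(probs) != NUM_LABELS:
            raise ValueError(f"expected {NUM_LABELS} probabilities, got {len(probs)}")
        labels.append([1 if p >= threshold else 0 for p in probs])
    return labels
```

A single global threshold keeps the sketch simple; tuning one threshold per label on a labeled validation split is a common alternative.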

<B>Step 4:</B> Get additional labels with tools/count_nounphrases.py.
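The script name suggests the additional labels are frequent noun phrases mined from the training reports. The sketch below only approximates that idea: instead of a real noun-phrase chunker it splits each report at stopwords and punctuation, counts word n-grams inside each chunk, and keeps the most frequent phrases as extra multi-hot labels. `STOPWORDS`, `candidate_phrases`, `select_extra_labels`, `multi_hot`, `top_k` and `min_count` are all hypothetical.

```python
import re
from collections import Counter
from typing import List, Set

# Hypothetical stopword list; these words (and punctuation) split a report into chunks.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "no", "of", "the", "there", "to", "with"}


def candidate_phrases(report: str, max_n: int = 2) -> Set[str]:
    """Crude stand-in for noun-phrase extraction: n-grams within stopword-free chunks."""
    chunks: List[List[str]] = [[]]
    for token in re.findall(r"[a-z]+|[^a-z\s]", report.lower()):
        if token in STOPWORDS or not token.isalpha():
            chunks.append([])  # a stopword or punctuation mark ends the current chunk
        else:
            chunks[-1].append(token)
    phrases = set()
    for chunk in chunks:
        for n in range(1, max_n + 1):
            for i in range(len(chunk) - n + 1):
                phrases.add(" ".join(chunk[i:i + n]))
    return phrases


def select_extra_labels(reports, top_k=10, min_count=2, exclude=()):
    """Rank phrases by how many reports contain them (ties broken alphabetically)."""
    counts = Counter()
    for report in reports:
        counts.update(candidate_phrases(report))  # a set, so each report counts once
    kept = [p for p, c in counts.items() if c >= min_count and p not in exclude]
    return sorted(kept, key=lambda p: (-counts[p], p))[:top_k]


def multi_hot(report, vocab):
    """Encode a report as 0/1 flags over the selected extra labels."""
    phrases = candidate_phrases(report)
    return [1 if p in phrases else 0 for p in vocab]
```

Passing the 14 disease names as `exclude` keeps the extra labels from duplicating the ones obtained in Steps 1 to 3.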

<B>Step 5:</B> Train the ClsGen model (Classifier-Generator) with train_full.py
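One way to read the disease-state-style information described in the introduction: for each disease the classifier outputs a distribution over states (for example negative/positive), and the generator receives the probability-weighted average of learnable per-state embeddings rather than only the embeddings of detected diseases. The pure-Python sketch below shows just that weighting; the state set, embedding size and function names are assumptions, and the actual ClsGen model trained by train_full.py is a neural network with learned parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over the states of one disease."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def disease_state_embedding(state_logits: Sequence[float], state_embeddings: Sequence[Vector]) -> Vector:
    """Expected state embedding under the predicted state distribution of one disease.

    state_logits: classifier scores, one per state (hypothetical order, e.g. [negative, positive])
    state_embeddings: one (learnable) vector per state, in the same order
    """
    probs = softmax(state_logits)
    dim = len(state_embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, state_embeddings)) for d in range(dim)]


def encode_diseases(all_logits, all_state_embeddings) -> List[Vector]:
    """One soft embedding per disease; together they would form the generator's input."""
    return [disease_state_embedding(l, e) for l, e in zip(all_logits, all_state_embeddings)]
```

Because a disease whose positive state receives, say, 30% probability still pulls its embedding toward the positive-state vector, a borderline (possibly false-negative) finding is not removed from the generator's input outright.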

<B>Step 6:</B> Train the ClsGenInt model (Classifier-Generator-Interpreter) with train_full.py
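Per the introduction, the interpreter keeps the generated report consistent with the detected diseases. A minimal sketch of such a consistency penalty, assuming a hypothetical `interpreter` callable that maps a report to 14 disease probabilities and using binary cross-entropy against the target labels; the paper's exact loss, and how it is weighted against the generation loss, are not reproduced here. In a real model the interpreter has to consume something differentiable (such as the generator's soft token distributions) rather than decoded text, otherwise no gradient reaches the generator; this sketch only shows the loss arithmetic.

```python
import math
from typing import Callable, Sequence


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def interpreter_loss(
    generated_report: str,
    target_labels: Sequence[int],
    interpreter: Callable[[str], Sequence[float]],  # hypothetical interface
) -> float:
    """Penalize reports whose interpreted disease labels disagree with the target labels."""
    return binary_cross_entropy(interpreter(generated_report), target_labels)
```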

<B>Step 7:</B> Generate the output reports.

<B>Step 8:</B> Evaluate the generated reports.
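The BLEU-n scores in the table above combine clipped n-gram precisions with a brevity penalty. The sketch below computes corpus-level BLEU-1 to BLEU-4 from tokenized reports with uniform weights and a single reference per report, following the standard BLEU definition. It is not the authors' evaluation script; differences in tokenization or smoothing mean it will not necessarily reproduce the reported numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU-max_n with uniform weights and one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clipping: each hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Calling it with `max_n=1`, `2`, `3` and `4` gives the four BLEU columns; short generated reports are penalized through the brevity term even when every word they contain is correct.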

Our pretrained models

Our pretrained models are hosted on Google Drive and Baidu Yunpan; please download them from the links below:

| Model Name | Download Link (Google) | Download Link (Baidu) |
|---|---|---|
| Our Model for MIMIC | Google Drive | Baidu Yunpan |
| Our Model for NLMCXR | Google Drive | Baidu Yunpan |

Citation

If you find this work helpful, please cite our paper:

@inproceedings{nguyen-etal-2021-automated,
    title = "Automated Generation of Accurate {\&} Fluent Medical {X}-ray Reports",
    author = "Nguyen, Hoang  and
      Nie, Dong  and
      Badamdorj, Taivanbat  and
      Liu, Yujie  and
      Zhu, Yingying  and
      Truong, Jason  and
      Cheng, Li",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.288",
    doi = "10.18653/v1/2021.emnlp-main.288",
    pages = "3552--3569",
}