Awesome

`Chexpert++`

Description

Source implementation and pointer to pre-trained models for chexpert++ (arxiv link forthcoming) a BERT-based approximation to CheXpert for radiology report labeling. Note that a compelling, co-discovered alternative is [1], which features a more full-fledged annotation effort featuring two board-certified radiologists and a more robust error resolution system. This paper is accessible here.

Obtaining our Pre-trained Model

Our Pre-trained BERT model is soon to be available via PhysioNet. In the meantime, it is accessible on google cloud platform (GCP) to users who are credentialed for accessing the MIMIC-CXR GCP bucket via PhysioNet. Our bucket link and instructions to gain access through PhysioNet are included below, and please email mmd@mit.edu if you have any questions.

Our Bucket

https://console.cloud.google.com/storage/browser/chexpertplusplus

Instructions for getting physionet MIMIC-CXR GCP Access

First, follow the physionet instructions to add google cloud access, here: https://mimic.physionet.org/gettingstarted/cloud/Next,
Next, get access to MIMIC-CXR in general on Physionet: https://physionet.org/content/mimic-cxr/2.0.0/ (go to the bottom of the page and follow the steps listed under "Files", including becoming a credentialed user and signing the data use agreement)
Finally, request access to MIMIC-CXR via GCP on Physionet: https://physionet.org/projects/mimic-cxr/2.0.0/request_access/3

Installation

To install a conda environment suitable for reproducing this work, use the environment spec available in env.yml, via, e.g.

conda env create -f env.yml -n [ENVIRONMENT NAME]

Additionally, you must download the MIMIC-CXR dataset and split the reports into sentences, then label each of these with the CheXpert labeler (code/splits not provided). You must also download the Clinical BERT model, available here.

Usage Instructions

Main model source code is available in ./chexpert_approximator. Model training, evaluation, and active learning proof-of-concept are all available in Jupyter Notebooks/.

Citation

This Work: Matthew B.A. McDermott, Tzu Ming Harry Hsu, Wei-Hung Weng, Marzyeh Ghassemi, and Peter Szolovits. "Chexpert++: Approximating the CheXpert labeler for Speed, Differentiability, and Probabilistic Output." Machine Learning for Health Care (2020) (in press; link TBA).

[1] Akshay Smit, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng, and Matthew P. Lungren. "CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT." arXiv preprint arXiv:2004.09167 (2020). https://arxiv.org/pdf/2004.09167.pdf