Clinical Text Summarization by Adapting LLMs | Nature Medicine

Official implementation from Stanford University.

<img src='data/overview.png'/>

Datasets

We use six pre-existing open-source datasets, which are publicly accessible at the sources cited in our manuscript. For datasets that do not require PhysioNet access, we also provide our versions in data/:

Models

In addition to the proprietary models GPT-3.5 and GPT-4, we adapt the following open-source models available from HuggingFace:

Code

Set-up

  1. Use these commands to set up a conda environment:

     ```
     conda env create -f env.yml
     conda activate clin-summ
     ```
  2. In src/constants.py, define your own project directory DIR_PROJECT outside this repository; it will contain input data, trained models, and generated output.
  3. Move the input data from this repo to DIR_PROJECT, i.e. mv data/ DIR_PROJECT.
  4. (Optional) To add your own dataset, follow the format of the example datasets opi, chq, and d2n in DIR_PROJECT/data/.
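As a rough illustration of step 2, the DIR_PROJECT entry in src/constants.py might look like the sketch below. Only the name DIR_PROJECT comes from this README; the example path and the derived names (DIR_DATA, DIR_MODELS, DIR_OUTPUT) are hypothetical, not the repository's actual contents.

```python
# Hypothetical sketch of src/constants.py -- only DIR_PROJECT is named in
# this README; the derived constants below are illustrative.
import os

# Project directory outside this repository (example path; choose your own)
DIR_PROJECT = os.path.expanduser("~/clin-summ-project")

# Illustrative subdirectories for input data, trained models, and output
DIR_DATA = os.path.join(DIR_PROJECT, "data")
DIR_MODELS = os.path.join(DIR_PROJECT, "models")
DIR_OUTPUT = os.path.join(DIR_PROJECT, "output")
```

With constants like these, the rest of the code can read and write everything under one directory that survives re-clones of the repo.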

Usage

Below is a description of relevant scripts:

Citation

@article{vanveen2024clinical,
  title={Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization},
  author={Van Veen, Dave and Van Uden, Cara and Blankemeier, Louis and Delbrouck, Jean-Benoit and Aali, Asad and Bluethgen, Christian and Pareek, Anuj and Polacin, Malgorzata and Collins, William and Ahuja, Neera and Langlotz, Curtis P. and Hom, Jason and Gatidis, Sergios and Pauly, John and Chaudhari, Akshay S.},
  journal={Nature Medicine},
  year={2024},
  doi={10.1038/s41591-024-02855-5},
  url={https://doi.org/10.1038/s41591-024-02855-5},
  published={27 February 2024}
}

License