NormPULSE: A Generative Approach for Clinical Term Normalization

This repository is a sub-repository of PULSE.


Key Features

This repository provides the official implementation of NormPULSE.

Details

NormPULSE, our framework for clinical term normalization, is built on PULSE and comprises three steps:

  1. Training. This step covers three tasks: knowledge card generation, which enriches each term with knowledge distilled from an LLM; hierarchical tree construction based on the ICD codes; and term normalization, which teaches the model to select the standard terms from a given candidate list.
  2. Knowledge-enhanced retrieval. The model retrieves candidates for the given mention using the generated knowledge cards and locates each candidate's path in the constructed hierarchical tree to build a subtree.
  3. Hierarchical reasoning. The model reasons through the subtree layer by layer to reach the final result.
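
The subtree construction and layer-by-layer reasoning described in steps 2 and 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's implementation: the function names `build_subtree` and `reason_layer_by_layer`, the nested-dict tree layout, and the `pick_last` chooser (a stand-in for the model's choice at each layer) are all hypothetical, and the ICD-10 codes are just sample labels.

```python
from typing import Callable, Dict, List, Sequence


def build_subtree(paths: Sequence[Sequence[str]]) -> Dict[str, dict]:
    """Merge the root-to-leaf paths of the retrieved candidates into one nested dict."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for label in path:
            # Reuse the child if another candidate already created it.
            node = node.setdefault(label, {})
    return root


def reason_layer_by_layer(
    mention: str,
    subtree: Dict[str, dict],
    choose: Callable[[str, List[str]], str],
) -> List[str]:
    """Descend the subtree, letting `choose` pick one child per layer."""
    chosen: List[str] = []
    node = subtree
    while node:
        options = sorted(node)
        # Skip the chooser when a layer offers only one option.
        pick = options[0] if len(options) == 1 else choose(mention, options)
        if pick not in node:
            raise ValueError(f"chooser returned {pick!r}, not one of {options}")
        chosen.append(pick)
        node = node[pick]
    return chosen


# Illustrative candidate paths (block -> category -> subcategory).
candidate_paths = [
    ["A00-A09", "A00", "A00.0"],
    ["A00-A09", "A00", "A00.1"],
    ["A00-A09", "A01", "A01.0"],
]
subtree = build_subtree(candidate_paths)


def pick_last(mention: str, options: List[str]) -> str:
    """Hypothetical stand-in for the model: always takes the last option."""
    return options[-1]


result = reason_layer_by_layer("typhoid", subtree, pick_last)
print(result)  # ['A00-A09', 'A01', 'A01.0']
```

In a real pipeline the chooser would prompt the trained model with the mention and the options of the current layer; the sketch only shows how the subtree narrows the decision at each level.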

Dataset

The clinical term normalization data is derived from the following two open-source datasets.

The standard terminology databases are ICD-10 医保2.0版 (Medical Insurance Edition 2.0) and ICD-9-CM3 医保2.0版. We construct the two corresponding code trees by parsing the term codes; they are available at ICD-10_医保v2_tree.json and ICD-9-CM3_医保v2_tree.json.
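
The layout of the tree files is not described in this README. As a hedged sketch of what building a code tree by parsing term codes could look like, the example below assumes dotted codes in which the part before the dot is the category and the first digit after the dot marks the subcategory. The helper names `code_path` and `build_code_tree` and the `name`/`children` node layout are hypothetical, and the sample terms are illustrative only.

```python
import json
from typing import Dict, List


def code_path(code: str) -> List[str]:
    """Return the assumed ancestor chain of a dotted code, ending with the code itself."""
    category, _, rest = code.partition(".")
    path = [category]
    if rest:
        # Assumption: the first character after the dot identifies the subcategory.
        path.append(f"{category}.{rest[0]}")
        if len(rest) > 1:
            path.append(code)
    return path


def build_code_tree(terms: Dict[str, str]) -> dict:
    """Build a nested tree {code: {"name": ..., "children": {...}}} from code -> name pairs."""
    tree: dict = {}
    for code, name in terms.items():
        children = tree
        for prefix in code_path(code):
            node = children.setdefault(prefix, {"name": None, "children": {}})
            if prefix == code:
                node["name"] = name
            children = node["children"]
    return tree


# Illustrative terms only; the real databases are far larger.
sample_terms = {
    "A00": "Cholera",
    "A01": "Typhoid and paratyphoid fevers",
    "A01.0": "Typhoid fever",
}
sample_tree = build_code_tree(sample_terms)
print(json.dumps(sample_tree, ensure_ascii=False, indent=2))
```

Intermediate nodes that have no entry of their own keep `name` as `None`, so gaps in the terminology remain visible instead of being filled with guesses.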

We also provide examples of the training data in the data directory.

Get Started

Model Setup

Main Requirements

CUDA 12.x or earlier (11.4 preferred)
python=3.9.16
transformers>=4.29.2
faiss-gpu==1.7.2
torch==2.0.1
sentence-transformers==2.2.2
fastapi
uvicorn
NodeJS>=18.x
At least 16 GB of GPU memory
Make sure frontend port 3000 and backend port 2233 are available, or change them in main.ts and run.py
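
One way to check the two ports before starting the demo is a small script like the one below. It is a minimal sketch using only the Python standard library; the helper name `port_is_free` is hypothetical and not part of this repository, and the check covers the IPv4 loopback address only.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
    return True


# Ports named in the requirements above.
for name, port in {"frontend": 3000, "backend": 2233}.items():
    state = "free" if port_is_free(port) else "in use"
    print(f"{name} port {port}: {state}")
```

A port reported as free can still be taken by another process before the demo starts, and a service listening only on an IPv6 address would not be detected, so treat the result as a quick sanity check.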

Installation

git clone https://github.com/JOHNNY-fans/NormPULSE.git
cd NormPULSE
conda create -n normllm python=3.9.16
conda activate normllm
pip install -r requirements.txt

Download Model
The NormPULSE weights are available in the following Hugging Face repository.

In the retrieval step, we select the open-source M3E model as the text embedding model.
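
As a rough illustration of the retrieval step, the sketch below ranks standard terms by cosine similarity between a mention and each term's knowledge card. It is not the repository's implementation: `toy_embed` (character-bigram counts) and the brute-force search are deliberate stand-ins for the M3E embeddings and the FAISS index listed in the requirements, and the function names and sample cards are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Character-bigram counts; a placeholder for a real text embedding model."""
    text = text.lower()
    return dict(Counter(text[i:i + 2] for i in range(len(text) - 1)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    mention: str,
    knowledge_cards: Dict[str, str],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (standard term, score) pairs for the mention."""
    query = embed(mention)
    scored = [(term, cosine(query, embed(card))) for term, card in knowledge_cards.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical knowledge cards keyed by standard term.
cards = {
    "Typhoid fever": "Typhoid fever: a systemic bacterial infection caused by Salmonella Typhi",
    "Cholera": "Cholera: an acute diarrhoeal infection caused by Vibrio cholerae",
}
for term, score in retrieve("typhoid", cards, top_k=2):
    print(f"{term}: {score:.3f}")
```

With a real embedding model, `embed` would return dense vectors and a vector index would replace the linear scan; the ranking logic stays the same.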

Usage
A sample usage is provided in the Jupyter notebook usage_example.ipynb.

Demo Setup

Here is our simple demo.

Run Frontend

cd demo-frontend
npm i  
npm run dev

Run Backend

cd demo-backend
python run.py

🛡️ License

The code of this project is licensed under Apache 2.0, and the model weights are licensed under GNU AGPL 3.0. If the models contained in this project, or any modified versions thereof, are used in a service that results in misleading or harmful statements causing adverse effects, the responsibility lies with the service provider and is not associated with or attributable to this project.

🙏 Acknowledgement