Reliable Text-to-SQL on Electronic Health Records - Clinical NLP Workshop @ NAACL 2024

Overview

Electronic Health Records (EHRs) are relational databases that store patients' entire medical histories within hospitals. They record numerous aspects of a patient's medical care, from admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries or requests requires skills in query languages like SQL. To simplify access to EHR data, one straightforward strategy is to build a question-answering system, specifically leveraging text-to-SQL models that automatically convert natural language questions into SQL queries and use those queries to retrieve answers. The goal of this shared task is to develop a reliable text-to-SQL model tailored to EHRs: given both answerable and unanswerable questions, the model should answer accurately when it is certain and abstain when it is uncertain.

This is part of the shared tasks at NAACL 2024 - Clinical NLP.

<p align="left" float="left"> <img src="image/logo.png" height="100" /> </p>

Timeline | Dataset | Evaluation | Baselines | Submission | Contact | Organizer

<a name="timeline"></a>Timeline

All deadlines are 11:59 PM UTC-12:00 (Anywhere on Earth), unless stated otherwise.

<a name="dataset"></a>Dataset

Statistics

| #Train | #Valid | #Test |
| ------ | ------ | ----- |
| 5124   | 1163   | 1167  |

Data Format

For the task, we have two types of files for each of the train, dev, and test sets: data files (with names like *_data.json) and label files (with names like *_label.json). Data files contain the input data for the model, and label files contain the expected model outputs that share the same 'id's as the corresponding data files (sample data).

Input Data (data.json)

{
  "version" : dataset version,
  "data" : [
    {
      "id" : sample identifier,
      "question" : natural language question (either answerable or unanswerable given the MIMIC-IV schema)
    },
    ...
  ]
}

Each object in the data list consists of an ID and the corresponding natural language question.
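
For example, a minimal way to read a data file looks like the sketch below (the file name valid_data.json is an assumption; substitute the data file for your split):

import json

# File name is an assumption; point this at the *_data.json file for your split.
with open("valid_data.json") as f:
    data = json.load(f)

print("dataset version:", data["version"])
for sample in data["data"]:
    print(sample["id"], "->", sample["question"])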

Output Data (label.json)

{
  id -> sample identifier : label -> SQL query or 'null' if subject to abstention,
  ...
}

Each entry maps a sample's ID (key) to its corresponding label (value).
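
As an illustration, a prediction file in this format could be assembled as in the sketch below, where generate_sql is a stand-in for your own model and prediction.json matches the submission example later on:

import json

def generate_sql(question):
    # Stand-in for your own model: return a SQL string,
    # or None when the model chooses to abstain.
    return None

with open("valid_data.json") as f:          # file name is an assumption
    data = json.load(f)

# Map each sample id to a SQL query, or 'null' to abstain.
predictions = {}
for sample in data["data"]:
    sql = generate_sql(sample["question"])
    predictions[sample["id"]] = sql if sql is not None else "null"

with open("prediction.json", "w") as f:
    json.dump(predictions, f, indent=2)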

Table Schema

We follow the same table information style used in Spider. tables.json contains the following information for the database:

{
    "column_names": [
      [
        -1,
        "*"
      ],      
      [
        0,
        "row id"
      ],
      [
        0,
        "subject id"
      ],
      ...
    ],
    "column_names_original": [
      [
        -1,
        "*"
      ],      
      [
        0,
        "row_id"
      ],
      [
        0,
        "subject_id"
      ],
      ...
    ],
    "column_types": [
      "text",
      "number",
      "number",
      ...
    ],
    "db_id": "mimic_iv",
    "foreign_keys": [
      [
        7,
        2
      ],
      ...
    ],
    "primary_keys": [
      1,
      6,
      ...
    ],
    "table_names": [
      "patients",
      "admissions",
      ...
    ],
    "table_names_original": [
      "patients",
      "admissions",
      ...
    ]
}
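
Because each entry in column_names pairs a table index with a column name, a short sketch like the one below (assuming tables.json sits in the working directory) can regroup the columns by table:

import json

with open("tables.json") as f:              # path is an assumption
    schemas = json.load(f)

# tables.json may hold a single schema object or a list of them.
for schema in (schemas if isinstance(schemas, list) else [schemas]):
    print("database:", schema["db_id"])
    for (table_idx, column), col_type in zip(schema["column_names_original"],
                                             schema["column_types"]):
        if table_idx == -1:                 # the special "*" column belongs to no table
            continue
        table = schema["table_names_original"][table_idx]
        print(f"  {table}.{column} ({col_type})")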

Database

We use the MIMIC-IV database demo, whose files anyone can access as long as they agree to the terms of the Open Data Commons Open Database License v1.0. If you agree to the terms, use the bash commands below to download the database.

wget https://physionet.org/static/published-projects/mimic-iv-demo/mimic-iv-clinical-database-demo-2.2.zip
unzip mimic-iv-clinical-database-demo-2.2.zip
gunzip -r mimic-iv-clinical-database-demo-2.2

Once downloaded, run the code below to preprocess the database. This step involves time-shifting, value deduplication in tables, and more.

cd preprocess
bash preprocess.sh
cd ..
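
Once preprocessing finishes, you can sanity-check the resulting database; the sketch below assumes it is a SQLite file (the exact path depends on what preprocess.sh produces):

import sqlite3

# The database path is an assumption; use whatever preprocess.sh actually produces.
con = sqlite3.connect("mimic_iv.sqlite")
tables = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print([name for (name,) in tables])
con.close()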

<a name="evaluation"></a>Evaluation

The scorer (scoring.py in the scoring_program module) will report the official evaluation score for the task. For more details about the metric, please refer to the Evaluation tab on the Codabench website.
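
As a rough, unofficial illustration of how a penalty-based reliability score can work (the official definition is in scoring.py and on the Codabench Evaluation tab), the sketch below rewards correct answers and correct abstentions, penalizes incorrect answers and answering unanswerable questions, and gives no credit for abstaining on answerable questions:

def is_correct(gold_sql, pred_sql):
    # Placeholder: a real scorer would compare query execution results,
    # not the SQL strings themselves.
    return gold_sql == pred_sql

def reliability_score(gold, pred, penalty=0):
    # Unofficial sketch only; see scoring.py for the actual metric.
    # gold: id -> gold SQL, or 'null' for unanswerable questions
    # pred: id -> predicted SQL, or 'null' to abstain
    total = 0.0
    for qid, gold_label in gold.items():
        pred_label = pred.get(qid, "null")
        if gold_label == "null":            # unanswerable question
            total += 1 if pred_label == "null" else -penalty
        elif pred_label == "null":          # abstained on an answerable question
            total += 0
        else:                               # answered an answerable question
            total += 1 if is_correct(gold_label, pred_label) else -penalty
    return 100 * total / len(gold)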

<a name="baselines"></a>Baseline

We provide three sample baseline code examples on Colab as starters.

"Dummy" Model Sample Code

Generates 'null' for all predictions. This will mark all questions as unanswerable, and the reliability scores will match the percentage of unanswerable questions in the evaluation set.
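
In code, this baseline amounts to abstaining on every question (file names here are assumptions):

import json

with open("valid_data.json") as f:          # file name is an assumption
    data = json.load(f)

# Abstain on every question by predicting 'null'.
predictions = {sample["id"]: "null" for sample in data["data"]}

with open("prediction.json", "w") as f:
    json.dump(predictions, f, indent=2)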

Local Model Sample Code (T5)

Generates predictions using T5.
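
For orientation, an inference-only sketch with Hugging Face Transformers might look like the following; the checkpoint name is a placeholder, and in practice you would fine-tune the model on the training pairs first:

from transformers import T5ForConditionalGeneration, T5Tokenizer

# The checkpoint is a placeholder; substitute your model fine-tuned on the training data.
model_name = "t5-base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

question = "How many patients were admitted in 2100?"   # illustrative question, not from the dataset
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))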

OpenAI Model Sample Code (ChatGPT)

Generates predictions using ChatGPT.
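
A rough sketch using the OpenAI Python client is shown below; the model name and prompt are assumptions, and abstention handling (emitting 'null') is left to your own prompt design:

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

question = "How many patients were admitted in 2100?"   # illustrative question, not from the dataset
response = client.chat.completions.create(
    model="gpt-3.5-turbo",                               # model choice is an assumption
    messages=[
        {"role": "system",
         "content": "Write a SQL query over the MIMIC-IV schema that answers the question, "
                    "or reply with 'null' if it cannot be answered."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)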

<a name="submission"></a>Submission

File Format

After saving your prediction file, compress (zip) it using a bash command, for example:

zip predictions.zip prediction.json

Submitting the File

Submit your prediction file on our task website on Codabench. For more details, see the Submission tab.

Extra (SQL Execution)

In this shared task, participants submit their generated SQL queries, so executing them is not required. However, if you are interested in checking the quality of your generated queries, please refer to this Colab notebook: https://colab.research.google.com/drive/18mhDaKGlHPgLhZXx9V6EcPIQEBy5vbGd?usp=sharing. The notebook includes the simplest baseline (abstain-all) and postprocessing for queries and answers (e.g., decimal point rounding).
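
For reference, executing a generated query locally can look like the sketch below; the database path is an assumption, and the rounding mirrors the decimal-point postprocessing mentioned above:

import sqlite3

def execute_sql(db_path, sql):
    # Returns the execution result, or None if the query fails to run.
    try:
        con = sqlite3.connect(db_path)
        rows = con.execute(sql).fetchall()
        con.close()
    except sqlite3.Error:
        return None
    # Round float values, mirroring the decimal-point postprocessing above.
    return [tuple(round(v, 3) if isinstance(v, float) else v for v in row) for row in rows]

print(execute_sql("mimic_iv.sqlite", "SELECT COUNT(*) FROM patients"))  # path is an assumption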

<a name="contact"></a>Contact

For more updates, join our Google group https://groups.google.com/g/ehrsql-2024/.

<a name="organizer"></a>Organizer

The organizers are from EdLab @ KAIST.