DeepIPW

1. Introduction

This repository contains the source code and data description for the paper "A deep learning framework for drug repurposing via emulating clinical trials on real world patient data" (accepted by Nature Machine Intelligence).

In this paper, we present an efficient and easily customized framework for generating and testing multiple candidates for drug repurposing using a retrospective analysis of real-world data (RWD). <img src="img/flowchart.png" width="60%"/>

Building upon well-established causal inference and deep learning methods, our framework emulates randomized clinical trials for drugs present in a large-scale medical claims database. <img src="img/LSTM.png" width="60%"/>

We demonstrate our framework on a coronary artery disease (CAD) cohort of millions of patients. We successfully identify drugs and drug combinations that significantly improve CAD outcomes but have not been indicated for treating CAD, paving the way for drug repurposing.

2. System requirements

OS: Ubuntu 16.04

GPU: NVIDIA 1080ti (11GB memory) is the minimum requirement. We recommend NVIDIA TITAN RTX 6000 GPUs.

3. Dependencies

Python 3.6
PyTorch 1.2.0
SciPy 1.3.1
NumPy 1.17.2
Scikit-learn 0.22.2

4. Preprocessing data

Dataset

The real-world patient data used in this paper is MarketScan claims data. Interested parties may contact IBM to request access to the data.

Data flow chart

The data flow chart of MarketScan claims data. <img src="img/MarketScan_DataFlow.png" width="70%"/>

Source: 2012 MarketScan® CCAE MDCR User Guide

Data files used

Input data demo

A demo of the input data can be found in the data folder, which provides the data structures and a synthetic example of the inputs. Before running the preprocessing code, make sure your input data format is the same as the provided input demo.

Cohort

The data structure for the cohort table is as follows:

| Column Name | Description | Note |
| --- | --- | --- |
| ENROLID | Patient enroll ID | Unique identifier for each patient |
| Index_date | The date of first CAD encounter | i.e., min(ADMDATE [1st CAD admission date for the inpatient records], SVCDATE [1st CAD service date for the outpatient records]) |
| DTSTART | Date of insurance enrollment start | M/D/Y, e.g., 03/25/2732 |
| DTEND | Date of insurance enrollment end | M/D/Y, e.g., 03/25/2732 |
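
The Index_date rule in the cohort table (the earlier of the first CAD inpatient admission and the first CAD outpatient service date) can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the function names and the list-of-strings inputs are hypothetical, and the dates are assumed to already be filtered to CAD encounters.

```python
from datetime import date, datetime
from typing import Iterable, Optional

DATE_FORMAT = "%m/%d/%Y"  # M/D/Y, as stated in the table notes


def parse_mdy(text: str) -> date:
    """Parse an M/D/Y string such as '03/25/2732' into a date."""
    return datetime.strptime(text, DATE_FORMAT).date()


def index_date(cad_admdates: Iterable[str],
               cad_svcdates: Iterable[str]) -> Optional[date]:
    """Return the earliest CAD encounter date for one patient.

    cad_admdates: ADMDATE values of the patient's CAD inpatient records.
    cad_svcdates: SVCDATE values of the patient's CAD outpatient records.
    Both argument names are hypothetical; returns None if there is no
    CAD encounter at all.
    """
    dates = [parse_mdy(d) for d in cad_admdates]
    dates += [parse_mdy(d) for d in cad_svcdates]
    return min(dates) if dates else None
```

For example, `index_date(["03/25/2732"], ["01/10/2732"])` selects the outpatient date because it is the earlier of the two.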

Drug table

The data structure for the drug table is as follows:

| Column Name | Description | Note |
| --- | --- | --- |
| ENROLID | Patient enroll ID | Unique identifier for each patient |
| NDCNUM | National drug code (NDC) | We map NDC to Observational Medical Outcomes Partnership (OMOP) ingredient concept ID, and obtain 1,353 unique drugs |
| SVCDATE | Date to take the prescription | M/D/Y, e.g., 03/25/2732 |
| DAYSUPP | Days supply. The number of days of drug therapy covered by this prescription | Day, e.g., 28 |
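
To make the relationship between SVCDATE and DAYSUPP concrete, the sketch below derives the span of days one prescription covers. It is only an illustration: `coverage_window` is a hypothetical helper, and counting SVCDATE itself as the first covered day is an assumption, not something the repository states.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def coverage_window(svcdate: str, daysupp: int) -> Tuple[date, date]:
    """Return (first_day, last_day) covered by one prescription.

    svcdate: SVCDATE in M/D/Y format, e.g. '03/25/2732'.
    daysupp: DAYSUPP, the number of days of therapy supplied.
    Assumption: the prescription date counts as the first covered day,
    so the last covered day is SVCDATE + DAYSUPP - 1.
    """
    if daysupp < 1:
        raise ValueError("DAYSUPP must be at least 1")
    first_day = datetime.strptime(svcdate, "%m/%d/%Y").date()
    last_day = first_day + timedelta(days=daysupp - 1)
    return first_day, last_day
```

Under this assumption, a 28-day supply with SVCDATE 03/25/2732 covers 03/25/2732 through 04/21/2732.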

Inpatient table

The data structure for the inpatient table is as follows:

| Column Name | Description | Note |
| --- | --- | --- |
| ENROLID | Patient enroll ID | Unique identifier for each patient |
| DX1-DX15 | Diagnosis codes. International Classification of Diseases (ICD) codes | 57,089 ICD-9/10 codes considered in the dataset. Dictionary for ICD-9 and ICD-10 codes. |
| DXVER | Flag to denote ICD-9/10 codes | “9” = ICD-9-CM and “0” = ICD-10-CM |
| ADMDATE | Admission date for this inpatient visit | M/D/Y, e.g., 03/25/2732 |
| Days | Number of days of the inpatient hospital stay | Day, e.g., 28 |

Outpatient table

The data structure for the outpatient table is as follows:

| Column Name | Description | Note |
| --- | --- | --- |
| ENROLID | Patient enroll ID | Unique identifier for each patient |
| DX1-DX4 | Diagnosis codes. International Classification of Diseases (ICD) codes | 57,089 ICD-9/10 codes considered in the dataset. Dictionary for ICD-9 and ICD-10 codes. |
| DXVER | Flag to denote ICD-9/10 codes | “9” = ICD-9-CM and “0” = ICD-10-CM |
| SVCDATE | Service date for this outpatient visit | M/D/Y, e.g., 03/25/2732 |
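
The inpatient and outpatient tables share the same diagnosis layout: numbered DX columns plus a DXVER flag. The sketch below shows one way to gather the diagnosis codes of a single record together with their ICD version. It is a hedged illustration only: representing a record as a Python dict, and the helper name `diagnosis_codes`, are assumptions, and the code value in the usage note is just an example.

```python
from typing import Dict, List, Tuple

# DXVER flag values as documented in the tables above.
ICD_VERSION = {"9": "ICD-9-CM", "0": "ICD-10-CM"}


def diagnosis_codes(row: Dict[str, object], n_dx: int) -> List[Tuple[str, str]]:
    """Collect (icd_version, code) pairs from columns DX1..DX<n_dx>.

    row:  one record as a dict keyed by column name (an assumed layout).
    n_dx: 15 for inpatient records, 4 for outpatient records.
    Empty or missing DX columns are skipped.
    """
    version = ICD_VERSION[str(row["DXVER"])]
    codes = []
    for i in range(1, n_dx + 1):
        code = str(row.get(f"DX{i}") or "").strip()
        if code:
            codes.append((version, code))
    return codes
```

For an outpatient record such as `{"DXVER": "9", "DX1": "41401", "DX2": ""}`, calling `diagnosis_codes(record, n_dx=4)` yields a single ICD-9-CM entry.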

Demographics

The data structure for the demographics table is as follows:

| Column Name | Description | Note |
| --- | --- | --- |
| ENROLID | Patient enroll ID | Unique identifier for each patient |
| DOBYR | Birth year | Year, e.g., 2099 |
| SEX | Gender | 1 = male; 2 = female |

Preprocess drug tables

cd preprocess
python pre_drug.py --input_data_dir ../data/synthetic/drug --output_data_dir 'pickles/cad_prescription_taken_by_patient.pkl'

Preprocess patient cohort

# Note: the parameter values below are only a demo. They can be easily adjusted for different application scenarios.
cd preprocess
python run_preprocess.py --min_patients 10 --min_prescription 2 --followup 60 --time_interval 240 --baseline 10 --input_data ../data/synthetic --save_cohort_all save_cohort_all/

Parameters

5. DeepIPW model

Bash command

bash run_lstm.sh

Python command

cd deep-ipw
python main.py

Parameters