Important Notice Regarding the Use of DrugGPT

DrugGPT is an advanced tool developed for educational and research purposes. It is crucial to understand the following points regarding its usage:

By using DrugGPT, you acknowledge and agree to these terms and conditions. It is imperative to consult with a qualified healthcare provider for any health-related questions or concerns.

Full Model

To access the full model, visit the DrugGPT Demo.

Instructions on how to use the demo for drug analysis and inquiry

  1. There are 4 modes accessible for downstream tasks:
    1. General: This mode is intended for general drug inquiry. The user is prompted to input the symptom, disease (if diagnosed), and medication info (if prescribed). The model will generate information about the drug, including its name, usage, side effects, etc. This mode is recommended for general conversation about drugs and diseases.
    2. Multiple Choice: This mode is intended for drug-related multiple-choice questions. The user is prompted to input the question and the options. The model will generate the answer to the question. This mode is not recommended for continuous conversation but for accurate, evidence-based multiple-choice Q&A.
    3. Yes/No: This mode is intended for drug-related yes/no questions. The user is prompted to input the question. The model will generate the answer to the question. This mode is not recommended for continuous conversation but for accurate, evidence-based binary Q&A.
    4. Text Q&A: This mode is intended for drug-related text Q&A. The user is prompted to input the question. The model will generate the answer to the question. This mode is not recommended for continuous conversation but for accurate, evidence-based text Q&A.
  2. After selecting the desired mode and entering the information, click the 'Submit' button at the bottom of the form to initiate the conversation.
  3. DrugGPT should never be used as a medical consultant at the current stage. Please consult licensed medical professionals for any medical advice.
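
The inputs expected by each mode can be summarized in a small lookup. This is only an illustrative sketch: the field names (`symptom`, `disease`, `medication`, `question`, `options`) and the helper `missing_fields` are hypothetical and are not taken from the demo's code, which may label its inputs differently.

```python
from enum import Enum


class Mode(Enum):
    GENERAL = "General"
    MULTIPLE_CHOICE = "Multiple Choice"
    YES_NO = "Yes/No"
    TEXT_QA = "Text Q&A"


# Hypothetical field names; the demo form may use different labels.
REQUIRED_FIELDS = {
    Mode.GENERAL: ["symptom"],
    Mode.MULTIPLE_CHOICE: ["question", "options"],
    Mode.YES_NO: ["question"],
    Mode.TEXT_QA: ["question"],
}

# In General mode, disease and medication info are only given when available.
OPTIONAL_FIELDS = {Mode.GENERAL: ["disease", "medication"]}


def missing_fields(mode: Mode, form: dict) -> list:
    """Return the required fields that are absent or empty in the form."""
    return [name for name in REQUIRED_FIELDS[mode] if not form.get(name)]
```

For example, `missing_fields(Mode.MULTIPLE_CHOICE, {"question": "..."})` reports that `options` still needs to be filled in before submitting.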

Demos on downstream tasks

The demo videos showing DrugGPT performing downstream tasks are available at:

  1. Multiple Choice Q&A
  2. Drug and Dosage Recommendation
  3. Adverse Reaction
  4. Drug-drug Interaction
  5. Pharmacology Q&A
  6. Generalization Study

Clone the repo

git clone https://github.com/AI-in-Health/DrugGPT.git

# clone the following repo to calculate automatic metrics
cd DrugGPT
git clone https://github.com/ruotianluo/coco-caption.git 

Codebase structure

DrugGPT/ # the root of the repo
    ├── README.md
    ├── _init_.ipynb # scripts for logging, loading, etc.
    ├── configs
    │   ├── finetune.yaml      # config file for fine-tuning
    │   ├── methods.yaml       # config file for methods
    │   ├── model.yaml         # config file for llama and soft prompt models
    │   └── train.yaml         # config file for training
    ├── data
    │   └── source.md          # links to the source datasets and preprocessed datasets
    ├── notebooks              # Folder for notebooks
    │   └── evaluation.ipynb   # Notebook for evaluation of benchmark models
    ├── data
    │   ├── data_loader.py     # scripts for loading data
    ├── ensemble
    │   ├── ensemble_model.py  # the ensemble model structure
    ├── evaluation
    │   ├── evaluation_metrics.py # script for evaluation
    ├── gcn
    │   ├── dsdg.py # contains code for generating dsdg graph
    │   ├── gcn_model.py # gcn model used to obtain the graph embedding of dsdg
    ├── llama
    │   ├── llama_utils.py # the llama model and the soft prompt
    ├── prompt
    │   ├── prompt_manager.py # manages hard prompts
    ├── prompt_tuning
    │   ├── soft_prompt_tuning.py # fine-tuning soft prompt
    ├── utils
    │   ├── basic.py # basic container
    │   ├── checkpointer.py # checkpointer
    │   ├── train.py # fine-tuning
    │   ├── language_model.py # language model
    │   ├── optim.py # optimizer
    │   ├── parser.py # parser for different types of outputs
    │   └── scheduler.py # scheduler
    └── drugGPT_eval.py # script for evaluating DrugGPT


conda create -n pi python==3.9
conda activate pi
pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==4.34.0
pip install langchain==0.0.314
pip install pytorch-lightning==1.5.1
pip install pandas rouge scipy
pip install networkx==2.5.1
pip install torch_geometric==1.7.2
pip install nltk
pip install tqdm
pip install openai==0.28.1
pip install tiktoken==0.5.1
pip install huggingface-hub==0.17.3 
pip install safetensors==0.4.0 
pip install sentence-transformers==2.2.2 
pip install sentencepiece==0.1.99 
pip install tokenizers==0.14.1
pip install accelerate==0.23.0
pip install einops==0.7.0
# re is part of the Python standard library and needs no pip install

# if you want to re-produce our data preparation process
pip install scikit-learn plotly

Higher versions of torch and CUDA can also work.
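
After installation, the pinned versions can be checked from Python. The sketch below uses only the standard library and covers a subset of the pins listed above; the helper name `check_pins` is ours and is not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins from the install commands; extend as needed.
PINNED = {
    "transformers": "4.34.0",
    "langchain": "0.0.314",
    "openai": "0.28.1",
    "tiktoken": "0.5.1",
    "networkx": "2.5.1",
}


def check_pins(pins: dict) -> dict:
    """Return {package: (expected, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty report means every listed package matches its pin. Since higher versions of torch and CUDA can also work, torch is deliberately left out of the strict comparison.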

Download the data

The source data can be accessed at:

  1. MedQA-USMLE: GitHub | PapersWithCode
  2. MedMCQA: MedMCQA Homepage
  3. MMLU-Medicine: Hugging Face Datasets
  4. ChatDoctor: GitHub
  5. ADE-Corpus-v2: Hugging Face Datasets
  6. Drug-Effects: Kaggle
  7. DDI-Corpus: Hugging Face Datasets
  8. PubMedQA: PubMedQA Homepage

The preprocessed data is too large for GitHub; you can download it from Google Drive (please send us an email for access).

Data Preprocessing Process

The data preprocessing for DrugGPT involves several crucial steps to ensure the quality and relevance of the data used for training and fine-tuning the model. Below is an overview of the process:

  1. Data Cleaning:
    • Remove duplicate and contaminated data to ensure the uniqueness and purity of the dataset.
  2. Relevance Filtering:
    • For datasets not entirely drug-related, irrelevant data is filtered out to maintain focus on drug-related content.
  3. Data Organization:
    • Organize the data into columns for queries, answers, and explanations (if available). The explanation column is particularly useful for hallucination assessment during model training.
  4. Expert Review:
    • Conduct a manual inspection with medical experts to verify that the data quality aligns with drug analysis processes in medical settings.
  5. Evaluation Data Storage:
    • Store the preprocessed files in CSV format, tagged with either _data or _answer to indicate their content type.
  6. Finetuning Sample Collection:
    • Collect 1000 data samples curated from various datasets, including PubmedQA, MedMCQA, ADE-Corpus-V2, DDI-corpus, and Drug-Effects. These datasets cover the five downstream tasks of DrugGPT.
  7. Preparation for Knowledge-based Instruction Prompt Tuning:
    • Store the 1000 data samples specifically prepared for Knowledge-based Instruction Prompt Tuning, a novel process based on PEFT (refer to the PEFT paper) and modified to incorporate our DSDG graph in the KA-LLM inference.
  8. Finetuning Data Storage:
    • Randomly sample the data into three distinct datasets, FT1.csv, FT2.csv, and FT3.csv, to provide diverse training scenarios.
  9. Data Storage Location:
    • All prepared datasets are stored in the data folder within the project structure.

This meticulous preprocessing ensures that DrugGPT is trained on high-quality, relevant data, laying a strong foundation for accurate and reliable drug-related predictions and analysis.
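
The cleaning, sampling, and splitting steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' preprocessing script: the column names `query` and `answer`, the case-insensitive duplicate key, and the reading of FT1 to FT3 as disjoint partitions of the sampled rows are all our assumptions.

```python
import random


def deduplicate(rows):
    """Drop rows whose (query, answer) pair has already been seen.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    seen, unique = set(), []
    for row in rows:
        key = (row["query"].strip().lower(), row["answer"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def sample_per_dataset(rows_by_dataset, n_per_dataset, seed=0):
    """Draw n_per_dataset deduplicated rows from each dataset.

    Raises ValueError if a dataset has fewer unique rows than requested.
    """
    rng = random.Random(seed)
    sampled = []
    for name, rows in sorted(rows_by_dataset.items()):
        for row in rng.sample(deduplicate(rows), n_per_dataset):
            sampled.append({**row, "dataset": name})
    return sampled


def split_into_parts(rows, n_parts=3, seed=0):
    """Shuffle the rows and deal them round-robin into n_parts lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]
```

With five datasets and `n_per_dataset=200`, `sample_per_dataset` yields the 1000 samples described in step 6, and `split_into_parts` then produces three lists that could be written out as FT1, FT2, and FT3.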


Training applies only to fine-tuning the KA-LLM (Knowledge Acquisition) model, a component of the ensembled DrugGPT that specializes in locating specific knowledge in the DSDG (Drug and Symptom Disease Graph). The training is intended to align the features in the DSDG with the natural language input that KA-LLM takes for downstream tasks.

Fine-Tuning Hyperparameters

Below is a table of the hyperparameters used for fine-tuning the Knowledge Acquisition Language Model (KA-LLM):

Hyperparameter            Value            Description
Model                     LLaMA-7B         The base language model used for fine-tuning.
Soft Prompt Length        100              Length of the soft prompt used in tuning.
Epochs                    20               Number of training epochs.
Learning Rate             1e-3             Learning rate for the optimizer.
Optimizer                 AdamW            The optimization algorithm used.
Weight Decay              0.01             Weight decay parameter for the optimizer.
Frozen LLM Parameters     True             Indicates if the LLM parameters are kept frozen.
Number of Data Samples    1000             Total number of data samples used for tuning.
Data Sample Distribution  200 per dataset  Each dataset contributes 200 samples.
Datasets Used             Various          Includes PubmedQA, MedMCQA, ADE-Corpus-V2, DDI-corpus, and Drug-Effects.
τ (Tau)                   0.1              Hyperparameter for DSDG edge weight calculations.
K                         5                Hyperparameter for DSDG edge weight calculations.

These settings were selected to optimize the performance of DrugGPT on various downstream tasks while considering computational efficiency.
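
For scripting, the values in the table can be collected in one immutable object. The sketch below is illustrative only: the class name `FineTuneConfig` and its field names are hypothetical, and the keys actually used in configs/finetune.yaml may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters from the table above (field names are our own)."""

    model: str = "LLaMA-7B"
    soft_prompt_length: int = 100
    epochs: int = 20
    learning_rate: float = 1e-3
    optimizer: str = "AdamW"
    weight_decay: float = 0.01
    freeze_llm: bool = True
    samples_per_dataset: int = 200
    datasets: tuple = (
        "PubmedQA",
        "MedMCQA",
        "ADE-Corpus-V2",
        "DDI-corpus",
        "Drug-Effects",
    )
    tau: float = 0.1  # DSDG edge weight hyperparameter
    k: int = 5        # DSDG edge weight hyperparameter

    @property
    def num_samples(self) -> int:
        """Total samples: the per-dataset count times the number of datasets."""
        return self.samples_per_dataset * len(self.datasets)
```

Deriving `num_samples` from the per-dataset count keeps the total (1000) consistent with the distribution row of the table, and `asdict(FineTuneConfig())` gives a plain dictionary for logging.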


Here are some key arguments to run train.py:


Example usage: To access the train.py script, run the following commands:

cd src
cd utils

To run the training process, use the following command:

python3 train.py --dataset FT1 --train_file path/to/FT1_train.xml --val_file path/to/FT1_val.xml --config configs/model.yaml --output_root output/FT1_training
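
The flags in this command can be mirrored in a small argparse parser. This is a hypothetical sketch reconstructed only from the example command above; the real train.py may define more arguments, and which flags are required and what their defaults are is our assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the example training command."""
    parser = argparse.ArgumentParser(
        description="Fine-tune KA-LLM (illustrative CLI sketch)"
    )
    parser.add_argument("--dataset", required=True,
                        help="name of the fine-tuning set, e.g. FT1")
    parser.add_argument("--train_file", required=True,
                        help="path to the training file")
    parser.add_argument("--val_file", required=True,
                        help="path to the validation file")
    # The default below is an assumption taken from the example command.
    parser.add_argument("--config", default="configs/model.yaml",
                        help="path to the model config file")
    parser.add_argument("--output_root", required=True,
                        help="directory for checkpoints and logs")
    return parser
```

Calling `build_parser().parse_args([...])` with the arguments from the command above returns a namespace with `dataset`, `train_file`, `val_file`, `config`, and `output_root` attributes.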

Model Parameters

The model parameters are available at Google Drive.



  1. Hugging Face API key for running inference with DrugGPT, which is built upon the LLaMA architecture. Please refer to the Hugging Face API for more details.
  2. OpenAI key if you plan to use the latest GPT models for conversational generation. Please refer to the OpenAI API. For one-shot generation, we recommend setting use_open_ai to false, as OpenAI is not a necessary component of DrugGPT.
  3. The LLaMA implementation can be accessed in the LLaMA GitHub repo; however, it might be computationally expensive to run inference locally. If you decide to use the LLaMA inference API instead of a local model, here is the link to request access, in addition to the Hugging Face API key.


Here are some key arguments to run drugGPT_eval.py:


To evaluate the model, use the following command:

python3 drugGPT_eval.py \
  --openai_key YOUR_OPENAI_API_KEY \
  --excel_path path/to/DSDG_excel.xlsx \
  --dataset_name pubmedqa \
  --evaluation_set_path path/to/pubmedqa_evaluation_data.csv \
  --log_results
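
For a yes/no style benchmark such as PubMedQA, a simple way to score logged results is exact-match accuracy. The sketch below is an assumption about how such scoring could look, not the metric implemented in drugGPT_eval.py or evaluation_metrics.py; the helpers `normalize` and `accuracy` are ours.

```python
def normalize(answer: str) -> str:
    """Lowercase the answer and strip whitespace and a trailing period."""
    return answer.strip().lower().rstrip(".")


def accuracy(predictions, references) -> float:
    """Fraction of predictions that exactly match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)
```

Normalizing before comparison avoids counting "Yes." and "yes" as different answers; free-text tasks would need a different metric.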

Baseline models

To evaluate other models, use the template provided in notebooks/evaluation.ipynb.

Bugs or Questions?

If you encounter any problems when using the code, or want to report a bug, you can open an issue or email {hongjian.zhou@cs.ox.ac.uk, fenglin.liu@eng.ox.ac.uk}. Please try to specify the problem with details so we can help you better and quicker!