<p align="center"> <img width="500px" alt="AlpaCare" src="plots/logo.png"> </p> <!-- <p align="center"><a href="https://arxiv.org/pdf/2310.14558v1.pdf">[📄 Paper]</a> </p> -->

AlpaCare: Instruction-tuned Large Language Models for Medical Applications

<hr>

This is the repo for AlpaCare, a family of LLMs tuned on medical instructions. The repo contains:

Overview

AlpaCare comprises four models (7B and 13B, each built on LLaMA [1] and LLaMA-2 [2]) tuned on MedInstruct-52k, a 52k medical instruction-following dataset, following Alpaca [3] and Self-Instruct [4]. You can find our model weights at:

| Version | Link |
| --- | --- |
| AlpaCare-LLaMA_7B | https://huggingface.co/xz97/AlpaCare-llama1-7b |
| AlpaCare-LLaMA2_7B | https://huggingface.co/xz97/AlpaCare-llama2-7b |
| AlpaCare-LLaMA_13B | https://huggingface.co/xz97/AlpaCare-llama-13b |
| AlpaCare-LLaMA2_13B | https://huggingface.co/xz97/AlpaCare-llama2-13b |
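
As a rough sketch, one of the checkpoints listed above could be loaded with the Hugging Face `transformers` Auto classes. This assumes the weights are published in the standard `transformers` format; the short keys in `ALPACARE_CHECKPOINTS` and the helper name `load_alpacare` are hypothetical and not part of this repo.

```python
# Hub ids copied from the model list above; the short keys are our own labels.
ALPACARE_CHECKPOINTS = {
    "llama-7b": "xz97/AlpaCare-llama1-7b",
    "llama2-7b": "xz97/AlpaCare-llama2-7b",
    "llama-13b": "xz97/AlpaCare-llama-13b",
    "llama2-13b": "xz97/AlpaCare-llama2-13b",
}


def load_alpacare(name: str = "llama2-7b"):
    """Return (tokenizer, model) for one checkpoint, downloading it on first use.

    Assumption: the checkpoint loads through the generic Auto classes.
    """
    # Imported lazily so the mapping above is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = ALPACARE_CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `tokenizer, model = load_alpacare("llama2-7b")` triggers a download of several gigabytes, so it is best run on a machine with enough disk space and memory for the chosen model size.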

To set up a conda environment for data generation/model training, please do:

pip install -r requirements.txt

[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1

[2]: Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. https://arxiv.org/abs/2307.09288

[3]: Stanford Alpaca: An Instruction-following LLaMA Model. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. https://crfm.stanford.edu/2023/03/13/alpaca.html

[4]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560

Data Release

MedInstruct contains:

Data Generation Process

We follow the data generation pipeline of Self-Instruct and Alpaca, with several modifications:

Instructions:

Responses:

Here's an illustration of an instruction-response pair:

{
instruction: "Rewrite this instruction on how to use an inhaler to be more patient-friendly and less technical.",
input: "Exhale deeply, engage mouthpiece of inhaler, depress canister while simultaneously inhaling long and deep, hold the breath for ten seconds, then gently exhale",
output: "Here's a more patient-friendly version of the instruction on how to use an inhaler:
1. Take a deep breath and completely breathe out.
2. Put the mouthpiece of the inhaler into your mouth and seal your lips around it properly.
3. Press down the canister while breathing in slowly and deeply at the same time.
4. Hold your breath for ten seconds.
5. Finally, breathe out gently."
}
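
To make the record structure concrete, here is a minimal sketch of reading such pairs in Python. It assumes the data is stored as a JSON list of objects with the three keys shown above (the layout Alpaca uses); the class name `InstructionRecord` and the function `parse_records` are hypothetical helpers, not part of this repo.

```python
import json
from dataclasses import dataclass
from typing import List

REQUIRED_FIELDS = {"instruction", "input", "output"}


@dataclass
class InstructionRecord:
    instruction: str  # the task description
    input: str        # optional context for the task (may be empty)
    output: str       # the reference response


def parse_records(json_text: str) -> List[InstructionRecord]:
    """Parse a JSON list of records and check that each has all three fields."""
    records = []
    for index, item in enumerate(json.loads(json_text)):
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            raise ValueError(f"record {index} is missing fields: {sorted(missing)}")
        records.append(
            InstructionRecord(item["instruction"], item["input"], item["output"])
        )
    return records
```

Validating the fields up front keeps malformed records from surfacing later as confusing errors during training.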

To generate the data:

Run the task generation script:

sh task_output_generation/task_generation.sh

and output generation script:

sh task_output_generation/output_generation.sh

Our analysis of the instruction data demonstrates the diversity of MedInstruct-52K in terms of:

(a) Instruction Language: The inner circle displays the 20 most frequent root verbs, while the outer circle shows the top 4 noun objects associated with each verb in the generated instructions. Despite this wide range, the plot covers only 22% of the instructions, because the rest do not follow the verb-noun format.

<p align="center"> <img src="plots/datastats.png" width="500" alt="Instruction Language Analysis" /> </p>
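
A minimal sketch of how the verb-noun statistics in (a) could be tallied. It assumes the (root verb, noun object) pairs have already been extracted with a syntactic parser, which is not shown. The function name `verb_noun_summary` and its parameters are hypothetical, and "coverage" here means the share of all instructions that fall into the displayed verb-noun groups, which is one possible reading of the 22% figure.

```python
from collections import Counter, defaultdict


def verb_noun_summary(pairs, total_instructions, top_verbs=20, top_nouns=4):
    """Summarize (root_verb, noun_object) pairs.

    pairs: one tuple per instruction that matched the verb-noun format.
    total_instructions: size of the full dataset, including unmatched ones.
    Returns ({verb: [(noun, count), ...]}, coverage_fraction).
    """
    if total_instructions <= 0:
        raise ValueError("total_instructions must be positive")

    verb_counts = Counter(verb for verb, _ in pairs)
    nouns_by_verb = defaultdict(Counter)
    for verb, noun in pairs:
        nouns_by_verb[verb][noun] += 1

    summary = {}
    covered = 0
    for verb, _ in verb_counts.most_common(top_verbs):
        top = nouns_by_verb[verb].most_common(top_nouns)
        summary[verb] = top
        covered += sum(count for _, count in top)
    return summary, covered / total_instructions
```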

(b) View: The 20 most frequent views, drawn from various medical personnel, account for 55% of MedInstruct-52K.

<p align="center"> <img src="plots/viewstats.png" width="500" alt="View Analysis" /> </p>

(c) Task Types: The top 20 task types covered in MedInstruct-52K. Existing medical instruction-tuned models focus only on question-answering and doctor-patient conversation tasks.

<p align="center"> <img src="plots/typestats.png" width="500" alt="Task Types Analysis" /> </p>

Fine-tuning AlpaCare

We follow the Alpaca prompt format to fine-tune the LLaMA series models, using standard Hugging Face training code.
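
For orientation, the sketch below formats a record with the prompt templates published in the Stanford Alpaca repository. Whether the training scripts in this repo use these templates verbatim is an assumption, and `build_prompt` is a hypothetical helper name.

```python
# Templates as published by Stanford Alpaca (assumed, not confirmed, to match
# what this repo's training scripts use).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Pick the template depending on whether the record carries an input."""
    if input_text.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)
```

During fine-tuning the model is trained to continue the text after `### Response:` with the record's output.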

For the instruction-finetuning of LLaMA/LLaMA-2 7B:

sh training/train_7b.sh

For the instruction-finetuning of LLaMA/LLaMA-2 13B:

sh training/train_13b.sh

Experiments

We compare AlpaCare with several instruction-tuned LLMs based on the LLaMA models, across different scales and with various tuning datasets. Free-form instruction evaluations are conducted on iCliniq, a patient-doctor conversation set, and on MedInstruct-test, a medical instruction test set crafted by our clinicians. To further evaluate generalization ability, we use a general-domain test set, AlpacaFarm.

AlpaCare shows strong medical capacity and generalization ability compared to baselines at both the 7B and 13B scales. Following AlpacaFarm, we use gpt-3.5-turbo as the judge for the comparison. We compare each instruction-tuned model against 4 distinct reference models: text-davinci-003, gpt-3.5-turbo, gpt-4, and claude-2.

<p align="center"> <img src="plots/7b-model-results-1.png" width="500" alt="7B Model Results" /> </p> <p align="center"> <img src="plots/13b-model-results.png" width="500" alt="13B Model Results" /> </p>
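
The pairwise comparisons described above can be summarized as a win rate against each reference model. The sketch below assumes the judge's verdict for each test instruction has already been reduced to `"win"`, `"tie"`, or `"loss"` from the tuned model's point of view. Counting a tie as half a win is an assumption, since the scoring of ties is not stated here, and `win_rate` is a hypothetical helper.

```python
from collections import Counter

VALID_VERDICTS = {"win", "tie", "loss"}


def win_rate(judgments):
    """Fraction of comparisons won by the model, with ties counted as half a win.

    judgments: iterable of "win", "tie", or "loss", one per test instruction.
    """
    counts = Counter(judgments)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments given")
    return (counts["win"] + 0.5 * counts["tie"]) / total
```

With four reference models, this would be computed once per reference, giving four win rates for each tuned model and test set.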

We provide the outputs of all reference models and instruction-tuned models.

Citation:

If you find this repo useful, please cite the paper:

@misc{zhang2023alpacareinstructiontuned,
      title={AlpaCare: Instruction-tuned Large Language Models for Medical Application},
      author={Xinlu Zhang and Chenxin Tian and Xianjun Yang and Lichang Chen and Zekun Li and Linda Ruth Petzold},
      year={2023},
      eprint={2310.14558},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}