
BioInstruct

🔬 Exciting breakthrough in BioNLP! 🧬

We're thrilled to introduce BioInstruct, a dataset of 25,000+ tailored instructions for enhancing LLMs such as Llama on biomedical tasks. Our research shows remarkable gains in question answering (QA), information extraction (IE), and text generation.

🌟 Highlights:

17.3% boost in QA accuracy
5.7% increase in IE F1 score
96% improvement in text generation tasks

Combining instruction tuning with multi-task learning, we also find that the performance gain is significantly higher when the LLM is instruction fine-tuned on closely related tasks.

For more details, please check out our paper.

Dataset

The BioInstruct dataset is available on Hugging Face Datasets.
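As a rough sketch of how the data might be used once downloaded, the snippet below loads a split with the Hugging Face `datasets` library and turns one record into a prompt string. Several things here are assumptions and not taken from this README: the repository ID is a placeholder to be replaced with the one shown on the dataset page, the split name `train` is a guess, and the record keys `instruction` and `input` follow the common Alpaca-style layout, which may not match the actual schema. The prompt template is likewise illustrative, not the one used in the paper.

```python
from typing import Dict

# Hypothetical placeholder: replace with the repository ID shown on the
# dataset's Hugging Face page. It is not taken from this README.
DATASET_ID = "<org>/<dataset-name>"


def load_records(dataset_id: str = DATASET_ID, split: str = "train"):
    """Load one split with the Hugging Face `datasets` library.

    The split name "train" is an assumption; check the dataset page.
    """
    # Imported lazily so build_prompt() below works without the library.
    from datasets import load_dataset

    return load_dataset(dataset_id, split=split)


def build_prompt(record: Dict[str, str]) -> str:
    """Turn one record into a single prompt string.

    Assumes Alpaca-style keys "instruction" and "input"; adjust the key
    names if the actual schema differs.
    """
    instruction = record["instruction"].strip()
    # Treat a missing or null "input" field the same as an empty one.
    context = (record.get("input") or "").strip()
    if context:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```

With the placeholder filled in, `build_prompt(load_records()[0])` would format the first record. Keeping the loader separate from the formatter means the template can be changed or tested without network access.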

Citation Information

@article{Tran2024Bioinstruct,
    author = {Tran, Hieu and Yang, Zhichao and Yao, Zonghai and Yu, Hong},
    title = "{BioInstruct: instruction tuning of large language models for biomedical natural language processing}",
    journal = {Journal of the American Medical Informatics Association},
    pages = {ocae122},
    year = {2024},
    month = {06},
    issn = {1527-974X},
    doi = {10.1093/jamia/ocae122},
    url = {https://doi.org/10.1093/jamia/ocae122},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocae122/58084577/ocae122.pdf},
}

Contribute

Have a specific task and instruction you'd like an LLM to perform in a clinical setting? Raise a new issue here! Your contributions will aid in refining LLMs to be more effective and relevant in healthcare environments.