<h1 align="center"> The Impact of Large Language Models on Scientific Discovery:

A Preliminary Study using GPT-4

</h1> <div align="center"> <h3>Microsoft Research AI4Science, Microsoft Azure Quantum</h3>

</div>

Introduction

<p align="center"> <img src="overview.png" width="1000"> </p>

Large language models (LLMs) have shown remarkable capabilities across a wide range of domains. In this report, we examine the performance of LLMs in the context of scientific discovery, focusing on GPT-4. Our investigation spans a diverse range of scientific areas: drug discovery, biology, computational chemistry, materials design, and partial differential equations.

Our exploration methodology primarily consists of expert-driven case assessments, which offer qualitative insights into the model's comprehension of intricate scientific concepts and relationships, and occasionally benchmark testing, which quantitatively evaluates the model's capacity to solve well-defined domain-specific problems. Our preliminary exploration indicates that GPT-4 exhibits promising potential for a variety of scientific applications, demonstrating its aptitude for handling complex problem-solving and knowledge integration tasks. Broadly speaking, we evaluate GPT-4's knowledge base, scientific understanding, scientific numerical calculation abilities, and various scientific prediction capabilities.

❤️ Contribution Invitation (call for participation)

We have conducted some preliminary experiments to explore the prospects of applying LLMs to scientific discovery (case analyses can be found in our report). However, we recognize that this is still far from enough, and the potential of LLMs in this field remains largely untapped.

Therefore, we hope to advance the development of LLMs for scientific discovery through the joint efforts of our community. We have created this GitHub repository to collect interesting findings and feedback from community members on the current capabilities of LLMs, gathered during their own explorations.

We invite all researchers interested in advancing the capabilities of large language models for scientific discovery to take part. Whether or not you have relevant expertise, you are welcome to join us if you think LLMs may have value in scientific discovery and want to see this area improve. Every finding and suggestion you share is a useful reference for LLM applications in this field. Let us work together to explore new frontiers of LLM applications in scientific discovery!

Some goals of this ongoing project include:

❗ Important: how you can contribute

To contribute a discovery, please see the Discussions channel for a template and details on how to structure your contribution. A specific case is also provided as an example of the template.

We look forward to your participation and sharing!

Observation and Summary

We summarize the observations from our report to give you an initial understanding of our evaluation. 👍 marks a strength, and 😵 marks an ability that needs improvement.

Drug Discovery

Biology

Computational Chemistry

Materials Design

Partial Differential Equations

Related Survey/Evaluation

License

This repo is MIT-licensed.

Contact

If you have any questions or suggestions, we look forward to hearing from you through the discussion channel or by email at llm4sciencediscovery@microsoft.com.

Contact people: Tao Qin (taoqin@microsoft.com), Lijun Wu (lijuwu@microsoft.com).

@misc{ai4science2023impact,
      title={The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4}, 
      author={Microsoft Research AI4Science and Microsoft Azure Quantum},
      year={2023},
      eprint={2311.07361},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}