LLM-NERRE - Structured Data Extraction

For the publication "Structured information extraction from scientific text with large language models" in Nature Communications by John Dagdelen*, Alexander Dunn*, Sanghoon Lee, Nicholas Walker, Andrew S. Rosen, Gerbrand Ceder, Kristin A. Persson, and Anubhav Jain.

* = Equal contribution

This repository contains code for extracting structured relational data as JSON documents from complex scientific text, with particular application to materials science. For the Llama-2 fine-tuned models and code, see the supplementary nerre-llama repo.
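To make "structured relational data as JSON documents" concrete, the sketch below parses a model completion into Python records. It is only an illustration: the completion string, the `host` and `dopants` keys, and the `parse_records` helper are hypothetical and are not taken from this repository's schemas, which are described in the subdirectory readme.md files.

```python
import json

# Hypothetical completion text. The keys "host" and "dopants" are an
# illustrative assumption, not the repository's actual output schema.
completion = '[{"host": "ZnO", "dopants": ["Al", "Ga"]}]'


def parse_records(text):
    """Parse completion text into a list of records.

    Returns an empty list if the text is not valid JSON, and wraps a
    single JSON value in a list so callers can always iterate.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    return data if isinstance(data, list) else [data]


records = parse_records(completion)

# Flatten each record into (host, dopant) relation pairs.
pairs = [(r["host"], d) for r in records for d in r["dopants"]]
for host, dopant in pairs:
    print(f"{host} is doped with {dopant}")
```

Returning an empty list on malformed JSON is one possible design choice; a stricter pipeline might instead log or re-raise so that unparsable completions are counted during evaluation.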

Contents

General/MOF JSON models (general_and_mofs subdirectory):

Doping models (doping subdirectory):

Software requirements

The required Python packages, with their required versions, are listed in the requirements.txt file in each subdirectory.

Installing and running the evaluation code

Task-specific instructions are given in the readme.md file in each subdirectory. The scripts can be run either from the command line (python <script_name.py> [options]) or from a Jupyter notebook. The scripts themselves do not need to be installed, but they do require the packages listed in the corresponding requirements.txt file.
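Because the scripts depend on the packages in requirements.txt, it can help to check the current environment before running them. The following is a minimal sketch, assuming each requirement line is an exact pin of the form `name==version`; the helper names `parse_pins` and `find_mismatches` are hypothetical and are not part of this repository. Lines in any other form (ranges, URLs, extras) are skipped or may not resolve.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Return {package: pinned_version} for lines of the form name==version."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop inline comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # blank line, or not an exact pin (assumption: skip it)
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed_or_None)} for every unmet pin."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems
```

To use it, read the requirements.txt file of the subdirectory of interest (for example with `open(path).read()`), pass the text to `parse_pins`, and pass the result to `find_mismatches`. An empty dictionary means every exact pin is satisfied.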

A demo and its expected output are given in the readme.md file in each subdirectory. Expected runtimes range from several seconds to several minutes.