MatSci-LumEn: Materials Science Large Language Models Evaluation for text and data mining

This repository contains the code, data, and results described in the paper "Mining experimental data from materials science literature with large language models: an evaluation study": https://www.tandfonline.com/doi/full/10.1080/27660400.2024.2356506

@article{foppiano2024mining,
    author = {Luca Foppiano and Guillaume Lambard and Toshiyuki Amagasa and Masashi Ishii},
    title = {Mining experimental data from materials science literature with large language models: an evaluation study},
    journal = {Science and Technology of Advanced Materials: Methods},
    volume = {0},
    number = {ja},
    pages = {2356506},
    year = {2024},
    publisher = {Taylor \& Francis},
    doi = {10.1080/27660400.2024.2356506},
    URL = {https://doi.org/10.1080/27660400.2024.2356506},
    eprint = {https://doi.org/10.1080/27660400.2024.2356506}
}

Evaluation summary

| Information | Task | Dataset | Link | Evaluation results | Evaluation data |
|---|---|---|---|---|---|
| Material expressions | NER | SuperMat | GitHub | Results | predicted, expected |
| Properties | NER | MeasEval | GitHub | Results | predicted, expected |
| Materials -> properties extraction | RE | SuperMat | GitHub | Results | predicted, expected |

Fine-tuning training data stored

Getting started

Set-up environment

conda create --name lumen python=3.9
conda activate lumen 
pip install -r requirements.txt 

Formula matching

The formula matching algorithm requires the material-parser project.
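
This README does not describe the algorithm itself. As a rough illustration of what comparing two formulas by composition can look like, here is a minimal sketch that does not use material-parser: it relies on a simple regular expression and only handles flat formulas without parentheses, dopant variables, or other notation. The names `parse_formula` and `formulas_match` are hypothetical and are not part of this repository.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real formula parser: one element symbol
# followed by an optional integer or decimal amount, e.g. "Sr0.15".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> dict:
    """Return a mapping from element symbol to amount for a flat formula."""
    composition = Counter()
    for element, amount in _TOKEN.findall(formula):
        # A missing amount means one atom of that element.
        composition[element] += float(amount) if amount else 1.0
    return dict(composition)


def formulas_match(predicted: str, expected: str, tol: float = 1e-6) -> bool:
    """Compare two formulas by composition, ignoring element order."""
    a, b = parse_formula(predicted), parse_formula(expected)
    return a.keys() == b.keys() and all(abs(a[k] - b[k]) <= tol for k in a)


if __name__ == "__main__":
    print(formulas_match("MgB2", "B2Mg"))  # same composition, different order
    print(formulas_match("MgB2", "MgB4"))  # different amount of B
```

Comparing compositions rather than raw strings means that a predicted formula written in a different element order can still count as a match against the expected one.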

Scripts

Scripts must be run as Python modules, using the -m option followed by the module's package path.
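
To make the convention concrete, the sketch below turns a script's relative file path into the equivalent `python -m` command. The helper `module_command` and the path `process/extract.py` are hypothetical and only serve as an illustration; substitute the actual package path of the script you want to run.

```python
from pathlib import PurePosixPath


def module_command(script_path: str) -> list:
    """Build the `python -m` command for a script given its relative path."""
    path = PurePosixPath(script_path)
    # Drop the ".py" suffix and join the path components with dots.
    module = ".".join(path.with_suffix("").parts)
    return ["python", "-m", module]


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    print(" ".join(module_command("process/extract.py")))
```

Running the command from the repository root lets Python resolve the package path, because the current directory is placed on the module search path when -m is used.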

Processing

Formula matching evaluation

NER:

RE:

Evaluation

NER:

RE: