Material Parsers (and other tools)

Previously, this project was released as grobid-superconductors-tools. It was born as a sister project of grobid-superconductors and contains a web service that interfaces with Python libraries (e.g. spaCy).

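The service provides the following functionalities:

- conversion of material names to formulas
- decomposition of formulas into a structured dictionary of elements
- classification of materials into classes
- processing of material mentions, combining all of the above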

Usage

The service is deployed on Hugging Face Spaces and can be used right away. To install the service in your own environment, see "Installing in your environment" below.

Convert material name to formula

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/name/formula' \
--form 'input="Hydrogen"'

output:

{"composition": {"H": "1"}, "name": "Hydrogen", "formula": "H"}
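The curl calls in this section can also be issued from Python. Below is a minimal, standard-library-only client sketch; the helper names `build_form_request` and `post_form` are hypothetical, and it assumes each endpoint accepts a single multipart form field (as sent by `curl --form`) and replies with a JSON body.

```python
import json
import urllib.request
import uuid

# Host taken from the examples above; pass base_url to target another deployment.
DEFAULT_BASE_URL = "https://lfoppiano-grobid-superconductors-tools.hf.space"


def build_form_request(path: str, field: str, value: str,
                       base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST carrying one multipart/form-data field, like `curl --form`."""
    boundary = uuid.uuid4().hex  # random delimiter, very unlikely to occur in the value
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"\r\n'
        "\r\n"
        f"{value}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def post_form(path: str, field: str, value: str,
              base_url: str = DEFAULT_BASE_URL, timeout: float = 30.0):
    """Send the request and decode the (assumed) JSON response body."""
    request = build_form_request(path, field, value, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_form("/convert/name/formula", "input", "Hydrogen")` should return a dict shaped like the output above. The Process material endpoint expects the field `text` instead of `input` and is served from a different host, so it would be called as `post_form("/process/material", "text", "(Mo 0.96 Zr 0.04 ) 0.85 B x ", base_url="https://lfoppiano-material-parsers.hf.space")`.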

Decompose a formula into a structured dictionary of elements

Example:

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/convert/formula/composition' \
--form 'input="CaBr2-x"'

output:

{"composition": {"Ca": "1", "Br": "2-x"}}
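To make the shape of the composition dictionary concrete, here is a minimal, hypothetical tokenizer that produces the same mapping for flat formulas. It is not the service's parser: it assumes no parentheses, no repeated elements, and amounts made only of digits, decimal points, `+`/`-` and lowercase variables such as `x`.

```python
import re

# Symbols of the 118 elements; used to prefer "Br" (bromine) over B + "r",
# and to read "Kx" as potassium with amount "x".
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni "
    "Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe "
    "Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au "
    "Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf "
    "Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)

# Amount following a symbol: digits, decimal points, +/- and lowercase variables.
AMOUNT = re.compile(r"[0-9.+\-a-z]*")


def decompose(formula: str) -> dict[str, str]:
    """Split a flat formula such as "CaBr2-x" into {"Ca": "1", "Br": "2-x"}."""
    formula = "".join(formula.split())  # drop whitespace, e.g. "Mo 0.96 Zr 0.04"
    composition: dict[str, str] = {}
    i = 0
    while i < len(formula):
        if formula[i:i + 2] in ELEMENTS:  # prefer a two-letter symbol...
            symbol = formula[i:i + 2]
        elif formula[i] in ELEMENTS:  # ...otherwise fall back to one letter
            symbol = formula[i]
        else:
            raise ValueError(f"unsupported character {formula[i]!r} at {i}")
        match = AMOUNT.match(formula, i + len(symbol))
        composition[symbol] = match.group() or "1"  # a missing amount means 1
        i = match.end()
    return composition


print(decompose("CaBr2-x"))  # {'Ca': '1', 'Br': '2-x'}
```

Keeping amounts as strings, as the service output does, lets symbolic values such as `2-x` pass through unchanged.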

Classify materials into classes

Example:

curl --location 'https://lfoppiano-grobid-superconductors-tools.hf.space/classify/formula' \
--form 'input="(Mo 0.96 Zr 0.04 ) 0.85 B x "'

output:

['Alloys']

Process material

This endpoint combines all of the functionalities listed above, after first passing the material sequence through a deep learning (DL) model.

Example:

curl --location 'https://lfoppiano-material-parsers.hf.space/process/material' \
--form 'text="(Mo 0.96 Zr 0.04 ) 0.85 B x "'

output:

[
    {
        "formula": {
            "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x"
        },
        "resolvedFormulas": [
            {
                "rawValue": "(Mo 0.96 Zr 0.04 ) 0.85 B x",
                "formulaComposition": {
                    "Mo": "0.816",
                    "Zr": "0.034",
                    "B": "x"
                }
            }
        ]
    }
]
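The values in `formulaComposition` follow from multiplying each amount inside the parentheses by the group's subscript: 0.96 × 0.85 = 0.816 for Mo and 0.04 × 0.85 = 0.034 for Zr, while B, outside the group, keeps its symbolic amount `x`. The hypothetical helper below reproduces that arithmetic using `decimal` to avoid floating-point noise; it is an illustration, not the service's implementation, and leaving symbolic amounts as an unevaluated product is a choice of this sketch.

```python
from decimal import Decimal, InvalidOperation


def distribute_group(group: dict[str, str], multiplier: str) -> dict[str, str]:
    """Multiply every amount in a parenthesised group by the group's subscript."""
    factor = Decimal(multiplier)
    result = {}
    for element, amount in group.items():
        try:
            value = Decimal(amount) * factor
            # normalize() drops trailing zeros; format "f" avoids exponent notation.
            result[element] = format(value.normalize(), "f")
        except InvalidOperation:
            # Symbolic amounts (e.g. "y") cannot be multiplied numerically,
            # so this sketch keeps them as an unevaluated product.
            result[element] = f"{multiplier}*({amount})"
    return result


composition = distribute_group({"Mo": "0.96", "Zr": "0.04"}, "0.85")
composition["B"] = "x"  # B sits outside the group, so its amount is unchanged
print(composition)  # {'Mo': '0.816', 'Zr': '0.034', 'B': 'x'}
```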

Evaluation

The model uses DeLFT's BidLSTM_CRF architecture.

Evaluated on 23/12/25.

                  precision    recall  f1-score   support

        <doping>     0.6926    0.6377    0.6640       265
   <fabrication>     0.3333    0.0909    0.1429        44
       <formula>     0.8348    0.8459    0.8403      2569
          <name>     0.7346    0.7935    0.7629       949
         <shape>     0.9089    0.9608    0.9341       841
     <substrate>     0.5875    0.3176    0.4123       148
         <value>     0.8844    0.8920    0.8882       463
      <variable>     0.9645    0.9710    0.9677       448

all (micro avg.)     0.8321    0.8385    0.8353      5727
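The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the micro-average support is the sum of the per-label supports. The sketch below re-derives both from the numbers in the table; since the inputs are already rounded to four decimals, a recomputed F1 can differ from the report in the last digit (e.g. `<fabrication>` gives 0.1428 against the reported 0.1429).

```python
# (precision, recall, f1, support) per label, copied from the report above.
REPORT = {
    "<doping>": (0.6926, 0.6377, 0.6640, 265),
    "<fabrication>": (0.3333, 0.0909, 0.1429, 44),
    "<formula>": (0.8348, 0.8459, 0.8403, 2569),
    "<name>": (0.7346, 0.7935, 0.7629, 949),
    "<shape>": (0.9089, 0.9608, 0.9341, 841),
    "<substrate>": (0.5875, 0.3176, 0.4123, 148),
    "<value>": (0.8844, 0.8920, 0.8882, 463),
    "<variable>": (0.9645, 0.9710, 0.9677, 448),
}
MICRO_AVG = (0.8321, 0.8385, 0.8353, 5727)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for label, (p, r, reported, _) in REPORT.items():
    print(f"{label:>14} recomputed={f1(p, r):.4f} reported={reported:.4f}")

total_support = sum(support for *_, support in REPORT.values())
print("total support:", total_support)  # matches the micro-average support
```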

Installing in your environment

docker run -it lfoppiano/grobid-superconductors-tools:2.1

References

If you use our work and write about it, please cite our paper:

@article{doi:10.1080/27660400.2022.2153633,
    author = {Luca Foppiano and Pedro Baptista Castro and Pedro Ortiz Suarez and Kensei Terashima and Yoshihiko Takano and Masashi Ishii},
    title = {Automatic extraction of materials and properties from superconductors scientific literature},
    journal = {Science and Technology of Advanced Materials: Methods},
    volume = {3},
    number = {1},
    pages = {2153633},
    year = {2023},
    publisher = {Taylor & Francis},
    doi = {10.1080/27660400.2022.2153633},
    URL = {
    https://doi.org/10.1080/27660400.2022.2153633
    },
    eprint = {
    https://doi.org/10.1080/27660400.2022.2153633
    }
}

Overview of the repository

Developer's notes

Set up on Apple M1

conda install -c apple tensorflow-deps
pip install -r requirements.macos.txt 
conda install scikit-learn=1.0.1

Then remove tensorflow, h5py and scikit-learn from the DeLFT dependencies in DeLFT's setup.py, and install DeLFT from a local checkout:

pip install -e ../../delft 
pip install -r requirements.txt 

Finally, don't forget to install the spaCy model:

python -m spacy download en_core_web_sm

Release

bump-my-version bump patch|minor|major