Materials for Machine Learning with Ontologies

This repository contains all the materials for our "Machine learning with biomedical ontologies" manuscript. We provide the Jupyter Notebooks to reproduce our experimental results and the benchmark datasets based on predicting protein-protein interactions. Furthermore, we make a set of slides available (as PDF and source code in LaTeX Beamer) that may be useful for teaching or presentations.

Notebooks

We provide several Jupyter notebooks. The notebooks include:

PPI Benchmark

We provide two benchmark datasets for the protein-protein interaction prediction task. The datasets can be downloaded using the following link: DOI

The two datasets are intended for evaluating machine learning methods on the task of predicting protein-protein interaction networks. The original data was downloaded from the StringDB database of protein-protein interactions and the Gene Ontology Resource. This archive includes:

We filter out interactions with a confidence score below 700 and treat the remaining interactions as symmetric. We randomly split each dataset by number of interactions into 80% training and 20% testing sets, and hold out 20% of the training set as a validation set.
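The filtering and splitting procedure described above can be sketched as follows. This is a minimal illustration, not the code used to build the benchmark: the function names, the `(protein_a, protein_b, score)` tuple layout, the fixed random seed, and the choice to deduplicate unordered pairs before splitting are all assumptions made for the example.

```python
import random


def filter_and_split(interactions, min_score=700, test_frac=0.2,
                     valid_frac=0.2, seed=0):
    """Split (protein_a, protein_b, score) tuples into train/valid/test pairs.

    Hypothetical helper: the input layout and the seed are assumptions.
    """
    # Keep interactions scoring at least min_score. Each unordered pair is
    # stored once so that (a, b) and (b, a) cannot end up in different splits.
    pairs = sorted({tuple(sorted((a, b)))
                    for a, b, score in interactions if score >= min_score})
    random.Random(seed).shuffle(pairs)

    # 80/20 split by number of interactions.
    n_test = int(len(pairs) * test_frac)
    test, train_all = pairs[:n_test], pairs[n_test:]

    # 20% of the training portion is held out for validation.
    n_valid = int(len(train_all) * valid_frac)
    valid, train = train_all[:n_valid], train_all[n_valid:]
    return train, valid, test


def symmetrize(pairs):
    """Expand unordered pairs into both directions, (a, b) and (b, a)."""
    return sorted({(a, b) for a, b in pairs} | {(b, a) for a, b in pairs})
```

Splitting on unordered pairs first and calling `symmetrize` on each split afterwards is one way to avoid leaking the reverse direction of a test interaction into the training set.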

Dependencies

Please install the following software to run our notebooks:

Running the notebooks

Run jupyter notebook and then open the notebook files.

Current benchmark results (yeast)

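| Method | Raw Hits@10 | Filtered Hits@10 | Raw Hits@100 | Filtered Hits@100 | Raw Mean Rank | Filtered Mean Rank | Raw AUC | Filtered AUC |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.06 | 0.13 | 0.32 | 0.40 | 1125.4 | 1074.8 | 0.82 | 0.83 |
| SimResnik | 0.09 | 0.17 | 0.38 | 0.48 | 757.8 | 706.9 | 0.86 | 0.87 |
| SimLin | 0.08 | 0.15 | 0.33 | 0.41 | 875.4 | 824.5 | 0.84 | 0.85 |
| SiameseNN | 0.06 | 0.17 | 0.46 | 0.68 | 674.27 | 622.20 | 0.89 | 0.90 |
| SiameseNN (Ont) | 0.08 | 0.19 | 0.50 | 0.72 | 543.56 | 491.56 | 0.91 | 0.92 |
| EL Embeddings | 0.08 | 0.17 | 0.44 | 0.62 | 451.29 | 394.04 | 0.92 | 0.93 |
| Onto2Vec | 0.08 | 0.15 | 0.35 | 0.48 | 641.1 | 587.9 | 0.79 | 0.80 |
| OPA2Vec | 0.06 | 0.13 | 0.39 | 0.58 | 523.3 | 466.6 | 0.87 | 0.88 |
| Random walk | 0.06 | 0.13 | 0.31 | 0.40 | 612.6 | 587.4 | 0.87 | 0.88 |
| Node2Vec | 0.07 | 0.15 | 0.36 | 0.46 | 589.1 | 522.4 | 0.87 | 0.88 |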

Current benchmark results (human)

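| Method | Raw Hits@10 | Filtered Hits@10 | Raw Hits@100 | Filtered Hits@100 | Raw Mean Rank | Filtered Mean Rank | Raw AUC | Filtered AUC |
|---|---|---|---|---|---|---|---|---|
| TransE | 0.05 | 0.11 | 0.24 | 0.29 | 3960.4 | 3890.6 | 0.78 | 0.79 |
| SimResnik | 0.05 | 0.09 | 0.25 | 0.30 | 1933.6 | 1864.4 | 0.88 | 0.89 |
| SimLin | 0.04 | 0.08 | 0.20 | 0.23 | 2287.9 | 2218.7 | 0.86 | 0.87 |
| SiameseNN | 0.05 | 0.15 | 0.41 | 0.64 | 1881.10 | 1808.77 | 0.90 | 0.89 |
| SiameseNN (Ont) | 0.05 | 0.13 | 0.38 | 0.59 | 1838.31 | 1766.34 | 0.89 | 0.89 |
| EL Embeddings | 0.01 | 0.02 | 0.22 | 0.26 | 1679.72 | 1637.65 | 0.90 | 0.90 |
| Onto2Vec | 0.05 | 0.08 | 0.24 | 0.31 | 2434.6 | 2391.2 | 0.77 | 0.77 |
| OPA2Vec | 0.03 | 0.07 | 0.23 | 0.26 | 1809.7 | 1767.6 | 0.86 | 0.88 |
| Random walk | 0.04 | 0.10 | 0.28 | 0.34 | 1942.6 | 1958.6 | 0.85 | 0.86 |
| Node2Vec | 0.03 | 0.07 | 0.22 | 0.28 | 1860.5 | 1813.1 | 0.86 | 0.87 |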
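For readers unfamiliar with the ranking metrics in the tables, the sketch below shows one common way to compute raw and filtered ranks, Hits@k, and mean rank. It assumes the convention used in knowledge graph link prediction, where the filtered setting removes other known interaction partners (for example, those seen in training) from the candidate list before ranking. Whether the notebooks follow exactly this convention, and how they break ties, is an assumption here; all function names are hypothetical, and AUC is not covered.

```python
def rank_of(scores, target, known=()):
    """1-based rank of `target` among candidates, by descending score.

    scores: dict mapping each candidate protein to its predicted score
            for one query protein.
    known:  candidates to exclude from the ranking (filtered setting);
            leave empty for the raw setting.
    Ties are resolved optimistically: only strictly higher scores count.
    """
    excluded = set(known) - {target}
    target_score = scores[target]
    better = sum(1 for cand, s in scores.items()
                 if cand != target and cand not in excluded
                 and s > target_score)
    return 1 + better


def hits_at_k(ranks, k):
    """Fraction of test interactions ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def mean_rank(ranks):
    """Average rank over all test interactions."""
    return sum(ranks) / len(ranks)


# Toy query: D is the held-out test partner, B is a known training partner.
scores = {"B": 0.9, "C": 0.8, "D": 0.7, "E": 0.1}
raw = rank_of(scores, "D")                    # B and C both outrank D
filtered = rank_of(scores, "D", known={"B"})  # B is removed before ranking
```

Because filtering can only remove competitors, the filtered rank of a test interaction is never worse than its raw rank under this convention.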

Adding to the benchmark

To add your own results to the benchmark, please send us a pull request with a link to the source repository that contains the code to reproduce the results. Alternatively, please create an issue on the issue tracker and we will add your results.

Slides

We provide slides that can be used to present some of this work. The slides were created as part of an Ontology Tutorial that was developed and taught over several years at various events. All methods in the slides are also implemented with examples in our Jupyter Notebooks.

  1. Introduction
  2. Ontologies and Graphs -- basic introduction to ontologies, Description Logic, and how they can give rise to graph-based representations
  3. Semantic Similarity -- different semantic similarity measures on ontologies
  4. Ontology Embeddings -- methods to generate embeddings for ontologies, including syntactic, graph-based, and model-based approaches
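As a small illustration of the semantic similarity topic, the sketch below computes Resnik and Lin similarity between two classes of a toy ontology. The class names, the hierarchy, and the probabilities are made up for the example and are not taken from the slides or notebooks. The SimResnik and SimLin benchmark entries compare proteins, which are annotated with sets of classes and therefore need an additional strategy for combining class-level scores; that step is not shown here.

```python
import math

# Hypothetical toy ontology: class -> set of direct superclasses.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "protein binding": {"binding"},
    "DNA binding": {"binding"},
}

# Hypothetical annotation probabilities p(c), already propagated upwards
# so that every class is at least as probable as its subclasses.
PROB = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.5,
    "protein binding": 0.25,
    "DNA binding": 0.25,
}


def ancestors(c):
    """Return c together with all of its direct and indirect superclasses."""
    result = {c}
    for parent in PARENTS[c]:
        result |= ancestors(parent)
    return result


def ic(c):
    """Information content: IC(c) = -log p(c)."""
    return math.log(1.0 / PROB[c])


def resnik(c1, c2):
    """IC of the most informative common ancestor of c1 and c2."""
    return max(ic(c) for c in ancestors(c1) & ancestors(c2))


def lin(c1, c2):
    """Resnik similarity normalized by the IC of both classes."""
    denom = ic(c1) + ic(c2)
    # Two classes with zero IC (here, only the root) are treated as identical.
    return 2 * resnik(c1, c2) / denom if denom > 0 else 1.0
```

In this toy hierarchy, "protein binding" and "DNA binding" share the ancestor "binding", so their Resnik similarity is the information content of "binding", while "protein binding" and "catalysis" share only the root and score zero.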

Resources

Processing and pre-processing ontologies

Computing entailments, reasoning

Generating graphs from ontologies

Computing Semantic Similarity

Embedding graphs

Embedding axioms

Ontology-based constrained learning

Publication

If you like our work, please cite our paper:

@article{machine-learning-with-ontologies,
    author = {Kulmanov, Maxat and Smaili, Fatima Zohra and Gao, Xin and Hoehndorf, Robert},
    title = {Semantic similarity and machine learning with ontologies},
    journal = {Briefings in Bioinformatics},
    year = {2020},
    month = {10},
    abstract = {Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge and they are employed in almost every major biological database. Recently, ontologies are increasingly being used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview over the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.},
    issn = {1477-4054},
    doi = {10.1093/bib/bbaa199},
    url = {https://doi.org/10.1093/bib/bbaa199},
    note = {bbaa199},
    eprint = {https://academic.oup.com/bib/advance-article-pdf/doi/10.1093/bib/bbaa199/33875255/bbaa199.pdf},
}