

Tutorial: Semantic Similarity and Machine Learning with Ontologies

Preparations for JOWO 2019

Please follow these steps before the tutorial:

You can then start the notebooks during the tutorial like this:

JOWO 2019 Slides

Past events

Parts of the material in this repository were taught at


Ontologies have long provided a core foundation for organizing biomedical entities, their attributes, and their relationships. With over 500 biomedical ontologies currently available, new opportunities are emerging for using ontologies in large-scale data sharing and data analysis. This tutorial will help you understand what ontologies are and how they are used in computational biology and bioinformatics.

Intended audience and level: The tutorial will be of interest to any researcher who uses or produces large structured datasets in computational biology. The tutorial is at an intermediate level and describes current research directions and challenges. A particular focus is placed on the use of ontologies to compute semantic similarity and the use of ontologies in machine learning.

Learning objectives

This is an intermediate-level introduction to ontologies and ontology-based data analysis in bioinformatics. In this tutorial, participants will learn:

Before the tutorial (important)

The tutorial will contain a hands-on part. If you want to participate (instead of just watching the presentation), please install the required software locally or use our Docker image (preferred/faster).

Local installation on your computer:

Download and install Jupyter Notebook (http://jupyter.org/) with a SciJava kernel (follow instructions here), and run the first cell in https://github.com/bio-ontology-research-group/ontology-tutorial/raw/master/ontology-analysis.ipynb (on Jupyter). This will download the required dependencies (OWLAPI, ELK, SML) which are quite large. You must also download our data package from here, here and for the last part of the tutorial some vectors from here.

It is fine to skip this step and still follow the tutorial. However, if you want to try the methods yourself and come away with running code examples that you can build on, you will need to download and run the code.

Detailed instructions:

Using the Docker image:

To copy a file from or to a running Docker container:

To copy a file from the running Docker container to your local (host) computer, use the following recipe:

  1. Find out the ID of the running container: Execute docker container list in a terminal. The result should look similar to the following:
CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS                    NAMES
0cd1f8da3c1f        altermeister/bio-ontology-ontology-tutorial-docker   "/usr/bin/tini -- /b…"   9 seconds ago       Up 7 seconds>8888/tcp   pedantic_mendeleev
  2. Copy the corresponding container ID (in this example 0cd1f8da3c1f) and, in a terminal, copy the file with the following command:

docker cp 0cd1f8da3c1f:/home/bioonto/ontology-tutorial/phenomenet-inferred.owl .

The command has the following form:

docker cp <CONTAINER ID>:<SRC_PATH> <DEST_PATH> or docker cp <SRC_PATH> <CONTAINER ID>:<DEST_PATH>. The first form copies a file from the container to a destination on the host, while the second copies a file from the host into the running container.
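
If you prefer to script the copy step, the two forms can be sketched in Python. This is a minimal sketch, not part of the tutorial material: the helper names docker_cp_args and docker_cp are hypothetical, and the container ID and path are the example values shown above. It assumes the docker command is available on your PATH.

```python
import subprocess


def docker_cp_args(container_id, container_path, host_path, to_container=False):
    """Build the argument list for one of the two `docker cp` forms.

    Hypothetical helper for illustration only.
    """
    container_ref = f"{container_id}:{container_path}"
    if to_container:
        # Host -> container: docker cp <SRC_PATH> <CONTAINER ID>:<DEST_PATH>
        return ["docker", "cp", host_path, container_ref]
    # Container -> host: docker cp <CONTAINER ID>:<SRC_PATH> <DEST_PATH>
    return ["docker", "cp", container_ref, host_path]


def docker_cp(container_id, container_path, host_path, to_container=False):
    """Run the copy; raises CalledProcessError if docker reports a failure."""
    args = docker_cp_args(container_id, container_path, host_path, to_container)
    subprocess.run(args, check=True)


# Example values taken from the recipe above; nothing is executed here.
example = docker_cp_args(
    "0cd1f8da3c1f",
    "/home/bioonto/ontology-tutorial/phenomenet-inferred.owl",
    ".",
)
print(" ".join(example))
```

Calling docker_cp with the same arguments would perform the copy, provided the container is still running.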


  1. General overview: what are ontologies, where to find them (ontology portals), how they are used (for annotation)
  2. Semantic Web: basic technologies underlying ontologies; understanding ontologies through OWL
  3. Ontologies and graphs: how to go from ontologies to graphs and back (preliminary step for computing semantic similarity)
  4. Semantic similarity: computing similarity between classes, sets of classes, and between biological entities (genes, diseases, drugs)
  5. Machine learning and ontologies: using deep learning to encode knowledge graphs, ontologies, and connections between ontologies
  6. Applications: how to apply the methods to biomedical data analysis, such as finding protein-protein interactions, prioritizing disease genes, and more
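
To give a flavor of topics 3 and 4 above, the following is a minimal Python sketch of treating a class hierarchy as a graph and computing Resnik's semantic similarity on it. It is not the code used in the tutorial notebooks (which rely on OWLAPI, ELK and SML). The toy hierarchy, the class and gene names, and the choice of the maximum over class pairs for comparing entities are all assumptions made for illustration.

```python
import math

# Toy is-a hierarchy as a graph: class -> direct superclasses.
# All class names are hypothetical.
PARENTS = {
    "root": [],
    "process": ["root"],
    "metabolism": ["process"],
    "signaling": ["process"],
    "lipid_metabolism": ["metabolism"],
    "glucose_metabolism": ["metabolism"],
}

# Hypothetical annotations: entity -> directly annotated classes.
ANNOTATIONS = {
    "geneA": {"lipid_metabolism"},
    "geneB": {"glucose_metabolism"},
    "geneC": {"signaling"},
    "geneD": {"lipid_metabolism", "signaling"},
}


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(c) = -log p(c), where p(c) is the fraction of entities
    annotated with c or with one of its subclasses."""
    counts = {c: 0 for c in PARENTS}
    for classes in ANNOTATIONS.values():
        implied = set()
        for c in classes:
            implied |= ancestors(c)
        for c in implied:
            counts[c] += 1
    total = len(ANNOTATIONS)
    # log(total / n) equals -log(n / total).
    return {c: math.log(total / n) for c, n in counts.items() if n > 0}


IC = information_content()


def resnik(c1, c2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(c1) & ancestors(c2)
    return max(IC.get(c, 0.0) for c in common)


def entity_similarity(e1, e2):
    """Compare two entities by their best-matching pair of classes
    (one simple choice among several ways to combine class similarities)."""
    return max(resnik(a, b) for a in ANNOTATIONS[e1] for b in ANNOTATIONS[e2])


print(resnik("lipid_metabolism", "glucose_metabolism"))
print(entity_similarity("geneA", "geneD"))
```

In this toy data the two metabolism classes share the ancestor "metabolism", so their similarity is its information content, while classes that only share "process" or "root" get a similarity of zero because every entity is annotated with those classes.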

Reading materials



Slides will be updated on demand. The latest versions (source and PDF) are in the (/slides/) folder. There are introduction slides (https://github.com/bio-ontology-research-group/ontology-tutorial/blob/master/slides/introduction-slides-pns.pdf by Paul Schofield and https://github.com/bio-ontology-research-group/ontology-tutorial/blob/master/slides/2018-ismb-tutorial-part-1.pdf by Michel Dumontier) and more technical slides (https://github.com/bio-ontology-research-group/ontology-tutorial/blob/master/slides/all.pdf by Robert Hoehndorf).

Questions and Requests

You can use the Issue Tracker for questions and requests.



The tutorial materials are under a CC-BY license.