Ontologies Plus Annotations to Vectors: OPA2Vec

Introduction

OPA2Vec is a tool for producing feature vectors for biological entities from an ontology. Its main source of data is ontology metadata in the form of annotation properties; it also uses formal ontology axioms and entity-concept associations. This document explains how to run OPA2Vec as a tool and documents its implementation in detail for users who want to adapt the code to their needs, which is quite easy to do.

Pre-requisites

OPA2Vec implementation uses:

Running OPA2Vec

        python runOPA2Vec.py -ontology "ontology file" -associations "association file" -outfile "output file" -embedsize N -windsize N -mincount N -model sg/cbow -pretrained "filename" -entities "filename" -annotations "URI1,URI2" -reasoner "elk/hermit" -debug "yes/no"

where the following are mandatory arguments:

If either of the two mandatory input files (the ontology file or the association file) is missing, an error message is displayed.
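The command line above can also be assembled from a script. The following is a minimal sketch, not part of OPA2Vec itself: the helper name build_opa2vec_command is hypothetical, the flag names are taken from the command shown above, and the file check is an extra convenience that is separate from whatever check runOPA2Vec.py performs on its own.

```python
import sys
from pathlib import Path


def build_opa2vec_command(ontology, associations, outfile=None, **options):
    """Return the argument list for a runOPA2Vec.py call.

    `options` maps optional flag names without the leading dash
    (for example embedsize=200 or model="sg") to their values.
    This helper is hypothetical and only mirrors the documented flags.
    """
    # Fail early if one of the two mandatory input files is missing.
    for label, path in (("ontology", ontology), ("associations", associations)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} file not found: {path}")

    cmd = [
        sys.executable, "runOPA2Vec.py",
        "-ontology", str(ontology),
        "-associations", str(associations),
    ]
    if outfile is not None:
        cmd += ["-outfile", str(outfile)]
    # Append optional flags in the order they were passed.
    for flag, value in options.items():
        cmd += [f"-{flag}", str(value)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains runOPA2Vec.py. Passing the arguments as a list avoids shell quoting problems with annotation URIs.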

You can also specify the following optional arguments:

In more detail:

Mandatory input files

Optional parameters

Output

The script stores the resulting vector representations in the specified output file, for all classes listed in the entities file (or for all classes if no entities file is provided). SampleVectors.lst shows an example of what the output file should look like.
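Once the vectors are written, a typical next step is to compare entities by the similarity of their vectors. The sketch below is an illustration under a stated assumption: it assumes each line of the output file holds one entity identifier followed by its vector components, optionally wrapped in brackets or separated by commas. Check this against SampleVectors.lst before relying on it. The function names are hypothetical and are not part of OPA2Vec.

```python
import math


def parse_vector_line(line):
    """Split one line into (entity, [floats]).

    Assumed layout: an entity identifier followed by numeric components.
    Brackets and commas are treated as whitespace.
    """
    cleaned = line.replace("[", " ").replace("]", " ").replace(",", " ")
    entity, *values = cleaned.split()
    return entity, [float(v) for v in values]


def load_vectors(path):
    """Read an output file into a dict mapping entity -> vector."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                entity, vector = parse_vector_line(line)
                vectors[entity] = vector
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the vectors loaded, cosine_similarity(vectors[e1], vectors[e2]) gives a score between -1 and 1 for any two entity identifiers e1 and e2 present in the file.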

Docker

A basic Docker image of OPA2Vec is available at: https://hub.docker.com/r/kaustborg/opa2vec/

To run OPA2Vec in a Docker container, follow the instructions below:

Create a folder /$PATH/data (where /$PATH/data is the absolute path to the data/ folder on your host machine) and store your ontology file and association file in it.

Pull the opa2vec image using:

        docker pull kaustborg/opa2vec

Run the image using the following command:

        docker run -v /$PATH/data:/opt/data kaustborg/opa2vec /opt/data/ontologyfile /opt/data/associationfile -annotations "URI1,URI2" -pretrained "filename" -embedsize N -windsize N -mincount N -model sg/cbow -entities "filename" -reasoner "elk/hermit" -debug "yes/no"

where ontologyfile is the name of your ontology file and associationfile is the name of your association file.

Once the container finishes running, the vectors will be saved in the data/ folder on your host machine.
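The Docker invocation depends on the host data folder being given as an absolute path and on both input files being addressed under /opt/data inside the container. A minimal sketch of a helper that enforces this is shown below. The function name is hypothetical, the image name, mount point, and flags are taken from the command above, and a POSIX-style host path is assumed.

```python
from pathlib import PurePosixPath


def build_docker_command(data_dir, ontology_name, association_name, **options):
    """Return the argument list for a `docker run` call of the opa2vec image.

    data_dir is the host folder holding both input files; it is mounted
    at /opt/data in the container. Hypothetical helper, POSIX paths assumed.
    """
    host_dir = PurePosixPath(data_dir)
    if not host_dir.is_absolute():
        raise ValueError("data_dir must be an absolute path on the host")

    cmd = [
        "docker", "run",
        "-v", f"{host_dir}:/opt/data",      # host folder -> container folder
        "kaustborg/opa2vec",
        f"/opt/data/{ontology_name}",       # file names are relative to data_dir
        f"/opt/data/{association_name}",
    ]
    # Optional flags, passed without the leading dash (e.g. embedsize=200).
    for flag, value in options.items():
        cmd += [f"-{flag}", str(value)]
    return cmd
```

As with the local script, the list can be handed to subprocess.run(cmd, check=True) on a machine where Docker is installed and the image has been pulled.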

Reference

If you find our work useful, please cite:

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Fatima Zohra Smaili, Xin Gao, Robert Hoehndorf. Bioinformatics, 2018. https://doi.org/10.1093/bioinformatics/bty933

Related work

Please refer to the following for related work:

Final notes

For comments or help with running OPA2Vec, please send an email to: fzohrasmaili@gmail.com