Home

Awesome

GraSR

Fast Protein Structure Comparison through Effective Representation Learning with Contrastive Graph Neural Networks

Zenodo: DOI

Preparation

GraSR is implemented with Python3, so a Python3 (>=3.6) interpreter is required.

At first, download the source code of GraSR from GitHub:

$ git clone https://github.com/chunqiux/GraSR.git

Then, we recommend you to use a virtual environment, such as virtualenv, to install the dependencies of GraSR. If virtualenv is not available in your OS, try to install it as following:

$ pip3 install virtualenv

Then, create and activate the virtual environment as following:

# Higher python version is permitted
$ virtualenv gsr_env -p python3.6
$ source gsr_env/bin/activate

Then, install the dependencies as following:

$ pip install -r requirements.txt

When you want to quit the virtual environment, just:

$ deactivate

Usage

Generate descriptors for query structures

If you only want to generate descriptors for query structures, you can run the command as following:

$ python main.py -m saved_models -q example -o result

where '-m saved_models' means the directory path of models is './saved_models', '-q example' means the directory path of query structures is './example' and '-o result' means the output directory path is './result'.

The format of descriptor file is .pkl. you can check it in python3 as following:

$ python3
>>> import pickle
>>> with open("result/query_descriptors.pkl", "rb") as qd_file:
...     d = pickle.load(qd_file)
...     print(d)

'd' is a dictionary. Its keys are filenames of query structures and corresponding values are descriptors (Numpy.ndarray).

Structure retrieval from a database

If you want to retrieve structural neighbors of the query structures, you can run the command as following:

$ python main.py -r -m saved_models -q example -k descriptors/scope_207_id40.pkl -o result

where '-r' means retrieval mode and '-k descriptors/scope_207_id40.pkl' means the file path of database is './descriptors/scope_207_id40.pkl'. An example of retrieval result is shown below:

Top-10 structural neighbors
sid			Length-scaling cosine distance
d1ca4a1.ent		0.33015
d1lb6a_.ent		0.33941
d3ivva1.ent		0.36020
d2edma1.ent		0.38801
d4ca1a_.ent		0.38843
d2ed6a1.ent		0.42102
d2g1da1.ent		0.43577
d1qhva_.ent		0.43767
d5gv0a1.ent		0.44182
d4akma_.ent		0.44263

where 'sid' denotes the sid of the structural neighbors in the SCOPe and 'Length-scaling cosine distance' denotes the distance between the query structure and structural neighbors.

License

Code License

The code of GraSR is under GPLv3.0.

The moco.py is modified from MoCo/builder.py, which is under CC-BY-NC 4.0 license. The details can be referred from MoCo_LICENSE.

Model Parameters License

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/80x15.png" /></a><br /> The GraSR parameters are made availabe under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

Online service

We also provide online retrieval service here. Our website follows a 'filter and refine' paradigm, which means it can provide more accurate result.