# kg-cookiecutter

A cookiecutter template for knowledge graph (KG) generation.
## Getting started
First, install the `cruft` package. Cruft lets generated projects stay up to date with future changes made to this template.
```shell
pip install cruft
```
Next, create a project using the `kg-cookiecutter` template.
```shell
cruft create https://github.com/Knowledge-Graph-Hub/kg-cookiecutter
```
This kickstarts an interactive session where you declare the following:
- `project_name`: Name of the project. [defaults to: `Project_X`]
- `project_description`: Description of the project. [defaults to: `This is the project description.`]
- `full_name`: Your name. [defaults to: `Harshad Hegde`]
- `email`: Your email. [defaults to: `hhegde@lbl.gov`]
- `license`: Choose one of [`MIT`, `BSD-3`, `GNU GPL v3.0`, `Apache Software License 2.0`]. [defaults to: `MIT`]
- `github_token_for_doc_deployment`: The GitHub token variable name for documentation deployment using Sphinx. [defaults to: `GH_TOKEN`]

⚠️ Do NOT enter an actual token here; this is just the name of the variable that holds the token value in the project repository's Secrets.
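For illustration, a session that accepts most defaults might look roughly like this (exact prompt wording varies with the cruft/cookiecutter version; the answers shown are hypothetical):

```console
$ cruft create https://github.com/Knowledge-Graph-Hub/kg-cookiecutter
project_name [Project_X]: my-kg
project_description [This is the project description.]: A KG of example data.
full_name [Harshad Hegde]: Jane Doe
email [hhegde@lbl.gov]: jdoe@example.org
Select license:
1 - MIT
2 - BSD-3
3 - GNU GPL v3.0
4 - Apache Software License 2.0
Choose from 1, 2, 3, 4 [1]: 1
github_token_for_doc_deployment [GH_TOKEN]: GH_TOKEN
```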
This will generate the project folder following the template configuration specified by kg-cookiecutter in the `cookiecutter.json` file.
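For orientation, `cookiecutter.json` maps each prompt above to its default value; a rough sketch based on the prompts listed (the file in the repository is authoritative):

```json
{
  "project_name": "Project_X",
  "project_description": "This is the project description.",
  "full_name": "Harshad Hegde",
  "email": "hhegde@lbl.gov",
  "license": ["MIT", "BSD-3", "GNU GPL v3.0", "Apache Software License 2.0"],
  "github_token_for_doc_deployment": "GH_TOKEN"
}
```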
## What does this do?
The following files and directories are autogenerated in the project:
- GitHub workflows:
  - `qc.yml`: code quality checks.
  - `deploy-docs.yml`: documentation deployment.
- `docs` directory with Sphinx configuration files and an `index.rst` file.
- `project_name` directory containing:
  - `merge_utils` directory: KG merge functionality.
  - `transform_utils` directory: KG transform functionality.
    - `atc_transform`: an example for transforming data resources using `koza`.
    - `example_transform`: a basic example of the transform architecture.
    - `ontology_transform`: a sample transform for ontology files.
  - Four Python files:
    - `download.py`: runs the download of resources from `download.yaml`.
    - `query.py`: SPARQL query functionality.
    - `run.py`: the main driver for CLI commands.
    - `transform.py`: connects each resource to its Transform implementation.
- `templates` directory: builds of the KG.
- `tests` directory with a very basic test.
- `download.yaml`: all configuration for downloading resources (see the sketch after this list).
- `pyproject.toml`: a `poetry`-compatible file containing minimal package requirements.
- `tox.ini`: configuration for:
  - `coverage-clean`
  - `lint`
  - `flake8`
  - `mypy`
  - `docstr-coverage`
  - `pytest`
- `LICENSE` file based on the choice made during setup.
- `README.md` file containing the `project_description` value entered during setup.
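As a rough illustration of what a `download.yaml` entry looks like (the field names assume the kghub-downloader convention and are not taken from this template; the generated file is authoritative):

```yaml
---
# Each entry declares a resource to fetch into data/raw.
# Field names assume the kghub-downloader convention; check the
# generated download.yaml for the authoritative schema.
- url: http://purl.obolibrary.org/obo/envo.json
  local_name: envo.json
- url: http://purl.obolibrary.org/obo/hp.json
  local_name: hp.json
```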
## Further setup
### Install poetry

Install [poetry](https://python-poetry.org/) if you haven't already.
```shell
pip install poetry
```
### Install dependencies
```shell
poetry install
```
### Add poetry-dynamic-versioning as a plugin
```shell
poetry self add "poetry-dynamic-versioning[plugin]"
```
Note: If you are using a Linux system and the above fails with the error `Invalid PEP 440 version: ...`, you can alternatively run:
```shell
poetry add poetry-dynamic-versioning
```
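For reference, the plugin reads its settings from a `pyproject.toml` section like the following (a minimal sketch based on the poetry-dynamic-versioning documentation; the template's actual configuration may differ):

```toml
[tool.poetry-dynamic-versioning]
# Derive the package version from git tags rather than a hardcoded string.
enable = true
vcs = "git"
style = "pep440"
```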
### Run tox to see if the setup works
```shell
poetry run tox
```
This should run all the tools listed above under the `tox` configuration, and ideally you will see the following at the end of the run:
```
coverage-clean: commands succeeded
lint: commands succeeded
flake8: commands succeeded
mypy: commands succeeded
docstr-coverage: commands succeeded
py: commands succeeded
congratulations :)
```
And as the last line says: congratulations! Your project is ready to evolve!
## Final test to see everything is wired properly
To download resources:
```shell
kg download
```
By default, this will read from `download.yaml` and save downloaded data to `data/raw`.
To transform downloaded resources:
```shell
kg transform
```
By default, this will run all transforms defined in `transform.py` and save the results to `data/transformed`. Use the `-s` option with a transform name to run just one, e.g., `kg transform -s EnvoTransform`.
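The names accepted by `-s` come from a registry in `transform.py` that maps each name to a Transform class. A minimal sketch of that pattern, with hypothetical class names (the generated `transform.py` is the authoritative version):

```python
# Minimal sketch of the registry pattern in transform.py. The class
# names and DATA_SOURCES mapping are illustrative assumptions; see the
# generated transform.py for the real wiring.
from typing import List, Optional


class OntologyTransform:
    """Stand-in for a Transform implementation (hypothetical)."""

    def __init__(self, input_dir: str, output_dir: str) -> None:
        self.input_dir = input_dir
        self.output_dir = output_dir

    def run(self) -> None:
        ...  # parse files in input_dir, write KGX output to output_dir


# Names accepted by `kg transform -s <name>` map to Transform classes.
DATA_SOURCES = {
    "EnvoTransform": OntologyTransform,
    "HpoTransform": OntologyTransform,
}


def transform(input_dir: str, output_dir: str,
              sources: Optional[List[str]] = None) -> None:
    """Run the requested transforms (all registered ones if none are named)."""
    for name in sources or list(DATA_SOURCES):
        DATA_SOURCES[name](input_dir, output_dir).run()
```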
To build the merged graph:
```shell
kg merge
```
By default, this will merge all inputs defined in `merge.py` and save the results to `data/merged`. All three commands should work properly: together they download, transform, and merge the `ENVO` and `HP` ontologies and the Anatomical Therapeutic Chemical Classification (`ATC`) dataset.
## kg-chat (optional package)
The cookiecutter also includes `kg-chat` as an optional dependency, and all CLI commands work the same way.
### Requirements
- An OpenAI API key saved as a local environment variable:
```shell
export OPENAI_API_KEY=XXXXX
```
To install the `kg-chat` package and its dependencies, run:
```shell
poetry install -E chat
```
The first step is to locate the directory containing the KGX nodes and edges TSV files (say, `data/`).
Then, import the nodes and edges files from this `data` directory:
```shell
kg import --data-dir data
```
> **NOTE**: The files must be named `nodes.tsv` and `edges.tsv`. (We may make this more flexible in the future.)
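For reference, KGX TSV files are tab-separated with a header row. A minimal, purely illustrative example with made-up identifiers (real files carry more columns; columns are shown space-aligned here for readability, but actual files use tabs):

```text
# nodes.tsv
id          category             name
EX:0000001  biolink:NamedThing   example node A
EX:0000002  biolink:NamedThing   example node B

# edges.tsv
subject     predicate           object
EX:0000001  biolink:related_to  EX:0000002
```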
You are all set!!!
To query the graph using the interactive chat tool:
```shell
kg chat --data-dir data
```
Or you could launch an app locally:
```shell
kg app --data-dir data
```
For more information about `kg-chat` commands, please refer to the documentation.