Home

Awesome

DataHub-Tools

A python quasi-client (and tools) that interacts with DataHub GraphQL endpoints.

Install

Requires python >=3.9

pip install git+https://github.com/makenotion/datahub-tools

Three environment variables are required:

Quick-Start

Make sure you have your environment variables set (see above)

Sample some of your DataHub entities

from datahub_tools.client import get_datahub_entities
from datahub_tools.classes import DHEntity

entities: list[DHEntity] = get_datahub_entities(limit=5)

Take the first entity and apply some changes

from datahub_tools.client import update_dataset_description, emit_metadata
entity = entities[0]

# update the table's description
update_dataset_description(
    resource_urn=entity.urn,
    description="The new description for this entity"
)

# update the description for a column within the table
update_field_descriptions(
    resource_urn=entity.urn,
    field_descriptions={"my_column": "A new description for this column"}
)

# set the custom properties
emit_metadata(metadata={'foo': 'bar'}, resource_urn=entity.urn)

Discussion

This package was not created with the intent of open-sourcing (rather just for internal use at Notion) and so it does not contain wrappers around everything endpoint. There is a lot missing before reaching feature parity but hopefully still helpful.

The goals of the functions were to provide simple and easy programmatic access to common DataHub operations

Features

Users are encouraged to setup logging in advance as many steps and communications are logged to INFO.

Development/Contributing

Contributions are welcome, but please add tests for new code. Testing is setup using pytest and orchestrated with tox:

pyenv shell 3.9.16 3.10.9 3.11.1
pip install -U pip setuptools tox
tox
pyenv shell -

Set up the pre-commit hooks

brew install pre-commit
pre-commit install

The pre-commit hooks will run automatically upon a commit

Contact/Author

Contact Theo Wou <theodore[at]makenotion.com> Originally authored by Ada Draginda