Kedro Great
As Seen on DataEngineerOne
Watch the Video: Kedro Great: Use Great Expectations with Ease!
Kedro Great is an easy-to-use plugin for kedro that makes integration with Great Expectations fast and simple.
Hold yourself accountable to Great Expectations.
Never fear your data silently changing again.
Quick Start
Install
Kedro Great is available on PyPI and integrates with Kedro through its hooks mechanism.
pip install kedro-great
Setup
Once installed, kedro great becomes available as a kedro command.
You can use kedro great init to initialize a Great Expectations project and automatically generate its project context.
Running kedro great init also generates Great Expectations Datasources and Suites for the DataSets in your catalog.yml.
By default, each expectation suite is named after its catalog.yml dataset, and a basic.json suite is generated for each dataset.
kedro great init
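The sketch below illustrates the default naming. It maps each catalog.yml dataset name to a suite called `<dataset>.basic` and to the JSON file where that suite would be stored. Two assumptions are built in. First, the suites live under the standard great_expectations/expectations/ directory. Second, Great Expectations stores a dotted suite name as nested directories. The helper names are hypothetical and are not part of kedro-great's API.

```python
from pathlib import Path
from typing import Dict, List

# Assumed location of the Great Expectations project's suite store.
EXPECTATIONS_DIR = Path("great_expectations") / "expectations"


def default_suite_name(dataset_name: str, suite_type: str = "basic") -> str:
    """Hypothetical helper: suite name derived from a catalog.yml dataset name."""
    return f"{dataset_name}.{suite_type}"


def suite_file(suite_name: str) -> Path:
    """Map a dotted suite name to its JSON file, e.g. 'a.basic' -> a/basic.json."""
    *dirs, leaf = suite_name.split(".")
    return EXPECTATIONS_DIR.joinpath(*dirs, f"{leaf}.json")


def default_suites(dataset_names: List[str]) -> Dict[str, Path]:
    """One basic suite per catalog dataset, keyed by suite name."""
    return {
        default_suite_name(name): suite_file(default_suite_name(name))
        for name in dataset_names
    }


if __name__ == "__main__":
    for suite, path in default_suites(["pandas_iris_data", "spark_iris_data"]).items():
        print(suite, "->", path)
```

Run on the two example datasets used later in this README, the sketch lists one basic suite per dataset.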
Use
After the Great Expectations project has been set up and configured, you can use the KedroGreat hook to run all your data validations every time the pipeline runs.
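```python
# run.py
from kedro.framework.context import KedroContext
from kedro_great import KedroGreat

class ProjectContext(KedroContext):
    # Register the hook so the suites run alongside the pipeline
    hooks = (
        KedroGreat(),
    )
```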
Then run the Kedro pipeline as usual, and the suites run along with it.
kedro run
Results
Finally, you can use great_expectations itself to generate documentation and view the results of your pipeline.
Love seeing those green ticks!
great_expectations docs build
Hook Options
The KedroGreat hook currently supports the options described below.
expectations_map: Dict[str, Union[str, List[str]]]
If you want to run multiple expectation suites, or suites whose names differ from the catalog dataset name, you can specify these mappings in the expectations_map argument of KedroGreat.
Default: the catalog dataset name is used as the expectation suite name.
Note: Specifying a suite type in the name, such as .basic, overrides all other suite types.
KedroGreat(expectations_map={
'pandas_iris_data': 'pandas_iris_data',
'spark_iris_data': ['spark_iris_data',
'other_expectation',
'another_expectation.basic'],
})
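The sketch below shows how an expectations_map shaped like the one above could be turned into a list of suite names for a single dataset. A value may be a single string or a list of strings. If a dataset has no entry, the sketch falls back to the dataset's own name, which is the documented default. The function is hypothetical and is not kedro-great's implementation.

```python
from typing import Dict, List, Optional, Union

ExpectationsMap = Dict[str, Union[str, List[str]]]


def suites_for_dataset(
    dataset_name: str, expectations_map: Optional[ExpectationsMap] = None
) -> List[str]:
    """Return the base suite names to run for one catalog dataset.

    Falls back to the dataset name itself when it has no mapping entry.
    """
    mapping = expectations_map or {}
    entry = mapping.get(dataset_name, dataset_name)
    # Accept both a single suite name and a list of suite names.
    return [entry] if isinstance(entry, str) else list(entry)


# The mapping from the example above.
EXAMPLE_MAP: ExpectationsMap = {
    "pandas_iris_data": "pandas_iris_data",
    "spark_iris_data": ["spark_iris_data",
                        "other_expectation",
                        "another_expectation.basic"],
}

if __name__ == "__main__":
    for name in ["pandas_iris_data", "spark_iris_data", "unmapped_data"]:
        print(name, "->", suites_for_dataset(name, EXAMPLE_MAP))
```

Normalizing every entry to a list keeps the later suite-type handling uniform, whichever form the user chose.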
suite_types: List[Optional[str]]
If your suites have multiple types, you can choose exactly which types to run.
A None entry means the suite name is used without a type appended.
Default: KedroGreat.DEFAULT_SUITE_TYPES.
Note: If a suite type is already specified in the expectations_map, it overrides this list.
KedroGreat(suite_types=[
'warning',
'basic',
None
])
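The sketch below shows one way suite_types could be combined with base suite names. Each type is appended as a suffix, and None means no suffix. A name that already ends in a known type, such as another_expectation.basic, is kept unchanged, so that an explicit type in expectations_map wins. The rule for detecting an existing type is an assumption: here a name counts as typed only if its suffix appears in the suite_types list. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def has_suite_type(suite_name: str, known_types: Sequence[Optional[str]]) -> bool:
    """Assumed rule: the name already ends in '.<type>' for a known type."""
    return any(t is not None and suite_name.endswith(f".{t}") for t in known_types)


def expand_suite_types(
    base_names: List[str], suite_types: Sequence[Optional[str]]
) -> List[str]:
    """Append each suite type to each base name; None means 'no suffix'.

    Names that already carry a known type are kept as-is, so an explicit
    type in expectations_map overrides the suite_types list.
    """
    expanded: List[str] = []
    for name in base_names:
        if has_suite_type(name, suite_types):
            expanded.append(name)
            continue
        for suite_type in suite_types:
            expanded.append(name if suite_type is None else f"{name}.{suite_type}")
    return expanded


if __name__ == "__main__":
    types = ["warning", "basic", None]
    print(expand_suite_types(["spark_iris_data", "another_expectation.basic"], types))
```

With the example list above, spark_iris_data expands to three suite names, while another_expectation.basic stays a single name.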
run_before_node: bool, run_after_node: bool
You can choose when the suites run: before a node, after a node, or both.
Suites that run before a node validate its inputs, and suites that run after a node validate its outputs.
Default: suites run only before a node runs.
KedroGreat(run_before_node=True, run_after_node=False)
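The sketch below shows how these two flags decide which datasets get validated. Kedro's before_node_run hook receives the node's inputs as a dict, and its after_node_run hook receives the node's outputs. In the sketch, data stands for whichever dict applies to the current stage. The function name and the "before"/"after" stage labels are hypothetical and are not part of kedro-great.

```python
from typing import Any, Dict, List


def datasets_to_validate(
    stage: str,
    data: Dict[str, Any],
    run_before_node: bool = True,
    run_after_node: bool = False,
) -> List[str]:
    """Return the dataset names whose suites should run at this stage.

    stage is "before" (data holds the node's inputs) or "after"
    (data holds the node's outputs). Defaults mirror the documented ones.
    """
    enabled = {"before": run_before_node, "after": run_after_node}
    if stage not in enabled:
        raise ValueError(f"unknown stage: {stage!r}")
    # Sorted only to make the result deterministic.
    return sorted(data) if enabled[stage] else []


if __name__ == "__main__":
    inputs = {"pandas_iris_data": object()}
    outputs = {"model_input": object()}
    print(datasets_to_validate("before", inputs))
    print(datasets_to_validate("after", outputs))
    print(datasets_to_validate("after", outputs, run_after_node=True))
```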
fail_fast: bool, fail_after_pipeline_run: bool
You can also have KedroGreat raise a SuiteValidationFailure when a Great Expectations validation fails.
With fail_fast, the exception is raised as soon as a suite fails. With fail_after_pipeline_run, failures are collected over the whole pipeline run and raised at the end.
This is useful when you run validation as part of a CI/CD pipeline.
Default: neither is set.
KedroGreat(fail_fast=True, fail_after_pipeline_run=True)
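The sketch below shows how the two failure modes could behave. The SuiteValidationFailure class is a local stand-in with the same name as kedro-great's exception, and FailurePolicy, record, and finish are hypothetical. With fail_fast, the first failing suite raises immediately. With only fail_after_pipeline_run, failures are collected and raised together once the run ends. When neither flag is set, the sketch only collects the failures.

```python
from typing import List


class SuiteValidationFailure(Exception):
    """Local stand-in for kedro-great's exception of the same name."""


class FailurePolicy:
    """Hypothetical sketch of fail_fast / fail_after_pipeline_run handling."""

    def __init__(self, fail_fast: bool = False, fail_after_pipeline_run: bool = False):
        self.fail_fast = fail_fast
        self.fail_after_pipeline_run = fail_after_pipeline_run
        self.failures: List[str] = []

    def record(self, suite_name: str, success: bool) -> None:
        """Call once per validated suite with its pass/fail result."""
        if success:
            return
        if self.fail_fast:
            # Stop the pipeline at the first failing suite.
            raise SuiteValidationFailure(f"Suite {suite_name!r} failed")
        self.failures.append(suite_name)

    def finish(self) -> None:
        """Call once after the pipeline run completes."""
        if self.fail_after_pipeline_run and self.failures:
            raise SuiteValidationFailure(
                f"{len(self.failures)} suite(s) failed: {', '.join(self.failures)}"
            )


if __name__ == "__main__":
    policy = FailurePolicy(fail_after_pipeline_run=True)
    policy.record("pandas_iris_data.basic", success=True)
    policy.record("spark_iris_data.basic", success=False)
    try:
        policy.finish()
    except SuiteValidationFailure as exc:
        print("Pipeline validation failed:", exc)
```

In a CI/CD job, deferring the failure to the end of the run reports every failing suite at once rather than only the first one.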