Awesome
dq-vault
This DBT package provides an overview on Data Quality for all DataVault (DV) models in your DBT project.
dq-vault
enables you to:
- Monitor your test coverage for DV models
- Monitor no. of tests are there for each model or each DV entity (Hub, Link, Satellite ...)
- Be aware of tests that fail or throw warnings frequently
- Compare the number of warnings/errors/test failures between each DBT run
What does this package do?
It works by scanning your DBT test run results related to DataVault models and putting them into the following views
& metrics
:
- Views:
dim_dv_test
: information on all your available DBT tests (test name, test tag, which model or column it's assigned to ...)dim_dv_test_execution
: details on each test run (execution time, run id, test name, status, failed rows count, test message ...)fact_dv_tests
: a combined view ofdim_dv_test
anddim_dv_test_execution
fact_dv_tests_unnest_models
:fact_dv_tests
but on the granularity of referenced models in each test
- Metrics:
average_test_execution_time
no_of_test_of_each_model
no_of_test_of_each_run
test_status_of_each_run
What are the requirements to use this package?
A DBT project with:
- DataVault models' names that match
dq_vault__selected_model_rules
below (the rules are configurable) - DBT tests for the mentioned models
Installation
packages:
- package: infinitelambda/dq_vault
version: [">=0.1.0", "<1.0.0"]
# dbt_project.yml
on-run-end:
- '{{ dq_vault.store_test_results_json(results) }}'
Variables
vars:
dq_vault__enable_store_test: true or false
dq_vault__raw_db: 'your_custom_db or target.database'
dq_vault__raw_schema: 'your_custom_schema or target.schema'
dq_vault__selected_model_rules:
- hub: ['hub']
- sat: ['sat','satellite']
- link: ['link','tlink','t_link','lnk','tlnk','t_lnk']
- pit: ['pit']
- bridge: ['bridge']
- xts: ['xts']
In the above:
dq_vault__enable_store_test
: bool- Set as
true
to capture test results on the run-end of dbt command. Defaultfalse
.
- Set as
dq_vault__raw_db
: string- Configure the DATABASE for the raw test log table (
RAW_TEST
)
- Configure the DATABASE for the raw test log table (
dq_vault__raw_schema
: string- Configure the SCHEMA for the raw test log table (
RAW_TEST
)
- Configure the SCHEMA for the raw test log table (
dq_vault__selected_model_rules
: list- Define the mapping for Data Vault model selection based on model names. The order of rules in the list determines the priority of classification.
- For example, with the default rules list ordered as above:
- Model
random_sample_hub.sql
will be classified asHUB
- Model
this_sample_tlink
will be classified asLINK
- Model
sample_SAT_hub.sql
will also be classified asHUB
and notSAT
, because the hub rule ranks 1st in the rules list.
- Model
Classify the Test Type of your test cases
Currently, there are 4 built-in test types based on the test name:
- Duplication: Generic tests' names that contain
unique
- Reconciliation:
- Singular tests
- Generic tests' names that contain the following:
equality
,equal
- Reference: Generic tests' names that contain the following:
reference
,relationship
- Unknown: default test type
Labeling test type for each test case:
- Using test config
models:
- name: my_model
tests:
- my_test:
test_type: duplication
- Using test meta
models:
- name: my_model
tests:
- my_test:
meta:
test_type: duplication
Macros
List of custom built-in macros in this package:
store_test_results_json (source, doc)
get_datavault_type (source, doc)
get_test_type (source, doc)
where_select_dv_models (source, doc)
result_to_dict (source, doc)
get_raw_test (source, doc)
create_resources (source, doc)
refresh_resouces (source, doc)
Integration Tests
The integration_tests
directory contains a DBT project that tests the macros/models/etc in this dq-vault package.
An integration test typically involves making 1- a new seed file; 2- a new model file; 3- a generic test to assert anticipated behavior.
For an example on integration tests, check out the tests for get_datavault_type
macro:
- Macro definition
- Seed or Model file with fake data
- A generic test to assert the macro works as expected
Once you've added all of these files, you should be able to run:
Make sure you are currently in the integration_tests
folder,
dbt deps --target {your_target}
dbt seed --target {your_target}
dbt run --target {your_target} --model {your_model_name}
dbt test --target {your_target} --model {your_model_name}
Note: You might see some Failure
and Warning
while running tests for models specified in integration_tests
.
This is a part of this package testing and is completely normal.
For the current version, running dbt build
inside integration_tests
will yield Completed with 10 errors and 3 warnings
Alternatively, at the repo's root (/dq-vault
):
chmod +x run_test.sh
./run_test.sh {your_target} {your_models}
Once all the tests are passed you're good to go! All tests will be run automatically when you create a PR against this repo.
Developer's Guide
-
Quick Start (if you already setup the local dev, if not see the local dev setting up section below):
- Start the shell
cd /path/to/dq-vault/integration_tests python3 -m poetry shell
- Some sample commands:
# Build model and capture test result - with refresh the resources (raw_tests table) dbt build --exclude source:run_result_log+ tag:failed tag:sample_custom --vars '{dq_vault__enable_store_test: true, fresh: true}' # Build integration test/models with potential failed cases dbt build --select tag:failed tag:sample_custom --vars '{dq_vault__enable_store_test: true}' # Build dq vault main models - downstream of test result log table dbt build --select source:run_result_log+
- Start the shell
-
Prerequisites:
-
Install Python 3.9.6+ as recommended (specified in pyproject.toml)
Assuming your python alias:
python3
Don't need to use an alias if your environment is not multi python version
-
Install
poetry
python3 -m pip install poetry
-
-
Setup dev local environment
- Set working dir
cd /path/to/dq-vault
- Install dependencies
python3 -m poetry install
- Start shell (equivalent to activate virtualenv)
python3 -m poetry shell
- Install dev dependencies
poe git-hooks # Yes, it's poe, it's not a spelling mistake :)
Now you can set up your DBT:
-
Verify dbt installed version
dbt --version
-
Copy profiles to '.dbt' dir (create if not exists) under the Users dir.
# Linux/MacOs mkdir ~/.dbt > /dev/null cp ./profiles/profiles.yml ~/.dbt/profiles.yml
NOTE: To simplify the dev, here we update the real
password
value (not usingenv_vars
) in the profiles.yml after copying -
Check dbt configs:
cd integration_tests dbt debug [--profiles-dir /path/to/profiles-dir]
-
Run your model
dbt deps dbt seed dbt run
To exit the shell:
exit # Enter