Awesome
dbt-testgen
About
dbt-testgen
is a dbt package that autogenerates dbt test YAML based on real data.
Code documentation available at here
Inspired by dbt-codegen and deequ Constraint Suggestion.
Install
dbt-testgen
currently supports dbt 1.2.x
or higher.
Include in packages.yml
:
packages:
- git: https://github.com/kgmcquate/dbt-testgen
Supported Databases
The following databases are supported:
- Snowflake
- Databricks
- RedShift
- BigQuery
- Postgres
- DuckDB
Integration tests are run for each of these databases in Actions.
Test types
dbt-testgen can generate these types of tests, using built-in tests, dbt_utils, and dbt-expectations:
Usage
The DBT config YAML is generated by a Jinja macro, get_test_suggestions
, which you can run like this:
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('mymodel')) }}"
Output:
models:
- name: mymodel
tests:
- dbt_utils.recency:
field: day
datepart: day
interval: 2
columns:
- name: user_id
description: Numeric range test generated by dbt-testgen
tests:
- unique
- not_null
- dbt_utils.accepted_range:
min_value: 1
max_value: 30
- name: username
tests:
- unique
- not_null
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 8
max_value: 15
row_condition: '"username" is not null'
- name: email
tests:
- unique
- not_null
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 18
max_value: 25
row_condition: '"email" is not null'
- name: user_status
tests:
- accepted_values:
values:
- active
- inactive
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 6
max_value: 8
row_condition: '"user_status" is not null'
- name: age
tests:
- dbt_utils.accepted_range:
min_value: 22
max_value: 35
<hr>
<br>
You can output to a file like this:
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('mymodel')) }}" >> models/schema.yml
<hr>
<br>
You can also merge with an existing properties YAML file:
EXISTING_YAML_BODY=`cat models/schema.yml`
dbt compile -q --inline "{{ testgen.get_test_suggestions(ref('users'), dbt_config=fromyaml(\"${EXISTING_YAML_BODY}\")) }}"
<hr>
<br>
Here's an example of more advanced usage:
EXISTING_YAML_BODY=$(cat <<EOF
models:
- name: stg_jaffle_shop__customers
config:
tags:
- pii
columns:
- name: customer_id
tests:
- unique
- not_null
seeds:
- name: stg_jaffle_shop__orders
config:
materialized: view
columns:
- name: order_id
tests:
- unique
- not_null
EOF
)
JINJA_TEMPLATE=$(cat <<EOF
{{ testgen.get_test_suggestions(
ref('users'),
sample = true,
limit = 1000,
resource_type = 'seeds',
column_config = {'quote': true, 'tags': ['generated']},
exclude_types = ['float'],
tests = ['uniqueness', 'accepted_values', 'range'],
uniqueness_composite_key_length = 2,
accepted_values_max_cardinality = 10,
range_stddevs = 1,
dbt_config=fromyaml("${EXISTING_YAML_BODY}")
) }}
EOF
)
dbt compile -q --inline "${JINJA_TEMPLATE}"
Output:
models:
- name: stg_jaffle_shop__customers
config:
tags:
- pii
columns:
- name: customer_id
tests:
- unique
- not_null
seeds:
- name: stg_jaffle_shop__orders
columns:
- name: order_id
tests:
- unique
- not_null
- name: users
columns:
- name: user_id
tests:
- unique
- not_null
- dbt_utils.accepted_range:
min_value: -3.4017042154147523
max_value: 34.40170421541475
quote: true
tags:
- generated
- name: username
tests:
- unique
- not_null
quote: true
tags:
- generated
- name: email
tests:
- unique
- not_null
quote: true
tags:
- generated
- name: user_status
tests:
- accepted_values:
values:
- active
- inactive
quote: true
tags:
- generated
- name: age
tests:
- dbt_utils.accepted_range:
min_value: 20.462630250354053
max_value: 36.53736974964595
quote: true
tags:
- generated