DBT Column Lineage Extractor

DISCLAIMER

WARNING: This tool is currently in beta and has only been tested on a limited number of dbt projects using the Snowflake dialect. It may not perform as expected in every situation. Please report any issues or suggestions in the repository.

Overview

The DBT Column Lineage Extractor is a lightweight Python tool for extracting and analyzing column-level lineage in dbt projects. It uses the sqlglot library to parse the SQL queries defined in your dbt models and map the lineage relationships between their columns.

GitHub Repository

dbt Column Lineage Extractor

Features

Installation

pip installation

pip install dbt-column-lineage-extractor==0.1.4b1

Required Input Files

To run the DBT Column Lineage Extractor, you need two files from your dbt project: manifest.json and catalog.json.

These files are generated by executing the command:

dbt docs generate
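Before running the extractor, it can help to confirm that both artifacts are actually present. The following is a minimal sketch, not part of the tool itself: the helper name missing_artifacts is hypothetical, and the "target" directory in the usage note assumes dbt's default output location, which your project may override.

```python
from pathlib import Path

# The two dbt artifacts the extractor reads.
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def missing_artifacts(target_dir):
    """Return the names of required dbt artifacts not found in target_dir.

    Hypothetical helper for illustration; it only checks that the files exist.
    """
    target = Path(target_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (target / name).is_file()]
```

For example, missing_artifacts("target") returns an empty list when both files exist in that directory, and otherwise lists whichever file still needs to be generated.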

Important Notes

Example Usage and Customization

The DBT Column Lineage Extractor can be used in two ways: via the command line interface or by integrating the Python scripts into your codebase.

cd examples

Option 1 - Command Line Interface

First, generate the column lineage relationships between each model and its direct parents and children using the dbt_column_lineage_direct command, e.g.:

dbt_column_lineage_direct --manifest ./inputs/manifest.json --catalog ./inputs/catalog.json

Then analyze recursive column lineage relationships for a specific model and column using the dbt_column_lineage_recursive command, e.g.:

dbt_column_lineage_recursive --model model.jaffle_shop.stg_orders --column order_id

For more usage details, run dbt_column_lineage_direct -h and dbt_column_lineage_recursive -h.
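To illustrate how the two steps relate, the sketch below derives recursive lineage from direct lineage by following parent links until no new columns are found. It is a conceptual illustration only and does not reflect the tool's internal implementation or output format: the ancestors function, the direct_parents mapping, and every node name other than model.jaffle_shop.stg_orders are assumptions made for this example.

```python
def ancestors(direct_parents, model, column):
    """Collect every upstream (model, column) pair reachable from the start column.

    direct_parents is a hypothetical mapping from a (model, column) pair
    to the list of (model, column) pairs it is directly derived from.
    """
    seen = set()
    stack = [(model, column)]
    while stack:
        node = stack.pop()
        for parent in direct_parents.get(node, []):
            if parent not in seen:  # guard against revisiting shared parents
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical direct lineage for illustration; not real tool output.
direct_parents = {
    ("model.jaffle_shop.orders", "order_id"): [
        ("model.jaffle_shop.stg_orders", "order_id"),
    ],
    ("model.jaffle_shop.stg_orders", "order_id"): [
        ("seed.jaffle_shop.raw_orders", "id"),
    ],
}

upstream = ancestors(direct_parents, "model.jaffle_shop.orders", "order_id")
```

Here upstream contains both the staging column and the seed column, even though only the staging column is a direct parent. The same traversal over a child mapping would give the downstream columns.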

Option 2 - Python Scripts

See the README file in the examples directory for more detailed instructions on integrating the DBT Column Lineage Extractor into your Python scripts.

Example Outputs

Example Visualization

The structured JSON outputs can be used programmatically, or loaded into visualization tools like jsoncrack.com to visualize the column lineage relationships and dependencies.
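As a rough sketch of programmatic use, the generic walker below lists every leaf value in a parsed JSON document together with the path that leads to it. It makes no assumption about the layout of the extractor's output files, which is not described here. The function name iter_paths and the sample document are hypothetical.

```python
import json


def iter_paths(node, prefix=()):
    """Yield (path, value) for every leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_paths(value, prefix + (index,))
    else:
        # Scalars (strings, numbers, booleans, null) are the leaves.
        yield prefix, node


# Hypothetical sample document; substitute json.load() on a real output file.
sample = json.loads('{"a": {"b": [1, 2]}, "c": "x"}')
leaves = list(iter_paths(sample))
```

Each path is a tuple of dictionary keys and list indexes, so it can be used to explore an unfamiliar output file before writing code that depends on its structure.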

Limitations