Awesome
Intro
dbt-github-workflow is a boilerplate that contains all the necessary configurations to set up a simple CI/CD pipeline for your data modelling stack, making your life simpler by adding back a few extra working hours / out of hours.
I will also add some more resources to this project, feel free to star/follow.
What it does?
Each time you create a new PR or push to the main branch the github continuous integration at .github/workflows/continuous-integration.yml will
- setup python
- configure a cache to speed up future python package installs
- install the python requirements specified in the requirements.txt
- instatiate the dbt profile by copying the project profiles.yml to the github runner home ~/.dbt/ folder
- instantiate the keyfile
- builds the project using
make build <changed.file.list>
- runs pre-commit through on changed files
- runs sqlfluff on the changed files (not in pre-commit, for more readable output)
- authenticates on GCS
- deploys the dbt project to composer's dags/dbt folder
Contribute
While this setup is using
- GCP
- BigQuery
- Airflow in GCP Composer
is not tied to it, you can easily amend it to your needs. Feel free to create a public fork and share back the part of your setup that can be shared.
Setup
- Private fork or copy the content of this repository to your project
- Update the profiles.yml
- Setup the following environment variables
- KEYFILE_DIR - where your keyfile.json will get stored
- GCLOUD_PROJECT - the name of your GCP project
- Copy/Symlink/Instantiate your keyfile.json,
it's content is a service account keyfile
that has at least the following roles in GCP IAM
- BigQuery Data Owner - for dbt so that it can make changes
- BigQuery Job User - for dbt so it can run queries
- BigQuery Read Session User - for dbt so it can create and read sessions
- Storage Object Admin - in order to be able to upload the dbt project content to Airflow's Composer GCS bucket
- Setup the following github secrets:
- SERVICE_ACCOUNT_KEY - contains the actual value of your keyfile.json
- COMPOSER_BUCKET - contains the bucket id as in gs://BUCKET_ID/dags/ which belongs to the DAGs folder of your very environment setup in GCP Composer
- Configure your github project in case you would like to protect the main branch at https://github.com/USER/PROJECT/settings/branches add a new branch protection rule and configure it to your needs