Awesome
This is a very light demonstration of how the gusty package works with Airflow to assist in the organization, construction, and management of DAGs, tasks, dependencies, and operators. It requires that you have Docker and Docker Compose installed on your machine.
TL;DR
gusty takes YAML specifications of individual tasks and converts those specs into full Airflow DAGs. gusty includes full support for Airflow DAGs, task groups, tasks, dependencies, external dependencies, and more.
If don't have time to run the demo, please check out the "Why You Should Try gusty" section below.
Lastly, here is how gusty renders the more_gusty DAG:
Running the demo
Up and Running
- Clone this repository to your local machine
- In your terminal, while in the
gusty-demo-lite
directory, rundocker-compose build
- Once the build is done, run
docker-compose up
Once you see this:
| ____ |__( )_________ __/__ /________ __
| ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
| ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
| _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
You are good to go check out Airflow at localhost:8080
in your browser! You can log in with the username gusty
and the password demo
.
Security Note
Please note this demo is not safe, as usernames, passwords, and keys are stored in plain text in the docker-compose.yml
file. In general, you should store these sensitive items in your environments. For a more secure demonstration of gusty with Airflow, please go to the full-sized gusty demo.
A Bigger Demo is Available
If you have tried this demo, and think gusty is cool but you don't see a reason to use it yet, please check out the full-sized gusty demo, which provides proofs of concept for how gusty helps enable:
- Auto-detecting and the automatic setting of dependencies in SQL-related tasks (see: MaterializedPostgresOperator)
- Running tasks that are Jupyter Notebooks (see: JupyterOperator)
- Running tasks that are RMarkdown documents (see: RmdOperator)
Note the bigger demo takes a while longer to build, which is why we made a light demo here.
Why You Should Try gusty
Below are all of the current gusty features, as described in the more_gusty
DAG docs:
Everything can be specified as YAML (or other file formats)
- DAGs and task groups can use a file titled
METADATA.yml
to specify parameter available to either a DAG or a task group. Anything specified inMETADATA.yml
will override defaults set in gusty'screate_dag
function. - Tasks use the
operator
parameter to specify which operator gusty should use, and then any other parameter that can be specified to that operator can be added to the YAML. - For each task, dependencies within the same DAG can listed under
dependencies
in the task YAML. - External dependencies for dependencies located outside the same DAG can be listed using the format
dag_id: task_id
, ordag_id: all
to depend on an entire other DAG. - All DAGs, task groups, and tasks are named after their folder or file names.
- gusty also accepts YAML front matter in
.py
,.ipynb
, and.Rmd
files.
Defaults can be specified in create_dag
- gusty's
create_dag
function accepts any keyword argument that can be passed to Airflow's DAG class, so you can create a DAG without having to useMETADATA.yml
. - gusty's
create_dag
function also accepts a dictionary of task group parameters undertask_group_defaults
. - When you specify an external dependency, gusty creates an ExternalTaskSensor, whose parameters can be adjusted under the
wait_for_defaults
argument ofcreate_dag
.
DAG-level features (which can be placed either in create_dag
or a METADATA.yml
file)
latest_only
- A boolean that will tell gusty to ensure the entire DAG does not run tasks during catchup runs. This is enabled by default.external_dependencies
- Specify external dependencies on the DAG level, using the same format described above. Note that if you specify external dependencies in a call tocreate_dag
, you use the format[{'dag_id': 'task_id'}]
.root_tasks
- Specify task ids that should be placed at the root of your DAG, like an S3 sensor.leaf_tasks
- Specify task ids that should be placed at the end of your DAG, like a task that generates a report.ignore_subfolders
- Disable the creation of Task Groups from DAG directory subfolders.
Task Group features
- To create a task group, all you have to do is put some YAML specifications in a subdirectory of the DAG's directory.
suffix_group_id
- In addition toprefix_group_id
, which is an Airflow task group option for adding the task group id to the front of your task id, you can suffix instead.prefix_group_id
is set toFalse
by default, because task names should be explicitly set unless you specify otherwise.
Note shown here but also very useful
- gusty supports custom operators, using the
local
keyword when specifying an operator, e.g.operator: local.your_custom_operator_here
. gusty will look for these operators in anoperators
directory within yourAIRFLOW_HOME
. - gusty will also pick up dependencies you specify in your operator, so you can auto-detect dependencies in a SQL query and pass them along, then gusty will set these dependencies
- gusty also carries a
file_path
attribute, which you can use to, for example, render a Jupyter Notebook