dbt-athena
- Supports dbt version 1.0.*
- Supports seeds
- Correctly detects views and their columns
- Supports incremental models
  - Supports two incremental update strategies: insert_overwrite and append
  - Does not support the use of unique_key
- Only supports Athena engine 2
Installation
pip install dbt-athena-adapter
Or install directly from the GitHub repository:
pip install git+https://github.com/Tomme/dbt-athena.git
Prerequisites
To start, you will need an S3 bucket, for instance my-staging-bucket
and an Athena database:
CREATE DATABASE IF NOT EXISTS analytics_dev
COMMENT 'Analytics models generated by dbt (development)'
LOCATION 's3://my-staging-bucket/'
WITH DBPROPERTIES ('creator'='Foo Bar', 'email'='foo@bar.com');
Notes:
- Take note of your AWS region code (e.g. us-west-2 or eu-west-2).
- You can also use AWS Glue to create and manage Athena databases.
Credentials
This plugin does not accept any credentials directly. Instead, credentials are determined automatically based on AWS CLI / boto3 conventions and stored login info. You can configure the AWS profile name to use via aws_profile_name. See "Configuring your profile" below for details.
Configuring your profile
A dbt profile can be configured to run against AWS Athena using the following configuration:
Option | Description | Required? | Example |
---|---|---|---|
s3_staging_dir | S3 location to store Athena query results and metadata | Required | s3://bucket/dbt/ |
region_name | AWS region of your Athena instance | Required | eu-west-1 |
schema | Specify the schema (Athena database) to build models into (lowercase only) | Required | dbt |
database | Specify the database (Data catalog) to build models into (lowercase only) | Required | awsdatacatalog |
poll_interval | Interval in seconds to use for polling the status of query results in Athena | Optional | 5 |
aws_profile_name | Profile to use from your AWS shared credentials file. | Optional | my-profile |
work_group | Identifier of Athena workgroup | Optional | my-custom-workgroup |
num_retries | Number of times to retry a failing query | Optional | 3 |
Example profiles.yml entry:
athena:
  target: dev
  outputs:
    dev:
      type: athena
      s3_staging_dir: s3://athena-query-results/dbt/
      region_name: eu-west-1
      schema: dbt
      database: awsdatacatalog
      aws_profile_name: my-profile
      work_group: my-workgroup
Additional information
- threads is supported
- database and catalog can be used interchangeably
Usage notes
Models
Table Configuration
- external_location (default=none)
  - The location where Athena saves your table in Amazon S3
  - If none, it defaults to {s3_staging_dir}/tables
  - If you use a static value, the underlying data is cleaned up and overwritten by new data whenever your table/partition is recreated
- partitioned_by (default=none)
  - A list of columns by which the table will be partitioned
  - Limited to creation of 100 partitions (currently)
- bucketed_by (default=none)
  - A list of columns to bucket data by
- bucket_count (default=none)
  - The number of buckets for bucketing your data
- format (default='parquet')
  - The data format for the table
  - Supports ORC, PARQUET, AVRO, JSON, or TEXTFILE
- write_compression (default=none)
  - The compression type to use for any storage format that allows compression to be specified. To see which options are available, check out CREATE TABLE AS
- field_delimiter (default=none)
  - Custom field delimiter, for when format is set to TEXTFILE

More information: CREATE TABLE AS
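As a rough illustration of how the table options above might be combined, here is a minimal sketch of a model file. The model name, column names, the referenced stg_events model, and the S3 path are hypothetical; only the config keys come from the list above.

```sql
{# models/events_by_day.sql: hypothetical model, columns, and S3 path #}
{{ config(
    materialized='table',
    format='parquet',
    partitioned_by=['event_date'],
    external_location='s3://my-staging-bucket/tables/events_by_day/'
) }}

select
    user_id,
    event_type,
    -- Athena CTAS expects partition columns last in the select list
    event_date
from {{ ref('stg_events') }}
```

Because external_location is a static value here, the data under that path is overwritten each time the table is recreated. Omitting it falls back to {s3_staging_dir}/tables.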
Supported functionality
Support for incremental models:
- Supports two incremental update strategies with partitioned tables: insert_overwrite and append
- Does not support the use of unique_key
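An incremental model using the insert_overwrite strategy might look like the sketch below. The model name, column names, the stg_events reference, and the three-day window are assumptions made for illustration, and event_date is assumed to be a date column.

```sql
{# models/events_incremental.sql: hypothetical model and column names #}
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=['event_date']
) }}

select
    user_id,
    event_type,
    -- partition column goes last in the select list
    event_date
from {{ ref('stg_events') }}
{% if is_incremental() %}
-- On incremental runs, only select recent partitions (3 days is an arbitrary choice)
where event_date >= date_add('day', -3, current_date)
{% endif %}
```

Since unique_key is not supported, row-level upserts are not available. With insert_overwrite, the intent is that the partitions returned by the query replace the existing ones, while append only adds new rows.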
Due to the nature of AWS Athena, not all core dbt functionality is supported. The following features of dbt are not implemented on Athena:
- Snapshots
Known issues
- Quoting is not currently supported
  - If you need to quote your sources, escape the quote characters in your source definitions:

version: 2
sources:
  - name: my_source
    tables:
      - name: first_table
        identifier: "first table" # Not like that
      - name: second_table
        identifier: "\"second table\"" # Like this

- Tables, schemas and databases should only be lowercase
- Only supports Athena engine 2
Running tests
First, install the adapter and its dependencies using make (see Makefile):
make install_deps
Next, configure the environment variables in dev.env to match your Athena development environment. Finally, run the tests using make:
make run_tests