dbt_feature_store

About

This package contains dbt macros to help you build a feature store right within your dbt repository.

Usage

Inside of dbt Models

NOTE: For a full example of the package in use, see dbt_feature_store_example.

Build models with these macros to keep a feature store up to date with every dbt run.

From fal client (coming soon)

Trigger feature calculations from the fal Python client to quickly iterate and discover the best features for your ML model from your notebook.

Macros

create_dataset (source)

This macro creates a table that holds the label and its historical features. The resulting table is meant to be usable as training data without any additional transformations.

Constructor: feature_store.create_dataset(label, features)

Example:

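-- First argument: the label table. Second argument: a list of feature tables.
-- The macro is called with the package prefix shown in the constructor above.
SELECT *
FROM (

  {{ feature_store.create_dataset(
      {
        'table': source('dbt_bike', 'bike_is_winner'),
        'columns': ['is_winner']
      },
      [
        {
          'table': ref('bike_duration'),
          'columns': ['trip_duration_last_week', 'trip_count_last_week']
        }
      ]
  ) }}

)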

latest_timestamp (source)

This macro creates a table containing only the rows of a feature that have the latest timestamp for each entity. This is useful for making predictions with the latest information available for an entity.

Constructor: feature_store.latest_timestamp(feature)
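
To make the intent concrete, here is a minimal Python sketch of the "latest row per entity" selection described above. It is not the macro's implementation (the macro generates SQL); the function name latest_rows and the sample rows are hypothetical, and only the column names are borrowed from the examples in this document.

```python
from datetime import date


def latest_rows(rows, entity_column, timestamp_column):
    """Keep, for each entity, only the row with the greatest timestamp.

    Hypothetical helper for illustration; not part of the package.
    """
    latest = {}
    for row in rows:
        key = row[entity_column]
        if key not in latest or row[timestamp_column] > latest[key][timestamp_column]:
            latest[key] = row
    return list(latest.values())


# Made-up feature rows shaped like the bike_duration model.
bike_duration = [
    {"bike_id": 1, "start_date": date(2022, 1, 3), "trip_count_last_week": 4},
    {"bike_id": 1, "start_date": date(2022, 1, 10), "trip_count_last_week": 7},
    {"bike_id": 2, "start_date": date(2022, 1, 5), "trip_count_last_week": 2},
]

latest = latest_rows(bike_duration, "bike_id", "start_date")
for row in latest:
    print(row["bike_id"], row["start_date"], row["trip_count_last_week"])
```

Each entity ends up with exactly one row, which is the shape a model needs at prediction time.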

Building block Macros

next_timestamp (source)

Constructor: feature_store.next_timestamp(entity_column, timestamp_column)
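
The source gives no description for this macro. Judging by its name and arguments, it plausibly yields, for each feature row, the timestamp of the following row for the same entity (comparable to a SQL LEAD window function partitioned by entity). The Python sketch below illustrates that assumed behavior only; add_next_timestamp and the next_timestamp output column are hypothetical names.

```python
from collections import defaultdict
from datetime import date


def add_next_timestamp(rows, entity_column, timestamp_column,
                       out_column="next_timestamp"):
    """Attach to each row the timestamp of the next row of the same entity.

    Assumed semantics for illustration; the last row of each entity gets None.
    """
    by_entity = defaultdict(list)
    for row in rows:
        by_entity[row[entity_column]].append(row)

    result = []
    for entity_rows in by_entity.values():
        ordered = sorted(entity_rows, key=lambda r: r[timestamp_column])
        # Pair every row with its successor; the final row has no successor.
        for current, following in zip(ordered, ordered[1:] + [None]):
            nxt = following[timestamp_column] if following is not None else None
            result.append({**current, out_column: nxt})
    return result


# Made-up feature rows for illustration.
rows = [
    {"bike_id": 1, "start_date": date(2022, 1, 10)},
    {"bike_id": 1, "start_date": date(2022, 1, 3)},
    {"bike_id": 2, "start_date": date(2022, 1, 5)},
]
with_next = add_next_timestamp(rows, "bike_id", "start_date")
```

With a next timestamp on every row, each feature value carries the interval during which it was the most recent one known.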

label_feature_join (source)

Constructor: feature_store.label_feature_join(label_entity_column, label_timestamp_column, feature_entity_column, feature_timestamp_column, feature_next_timestamp_column)
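
The source does not document this macro either. From its arguments, it appears to produce the join condition that pairs a label row with the feature row that was current at the label's timestamp. The sketch below expresses one such point-in-time condition in Python. Whether the real macro treats the interval boundaries as inclusive or exclusive is an assumption here, and label_feature_match is a hypothetical name.

```python
from datetime import date


def label_feature_match(label_entity, label_timestamp,
                        feature_entity, feature_timestamp,
                        feature_next_timestamp):
    """Return True if the feature row was the current one for the label row.

    Assumed condition: same entity, and
    feature_timestamp <= label_timestamp < feature_next_timestamp
    (a missing next timestamp means the feature row is still current).
    """
    return (
        label_entity == feature_entity
        and feature_timestamp <= label_timestamp
        and (feature_next_timestamp is None
             or label_timestamp < feature_next_timestamp)
    )


# Made-up feature rows: (bike_id, start_date, next_timestamp, trip_count_last_week)
features = [
    (1, date(2022, 1, 3), date(2022, 1, 10), 4),
    (1, date(2022, 1, 10), None, 7),
]
# Made-up label row: (bike_id, date, is_winner)
label = (1, date(2022, 1, 8), True)

matched = [
    f for f in features
    if label_feature_match(label[0], label[1], f[0], f[1], f[2])
]
```

Only the feature row from 2022-01-03 matches the label dated 2022-01-08, so values recorded after the label's timestamp never leak into the training data.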

feature_table object

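A feature_table object is a Python dict. The properties used in this document are table (a ref or source, as in the create_dataset example), columns (the list of column names to take from that table), entity_column, and timestamp_column.

If you pass a ref or source in the table property, you can omit the entity_column and timestamp_column properties. They are then loaded from the meta block in schema.yml for that model or source, under the fal and feature_store keys: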

version: 2
sources:
  - name: dbt_bike
    tables:
      - name: bike_is_winner
        meta:
          # source example
          fal:
            feature_store:
              entity_column: bike_id
              timestamp_column: date

models:
  - name: bike_duration
    meta:
      # model example
      fal:
        feature_store:
          entity_column: bike_id
          timestamp_column: start_date
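
The fallback described above can be pictured with a short Python sketch. This is not the package's code: resolve_columns is a hypothetical helper, and the choice that an explicit property wins over the meta value is an assumption, since the source only says the properties may be skipped.

```python
def resolve_columns(feature_table, meta):
    """Pick entity_column and timestamp_column for a feature_table dict.

    Hypothetical helper: uses the value given in the dict if present
    (assumed precedence), otherwise the one under meta -> fal -> feature_store.
    """
    fs_meta = meta.get("fal", {}).get("feature_store", {})
    resolved = {}
    for key in ("entity_column", "timestamp_column"):
        value = feature_table.get(key) or fs_meta.get(key)
        if value is None:
            raise ValueError(f"{key} is neither in the feature_table nor in the meta")
        resolved[key] = value
    return resolved


# Meta as it would be parsed from the bike_duration entry in the schema.yml above.
bike_duration_meta = {
    "fal": {
        "feature_store": {
            "entity_column": "bike_id",
            "timestamp_column": "start_date",
        }
    }
}

# The feature_table omits both columns, so they come from the meta.
feature_table = {"table": "bike_duration", "columns": ["trip_count_last_week"]}
resolved = resolve_columns(feature_table, bike_duration_meta)
print(resolved)
```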