dbt_feature_store
<!--This table of contents is automatically generated. Any manual changes between the ts and te tags will be overridden!-->
<!--ts-->
- dbt_feature_store
- About
- Usage
- Macros
  * create_dataset (<a href="/macros/create_dataset.sql">source</a>)
  * latest_timestamp (<a href="/macros/latest_timestamp.sql">source</a>)
- feature_table object
About
This package contains dbt macros to help you build a feature store right within your dbt repository.
Usage
Inside of dbt Models
NOTE: For a full example of the package in use, see dbt_feature_store_example.
You can build models with these macros to maintain a feature store updated with your dbt runs.
From fal client (coming soon)
Trigger feature calculations from the fal Python client to quickly iterate and discover the best features for your ML model from your notebook.
Macros
create_dataset (source)
This macro creates a table that holds the label and the historical features. The table should be ready to use as training data without any additional transformations.
Constructor: feature_store.create_dataset(label, features)
- label: feature_table object
- features: list of feature_table objects
Example:
    SELECT *
    FROM (
        {{ feature_store.create_dataset(
            {
                'table': source('dbt_bike', 'bike_is_winner'),
                'columns': ['is_winner']
            },
            [
                {
                    'table': ref('bike_duration'),
                    'columns': ['trip_duration_last_week', 'trip_count_last_week']
                }
            ]
        ) }}
    )
latest_timestamp (source)
This macro creates a table containing only the latest-timestamp rows of a feature. This is useful for making predictions with the latest information available for an entity.
Constructor: feature_store.latest_timestamp(feature)
- feature: feature_table object
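The idea of keeping only the latest-timestamp rows can be illustrated outside of dbt. The following is a minimal Python sketch of that idea, not the macro's implementation: the function `latest_rows` and the sample rows are hypothetical, and the column names `bike_id` and `start_date` are borrowed from the bike_duration example in this README.

```python
from datetime import date


def latest_rows(rows, entity_column, timestamp_column):
    """Return one row per entity: the row with the greatest timestamp."""
    latest = {}
    for row in rows:
        key = row[entity_column]
        # Keep the row only if it is newer than the one stored for this entity.
        if key not in latest or row[timestamp_column] > latest[key][timestamp_column]:
            latest[key] = row
    return list(latest.values())


# Hypothetical feature rows, two for bike 1 and one for bike 2.
rows = [
    {"bike_id": 1, "start_date": date(2021, 1, 1), "trip_count_last_week": 3},
    {"bike_id": 1, "start_date": date(2021, 1, 8), "trip_count_last_week": 5},
    {"bike_id": 2, "start_date": date(2021, 1, 3), "trip_count_last_week": 2},
]

result = latest_rows(rows, "bike_id", "start_date")
```

Here `result` holds a single row per bike, which is the shape you would feed to a model at prediction time.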
Building block Macros
next_timestamp (source)
Constructor: feature_store.next_timestamp(entity_column, timestamp_column)
- entity_column: name of the id column used to join label tables and feature tables
- timestamp_column: name of the timestamp/date column used to join label tables and feature tables
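This README does not describe what next_timestamp produces. One plausible reading, given the name and the way its result is passed to label_feature_join, is that each feature row receives the timestamp of the following row for the same entity. The Python sketch below illustrates that reading only; `add_next_timestamp`, the `next_timestamp` key, and the sample data are hypothetical and are not taken from the package.

```python
from collections import defaultdict
from datetime import date


def add_next_timestamp(rows, entity_column, timestamp_column):
    """Return copies of rows with an added 'next_timestamp' key.

    'next_timestamp' is the timestamp of the following row for the same
    entity, or None when the row is that entity's most recent one.
    """
    by_entity = defaultdict(list)
    for row in rows:
        by_entity[row[entity_column]].append(row)

    out = []
    for entity_rows in by_entity.values():
        ordered = sorted(entity_rows, key=lambda r: r[timestamp_column])
        # Pair each row with its successor; the last row has no successor.
        successors = ordered[1:] + [None]
        for current, following in zip(ordered, successors):
            nxt = following[timestamp_column] if following is not None else None
            out.append({**current, "next_timestamp": nxt})
    return out


# Hypothetical feature rows, deliberately out of order.
rows = [
    {"bike_id": 1, "start_date": date(2021, 1, 8)},
    {"bike_id": 1, "start_date": date(2021, 1, 1)},
    {"bike_id": 2, "start_date": date(2021, 1, 3)},
]

with_next = add_next_timestamp(rows, "bike_id", "start_date")
```

Under this reading, each feature row describes an interval during which its values are the most recent ones known for the entity.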
label_feature_join (source)
Constructor: feature_store.label_feature_join(label_entity_column, label_timestamp_column, feature_entity_column, feature_timestamp_column, feature_next_timestamp_column)
- label_entity_column: column name of the entity id that is used for predictions; this column is used to join labels to features
- label_timestamp_column: column name of the timestamp/date; this column is used to join labels to features
- feature_entity_column: column name of the entity id that is used for predictions; this column is used to join labels to features
- feature_timestamp_column: column name of the timestamp/date; this column is used to join labels to features
- feature_next_timestamp_column: a column pre-calculated (normally in a CTE) by calling the macro feature_store.next_timestamp(feature_entity_column, feature_timestamp_column)
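The five parameters suggest a point-in-time join: a label row is matched to the feature row of the same entity whose validity interval contains the label's timestamp. The sketch below shows that condition in Python under stated assumptions. The interval is taken to be closed at the feature timestamp and open at the next timestamp, unmatched labels are dropped, and a missing next timestamp means "still current"; none of these details are confirmed by this README. The functions `matches` and `join_labels_to_features` and all sample rows are hypothetical.

```python
from datetime import date


def matches(label, feature):
    """Assumed join condition: same entity, and the label timestamp falls in
    [feature timestamp, feature next timestamp)."""
    same_entity = label["bike_id"] == feature["bike_id"]
    already_known = feature["start_date"] <= label["date"]
    not_superseded = (
        feature["next_timestamp"] is None
        or label["date"] < feature["next_timestamp"]
    )
    return same_entity and already_known and not_superseded


def join_labels_to_features(labels, features):
    """Pair every label with the feature row valid at the label's timestamp."""
    return [
        {**feature, **label}
        for label in labels
        for feature in features
        if matches(label, feature)
    ]


# Hypothetical feature rows with a pre-calculated next timestamp.
features = [
    {"bike_id": 1, "start_date": date(2021, 1, 1),
     "next_timestamp": date(2021, 1, 8), "trip_count_last_week": 3},
    {"bike_id": 1, "start_date": date(2021, 1, 8),
     "next_timestamp": None, "trip_count_last_week": 5},
]

# Hypothetical label rows; bike 2 has no features at all.
labels = [
    {"bike_id": 1, "date": date(2021, 1, 5), "is_winner": True},
    {"bike_id": 1, "date": date(2021, 1, 10), "is_winner": False},
    {"bike_id": 2, "date": date(2021, 1, 5), "is_winner": False},
]

joined = join_labels_to_features(labels, features)
```

A condition of this kind keeps feature values recorded after the label's timestamp out of the training row, so the dataset only reflects what was known at that moment.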
feature_table object
A feature_table object is a Python dict with the following properties:
- table: a ref, a source, or the name of a CTE defined in the query
- columns: a list of columns from this relation to appear in the final query
- entity_column (optional): column name of the entity id that is used for predictions; this column is used to join labels to features
- timestamp_column (optional): column name of the timestamp/date; this column is used to join labels to features
If you pass a ref or source in the table property, you can skip the entity_column and timestamp_column properties, as they will be loaded from the schema.yml meta for models or sources.
    version: 2

    sources:
      - name: dbt_bike
        tables:
          - name: bike_is_winner
            meta:
              # source example
              fal:
                feature_store:
                  entity_column: bike_id
                  timestamp_column: date

    models:
      - name: bike_duration
        meta:
          # model example
          fal:
            feature_store:
              entity_column: bike_id
              timestamp_column: start_date
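The fallback from explicit properties to the schema.yml meta can be sketched in Python. This is an illustration of the lookup order described above, not the package's code: `resolve_columns` is a hypothetical helper, and `model_meta` is a plain dict that mirrors the meta block of the bike_duration model.

```python
def resolve_columns(feature_table, meta):
    """Return (entity_column, timestamp_column) for a feature_table dict.

    Explicit properties win; otherwise fall back to meta["fal"]["feature_store"].
    """
    defaults = meta.get("fal", {}).get("feature_store", {})
    entity = feature_table.get("entity_column", defaults.get("entity_column"))
    timestamp = feature_table.get("timestamp_column", defaults.get("timestamp_column"))
    if entity is None or timestamp is None:
        raise ValueError(
            "entity_column and timestamp_column must be set explicitly or in meta"
        )
    return entity, timestamp


# Mirrors the meta block of the bike_duration model in schema.yml.
model_meta = {
    "fal": {
        "feature_store": {
            "entity_column": "bike_id",
            "timestamp_column": "start_date",
        }
    }
}

# No explicit columns: both values come from the meta.
from_meta = resolve_columns({"table": "bike_duration", "columns": []}, model_meta)

# An explicit timestamp_column overrides the meta; "ended_at" is a made-up name.
overridden = resolve_columns(
    {"table": "bike_duration", "columns": [], "timestamp_column": "ended_at"},
    model_meta,
)
```

A CTE name has no schema.yml entry, so in that case both properties would have to be given explicitly; in this sketch that corresponds to calling `resolve_columns` with an empty `meta`.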