# icon-etl-airflow

## Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Usage

To use with terragrunt-icon-analytics:

```shell
sudo cp -r ./dags /etc/airflow/dags
sudo chown -R airflow:airflow /etc/airflow/dags
sudo su - airflow
pip3 install -r requirements.txt
exit
sudo systemctl restart airflow-scheduler
sudo systemctl restart airflow-webserver
```

## Variables

### Project-wide

| Variable | Suggested Default | Description |
| --- | --- | --- |
| icon_cloud_provider | aws | The cloud provider you're using for Airflow: either 'aws' or 'gcp' |
| icon_database | postgres | The connection name for the database you're inserting data into |
| icon_output_bucket | | The name of the output bucket for block data |
| icon_provider_uris | https://ctz.solidwallet.io/api/v3 | The URI of the endpoint to use for block data |
| notification_emails | | List of email addresses to notify about job status |
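
These are Airflow Variables, so one way to seed them is with a JSON file passed to the Airflow CLI (`airflow variables import variables.json` in Airflow 2). A minimal sketch, in which the bucket name and notification address are placeholders you would replace with your own:

```json
{
  "icon_cloud_provider": "aws",
  "icon_database": "postgres",
  "icon_output_bucket": "my-icon-output-bucket",
  "icon_provider_uris": "https://ctz.solidwallet.io/api/v3",
  "notification_emails": "alerts@example.com"
}
```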

### Export

| Variable | Default | Description |
| --- | --- | --- |
| icon_export_start_date | 2018-01-24 | The first date to start exporting |
| icon_export_schedule_interval | `0 0 * * *` | The scheduled time for when the export job should run |
| icon_export_max_active_runs | None | The maximum number of active export runs |
| icon_export_max_workers | 10 | The maximum number of export workers per run |
| icon_export_batch_size | 10 | The export batch size |
| icon_export_retries | 5 | The number of retries per export run |
| icon_export_blocks_and_transactions_toggle | True | Enable exporting of blocks and transactions |
| icon_export_receipts_and_logs_toggle | True | Enable exporting of receipts and logs |
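
The start date and the daily `0 0 * * *` cron schedule together partition chain history into one export run per UTC day. As a rough illustration of that partitioning (the helper below is hypothetical, not part of this repo):

```python
from datetime import date, timedelta

def export_windows(start: date, end: date):
    """Yield one (window_start, window_end) pair per daily DAG run,
    covering [start, end) in whole UTC days."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)

# Four daily runs between the default start date and 2018-01-28.
windows = list(export_windows(date(2018, 1, 24), date(2018, 1, 28)))
```

Within each window, the exporter fans the work out across up to `icon_export_max_workers` workers in batches of `icon_export_batch_size` blocks.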

### Load

| Variable | Default | Description |
| --- | --- | --- |
| aws_access_key_id | | Access key ID to use for AWS Redshift |
| aws_secret_access_key | | Secret access key to use for AWS Redshift |
| load_all_partitions | true | Load all partitions in BigQuery |
| destination_dataset_project_id | | GCP Project ID for the BigQuery dataset |

## Schemas

Further information about the data used in the schemas is available in the ICON JSON RPC documentation.

Primary keys for each table are indicated in bold.

Exercise caution when using the timestamp field directly from the API: the resolution of the epoch timestamp is not consistent across the chain's history. For ease of use, it is suggested to transform this column in all tables to keep only the leftmost 10 digits, corresponding to the epoch timestamp in seconds.

A suggested transformation query is:

```sql
select left(timestamp::text, 10)::int8 as timestamp_s
```

and can be used to create a transformed column in a new table or materialized view.
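
The same truncation can be sanity-checked outside the database. For a 16-digit microsecond-resolution timestamp, keeping the leftmost 10 digits is equivalent to integer division by 10**6 (the value below is illustrative, not taken from the chain):

```python
ts_us = 1516819217402027       # a 16-digit microsecond epoch timestamp
ts_s = int(str(ts_us)[:10])    # leftmost 10 digits, as in the SQL transformation
# Equivalent to converting microseconds to whole seconds.
assert ts_s == ts_us // 10**6
```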

### Blocks

| Field | Type |
| --- | --- |
| number | bigint |
| hash | string |
| parent_hash | string |
| merkle_root_hash | string |
| timestamp | bigint |
| version | string |
| peer_id | string |
| signature | string |
| next_leader | string |

### Transactions

| Field | Type |
| --- | --- |
| version | string |
| from_address | string |
| to_address | string |
| value | numeric(38,0) |
| step_limit | numeric(38,0) |
| timestamp | bigint |
| nid | int |
| nonce | numeric(100,0) |
| hash | string |
| transaction_index | bigint |
| block_hash | string |
| block_number | bigint |
| fee | numeric(38,0) |
| signature | string |
| data_type | string |
| data | string |

### Logs

| Field | Type |
| --- | --- |
| log_index | int |
| transaction_hash | string |
| transaction_index | int |
| block_hash | string |
| block_number | int |
| address | string |
| data | string |
| indexed | string |

### Receipts

| Field | Type |
| --- | --- |
| transaction_hash | string |
| transaction_index | int |
| block_hash | string |
| block_number | int |
| cumulative_step_used | numeric(38,0) |
| step_used | numeric(38,0) |
| step_price | numeric(38,0) |
| score_address | string |
| status | string |

## Credits

ICON ETL Airflow was written by Richard Mah (@shinyfoil) for Insight Infrastructure, and was based on Ethereum ETL Airflow. The development was funded by the ICON Foundation.