Celo ETL
Overview
Celo ETL lets you set up an ETL pipeline in Google Cloud Platform for ingesting Celo blockchain data into BigQuery and Pub/Sub. It comes with CLI tools for exporting Celo data into convenient formats such as CSVs and relational databases.
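As a quick illustration of the CLI, the sketch below drives a block-and-transaction export to CSV from Python. The `celoetl` command name, subcommand, and flags are assumptions modeled on the Ethereum ETL tool this project derives from; verify them against your installed version.

```python
import subprocess

# Hypothetical CLI invocation modeled on Ethereum ETL; the command name,
# subcommand, and flags are assumptions -- check your install for the
# exact interface.
subprocess.run(
    [
        "celoetl", "export_blocks_and_transactions",
        "--start-block", "0",
        "--end-block", "99999",
        "--provider-uri", "http://localhost:8545",  # your Celo node's RPC endpoint
        "--blocks-output", "blocks.csv",
        "--transactions-output", "transactions.csv",
    ],
    check=True,
)
```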
Architecture
- The nodes are run in a Kubernetes cluster.
- Airflow DAGs export and load Celo data to BigQuery daily. Refer to Celo ETL Airflow for deployment instructions.
- Celo data is polled periodically from the nodes and pushed to Google Pub/Sub (see the polling sketch after this list). Refer to Celo ETL Streaming for deployment instructions.
- Celo data is pulled from Pub/Sub, transformed, and streamed to BigQuery (see the Dataflow sketch after this list). Refer to Celo ETL Dataflow for deployment instructions.
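To make the streaming step concrete, here is a minimal polling loop in Python. It assumes a Celo node exposing the standard Ethereum JSON-RPC interface at `NODE_URL` and a Pub/Sub topic you have already created; all names are placeholders, and the real Streamer component also persists its position in `last_synced_block.txt` and handles retries.

```python
import json
import time

import requests
from google.cloud import pubsub_v1

# Placeholder names: substitute your node's RPC endpoint, GCP project, and topic.
NODE_URL = "http://localhost:8545"
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "crypto_celo.blocks")

def rpc(method, params):
    # Celo nodes expose the standard Ethereum JSON-RPC interface.
    resp = requests.post(
        NODE_URL,
        json={"jsonrpc": "2.0", "method": method, "params": params, "id": 1},
    )
    return resp.json()["result"]

last_synced = int(rpc("eth_blockNumber", []), 16)
while True:
    latest = int(rpc("eth_blockNumber", []), 16)
    for n in range(last_synced + 1, latest + 1):
        block = rpc("eth_getBlockByNumber", [hex(n), False])
        publisher.publish(topic_path, json.dumps(block).encode("utf-8"))
    last_synced = latest
    time.sleep(10)  # poll period in seconds
```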
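And a minimal sketch of the Dataflow leg, written with the Apache Beam Python SDK: read JSON messages from a Pub/Sub subscription and append them to a BigQuery table. The subscription and table names are placeholders, and the real pipeline also validates and transforms records before writing.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder names: substitute your own subscription and (pre-created) table.
SUBSCRIPTION = "projects/my-gcp-project/subscriptions/crypto_celo.blocks.dataflow"
TABLE = "my-gcp-project:crypto_celo.blocks"

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "ParseJson" >> beam.Map(json.loads)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```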
Setting Up
- Follow the instructions in Celo ETL Airflow to deploy a Cloud Composer cluster for exporting and loading historical Celo data. It may take several days for the export DAG to catch up. During this time the "load" and "verify_streaming" DAGs will fail.
- Follow the instructions in Celo ETL Streaming to deploy the Streamer component. For the value in `last_synced_block.txt`, specify the last block number of the previous day. You can query it in BigQuery: `SELECT number FROM crypto_celo.blocks ORDER BY number DESC LIMIT 1` (see the sketch after this list).
- Follow the instructions in Celo ETL Dataflow to deploy the Dataflow component. Monitor the "verify_streaming" DAG in the Airflow console; once the Dataflow job catches up to the latest block, the DAG will succeed.
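If you prefer to fetch the starting block programmatically, a small sketch using the BigQuery Python client is below. It assumes your credentials are configured and that the `crypto_celo.blocks` table lives in your default project; qualify the table name with a project if it does not.

```python
from google.cloud import bigquery

client = bigquery.Client()
query = "SELECT number FROM crypto_celo.blocks ORDER BY number DESC LIMIT 1"
rows = list(client.query(query).result())

# Write the latest exported block number to the file the Streamer reads.
with open("last_synced_block.txt", "w") as f:
    f.write(str(rows[0].number))
```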