# preprocessd
Simple example showing how to use Cloud Run to pre-process events before persisting them to a backing store (e.g. BigQuery). This is a common use case where the raw data (e.g. submitted through a REST API) needs to be pre-processed (e.g. decorated with additional attributes, classified, or simply validated) before saving.

Cloud Run is a great platform for building this kind of ingestion or pre-processing service:
- Write each of the pre-processing steps in the most appropriate (or your favorite) development language
- Bring your own runtime (or even specific version of that runtime) along with custom libraries
- Dynamically scale up and down with your PubSub event load
- Scale to 0, and don't pay anything, when there is nothing to process
- Use granular access control with service account and policy bindings
## Event Source
In this example we will use synthetic events published to a PubSub topic by the pubsub-event-maker utility. We will use it to mock utilization data from 3 devices and publish it to Cloud PubSub on the `eventmaker` topic in your project. The PubSub payload looks something like this:
```json
{
  "source_id": "device-1",
  "event_id": "eid-b6569857-232c-4e6f-bd51-cda4e81f3e1f",
  "event_ts": "2019-06-05T11:39:50.403778Z",
  "label": "utilization",
  "mem_used": 34.47265625,
  "cpu_used": 6.5,
  "load_1": 1.55,
  "load_5": 2.25,
  "load_15": 2.49,
  "random_metric": 94.05090880450125
}
```
The instructions on how to configure pubsub-event-maker
to start sending these events are here.
## Prerequisites
### GCP Project and gcloud SDK
If you don't have one already, start by creating a new project and configuring the Google Cloud SDK. Similarly, if you have not done so already, you will have to set up Cloud Run.
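If the SDK is already installed, the remaining configuration typically comes down to a few gcloud commands. A minimal sketch, assuming a fully managed Cloud Run setup; the project ID and region below are placeholders, not values from this repo:

```shell
# authenticate and select the project you want to deploy into
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# set sensible defaults for Cloud Run (fully managed)
gcloud config set run/platform managed
gcloud config set run/region us-central1
```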
## Setup

### Build Container Image
Cloud Run runs container images. To build one we are going to use the included Dockerfile and submit the build job to Cloud Build using the `bin/image` script.
Note: you should review each of the provided scripts for the complete content of these commands.
```shell
bin/image
```
If this is the first time you are using the build service, you may be prompted to enable the Cloud Build API.
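The `bin/image` script essentially wraps a Cloud Build submission. If you prefer to run it by hand, the equivalent command looks roughly like this; the image name and project ID are assumptions, so check `bin/image` for the values actually used:

```shell
# build the container image with Cloud Build and push it to Container Registry
gcloud builds submit \
  --project YOUR_PROJECT_ID \
  --tag gcr.io/YOUR_PROJECT_ID/preprocessd .
```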
### Service Account and IAM Policies
In this example we are going to follow the principle of least privilege (POLP) to ensure our Cloud Run service has only the necessary rights and nothing more:
- `run.invoker` - required to execute the Cloud Run service
- `pubsub.editor` - required to create and publish to Cloud PubSub
- `logging.logWriter` - required for Stackdriver logging
- `cloudtrace.agent` - required for Stackdriver tracing
- `monitoring.metricWriter` - required to write custom metrics to Stackdriver
To do that we will create a GCP service account and assign the necessary IAM policies and roles using the `bin/account` script:
```shell
bin/account
```
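For reference, creating the account and binding one of the roles listed above looks roughly like the sketch below; the account name is an assumption and `bin/account` remains the authoritative version:

```shell
# create the service account the Cloud Run service will run as
gcloud iam service-accounts create preprocessd-sa \
  --display-name "preprocessd service account"

# bind one of the required roles (repeat for each role listed above)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member "serviceAccount:preprocessd-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/pubsub.editor"
```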
### Cloud Run Service
Once you have configured the GCP service account, you can deploy a new Cloud Run service, set it to run under that account, and prevent unauthenticated access using the `bin/service` script:
```shell
bin/service
```
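Under the hood this amounts to a single `gcloud run deploy` with the service account attached and unauthenticated access disabled. A sketch, assuming the image and account names from the earlier steps:

```shell
# deploy the image as a Cloud Run service running under the custom service account
gcloud run deploy preprocessd \
  --image gcr.io/YOUR_PROJECT_ID/preprocessd \
  --service-account preprocessd-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --no-allow-unauthenticated
```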
### PubSub Subscription
To enable PubSub to send topic data to our service, we will create a PubSub topic subscription and configure it to "push" events to the Cloud Run service deployed above.
```shell
bin/pubsub
```
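Since the service does not allow unauthenticated access, the push subscription has to authenticate as an account that holds `run.invoker` on the service. Roughly, and with assumed names (see `bin/pubsub` for the real ones):

```shell
# allow the service account to invoke the Cloud Run service
gcloud run services add-iam-policy-binding preprocessd \
  --member "serviceAccount:preprocessd-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role roles/run.invoker

# look up the URL of the deployed service
SERVICE_URL=$(gcloud run services describe preprocessd --format 'value(status.url)')

# create a push subscription that delivers eventmaker messages to that URL
gcloud pubsub subscriptions create eventmaker-push \
  --topic eventmaker \
  --push-endpoint "${SERVICE_URL}/" \
  --push-auth-service-account "preprocessd-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"
```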
## Log
You can see the raw data and all the application log entries made by the service in Cloud Run service logs.
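The same entries can also be pulled from the command line; a sketch, assuming the service is named `preprocessd`:

```shell
# read recent log entries for the Cloud Run service
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="preprocessd"' \
  --limit 20
```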
<img src="images/log.png" alt="Cloud Run Log">

## Saving Results
The process of saving the resulting data from this service will depend on your target (the place where you want to save the data). GCP has a number of existing connectors and templates so, in most cases, you do not even have to write any code. Here is an example of a Dataflow template that streams PubSub topic data to BigQuery:
```shell
gcloud dataflow jobs run JOB_NAME \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --parameters \
inputTopic=projects/YOUR_PROJECT_ID/topics/YOUR_TOPIC_NAME,\
outputTableSpec=YOUR_PROJECT_ID:YOUR_DATASET.YOUR_TABLE_NAME
```
This approach automatically deals with back-pressure, retries, and monitoring, and is not subject to the batch insert quota limits.
## Cleanup
To clean up all resources created by this sample, execute the `bin/cleanup` script.
```shell
bin/cleanup
```
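The cleanup amounts to deleting the resources created above; a rough equivalent, with names assumed to match the earlier sketches (`bin/cleanup` is authoritative):

```shell
# remove the subscription, the Cloud Run service, and the service account
gcloud pubsub subscriptions delete eventmaker-push
gcloud run services delete preprocessd --quiet
gcloud iam service-accounts delete \
  preprocessd-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com --quiet
```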
## Disclaimer
This is my personal project and it does not represent my employer. I take no responsibility for issues caused by this code. I do my best to ensure that everything works, but if something goes wrong, my apologies are all you will get.