terragrunt-icon-analytics

This is a reference architecture to deploy the analytics platform for use with the ICON-ETL exporter in Airflow.

To perform analytics on ICON, you will first need the infrastructure deployed with this repo. To create new analytics pipelines, you will need to modify the Airflow DAGs repo. Best practice is to fork that repo, SSH into the Airflow machine, and replace the dags folder (~/airflow/dags) with your fork.
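
As a rough sketch (the fork URL and the dags/ layout below are placeholders, not taken from this repo):

# On the Airflow machine, swap the existing DAGs for your fork
# (fork URL and dags/ subfolder are assumptions; adjust to your fork)
git clone https://github.com/<your-user>/<your-dags-fork>.git
rm -rf ~/airflow/dags
cp -r <your-dags-fork>/dags ~/airflow/dags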

A supplemental deployment guide can be found here.

Deploying the Stack

The process involves four steps:

  1. Set up accounts, projects, and API keys on your cloud provider.
  2. Configure the necessary files and SSH keys.
  3. Run the deployment.
  4. Configure the applications (Superset and Airflow).

After these steps, you will be able to run the Airflow DAGs linked above to build data pipelines.

Account Setup

Before running on any cloud, sign up and provide payment details to create an active account and project. You will need API keys for any provider you intend to run on. For a walkthrough on each provider, please check the following links for setting up your cloud accounts.

Deployment Setup

A sample secrets file is provided in the repository for you to customize. Make a copy of secrets.yml.example and name it secrets.yml, then edit the file to your needs. You can run both a dev and a prod setup.
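
For example:

# Copy the sample secrets file, then edit it to suit your setup
cp secrets.yml.example secrets.yml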

You will also need SSH keys. If you set a passphrase on the SSH key, you will be prompted for it during deployment.

# Generate ssh key 
ssh-keygen -b 4096

Take note of the path and insert it into the secrets.yml file as the public_key_path (*.pub) and private_key_path.
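
For illustration, the relevant entries in secrets.yml would look something like this (the paths below are hypothetical):

# public_key_path: /home/you/.ssh/id_rsa.pub
# private_key_path: /home/you/.ssh/id_rsa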

Prerequisites

To run the deployment, you will need the following tools:

  1. terraform - version 0.13+
  2. terragrunt - version 0.25.x (not the newer 0.26)
  3. ansible (not supported on Windows without WSL)
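
You can verify that the expected versions are installed with:

# Check installed tool versions
terraform -version     # expect 0.13 or newer
terragrunt --version   # expect 0.25.x
ansible --version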

Running Deployment

Preflight

Before running the deployment, ensure that your cloud credentials are accessible to Terragrunt, either as environment variables or through another utility (such as aws-vault for AWS). You may also want to modify various settings, such as DB and instance sizes, before deploying. To do this, edit the fields you need for your environment (prod by default) in the icon-analytics/aws/variables.tf file.
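
For example, on AWS you could either export credentials directly or wrap each command with aws-vault (the profile name below is a placeholder):

# Option 1: plain environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...

# Option 2: aws-vault, using a placeholder profile name
aws-vault exec my-profile -- terragrunt apply --terragrunt-working-dir /path/to/module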

Deploying

You will need to use Terragrunt to deploy modules in a specific order using the command terragrunt apply --terragrunt-working-dir /path/to/each/module.

The order you apply modules is important, with some modules being required and others being optional.

For example, if you were to clone a local deployment to /Users/example/analytics/, you would need to deploy the network module with terragrunt apply --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/network/

Additional DB settings can be applied with the postgres and redshift-config directories.
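
Continuing that example, a full run might look like the following (treating postgres and redshift-config as modules applied after network; their exact paths and ordering are assumptions, so check the repo for the required order):

# Required base module first
terragrunt apply --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/network/
# DB configuration modules (paths assumed)
terragrunt apply --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/postgres/
terragrunt apply --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/redshift-config/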

Destroying Deployment

To destroy the deployment, simply run terragrunt destroy --terragrunt-working-dir /path/to/each/module in the reverse order of your deployment. Make sure that you have activated the appropriate cloud credentials, and double-check that you are running terragrunt destroy and not terragrunt apply; applying the module again would be a no-op unless your source has changed.
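
Continuing the same example, teardown reverses the apply order (module paths are the same assumptions as above):

terragrunt destroy --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/redshift-config/
terragrunt destroy --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/postgres/
terragrunt destroy --terragrunt-working-dir /Users/example/analytics/icon-analytics/aws/network/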

Running Multiple Environments

To run multiple environments, add a new key to the secrets.yml file, mirroring the prod section. Then, in the icon-analytics/aws/variables.tf file, change the env field to your new environment. You will also find environment-specific options there.

Configuring Applications

After deploying the applications, navigate to each application's UI and follow the steps below.

Airflow

After deploying Airflow, navigate to airflow.yourdomain.com and log in with the credentials from your secrets.yml. You will need to set some Airflow variables in order to run the DAGs.
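
Variables can be set in the Airflow UI (Admin > Variables) or from the CLI on the Airflow machine; the variable name below is a placeholder, and the names actually required are defined in the DAGs repo:

# Airflow 2.x syntax; 1.10.x installs use "airflow variables --set" instead
airflow variables set some_required_variable "some value"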

Superset

After deploying Superset, navigate to superset.yourdomain.com and log in with the credentials from your secrets.yml. You will first need to connect Superset to your backends.

You can either enter these connection values into Superset manually or import them from the provided config file. To import them from a file, make a copy of ./superset/sources/database_sources.yaml.example and remove the .example suffix. Then change the sqlalchemy_uri value on line 6 to reflect the outputs of the corresponding backend (this_db_instance_address) and the username/password from your secrets.yml.
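
For example:

# Copy the provided connection template
cp ./superset/sources/database_sources.yaml.example ./superset/sources/database_sources.yaml
# Then edit the sqlalchemy_uri value (line 6) to use the backend's
# this_db_instance_address output and the username/password from secrets.yml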

To import charts, use the files in the superset/charts directory.

Relevant Repos

Analytics

Deployment

Development

For development: