

Notice

To better serve Wise business and customer needs, the PipelineWise codebase needs to shrink. We have made the difficult decision that, going forward, many components of PipelineWise will be removed or incorporated into the main repo. The last version before this decision is v0.64.1.

We thank everyone in the open-source community who, over the past 6 years, has helped make PipelineWise a robust product for heterogeneous replication of many, many terabytes daily.

pipelinewise-tap-google-analytics

License: MIT

This is a Singer tap that reads data from Google Analytics and produces JSON-formatted data following the Singer spec.

This is a PipelineWise compatible tap connector.

How to use it

The recommended way to run this tap is from PipelineWise. When running it from PipelineWise, you don't need to configure the tap with JSON files, and most things are automated. Please check the related PipelineWise documentation on Google Analytics.

If you want to run this Singer tap independently, read on.

Usage

This tap is a fork of Meltano's Google Analytics tap that:

Because Google Analytics reports are defined dynamically and a user can ask for practically infinite combinations of dimensions and metrics, the entities and their schemas (i.e. the Catalog for this tap) are not static. In this respect, the tap behaves much like a tap that extracts data from a database (e.g. Postgres).

The difference between tap-google-analytics and a database tap is that, although the Catalog (available entities/streams and their schemas) is dynamic, it cannot be discovered at run time just by connecting to the data source. Instead, it must be generated from the reports the user wants, by connecting to the Google Analytics Reporting API.

To that end, this tap uses an additional JSON file that defines the reports the user wants to generate. As an example, see the JSON file used as the default definition: tap-google-analytics/defaults/default_report_definition.json. These report definitions could be part of config.json, but we prefer to keep config.json small and clean and to provide the definitions in a separate file.

Based on the report definitions, the tap generates a valid Catalog that follows the Singer spec.
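To make the report-to-Catalog step more concrete, here is a minimal Python sketch, not the tap's actual code, of how one report definition could map to a Singer catalog stream entry. The function names are hypothetical. The sketch assumes every dimension is a string, every metric is a number, and the dimensions together identify a row, so they are used as key_properties. The real tap may determine exact types differently and may rename fields.

```python
from typing import Dict, List


def report_to_catalog_entry(report: Dict) -> Dict:
    """Hypothetical: turn one report definition into a Singer catalog stream entry."""
    properties = {}
    # Assumption: dimensions are strings and metrics are numbers.
    for dimension in report["dimensions"]:
        properties[dimension] = {"type": ["string", "null"]}
    for metric in report["metrics"]:
        properties[metric] = {"type": ["number", "null"]}

    return {
        "tap_stream_id": report["name"],
        "stream": report["name"],
        "schema": {"type": "object", "properties": properties},
        # Assumption: the dimensions together identify one row of the report.
        "key_properties": list(report["dimensions"]),
    }


def reports_to_catalog(reports: List[Dict]) -> Dict:
    """Build a Singer catalog ({"streams": [...]}) from a list of report definitions."""
    return {"streams": [report_to_catalog_entry(r) for r in reports]}
```

For example, `json.dumps(reports_to_catalog(reports), indent=2)` produces a document that could be saved as a catalog file.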

It then behaves like any Singer-compatible tap and uses that Catalog (or any Catalog generated by tap-google-analytics) to generate the requested reports. The additional JSON file for defining the reports is only required to generate the initial Catalog.

When no report definitions are provided by the user, tap-google-analytics generates a default Catalog with some common reports provided:

Install

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

It's recommended to use a virtualenv:

  python3 -m venv venv
  . venv/bin/activate
  pip install pipelinewise-tap-google-analytics

or, to install from a local checkout of the source code:

  python3 -m venv venv
  . venv/bin/activate
  pip install --upgrade pip
  pip install .

Authorization Methods

tap-google-analytics supports two different ways of authorization: a service account, or an OAuth access_token granted by a user.

If you're setting up tap-google-analytics for your own organization and only plan to extract from a handful of views in the same limited set of properties, service account based authorization is the simplest. When you create a service account, Google gives you a JSON file with that service account's credentials, called client_secrets.json. That file is all you need to pass to this tap, and you only have to do it once, so this is the recommended way of configuring tap-google-analytics.

If you're building something where a wide variety of users need to be able to give access to their Google Analytics, tap-google-analytics can use an access_token granted by those users to authorize its requests to Google. This access_token is produced by a normal Google OAuth flow, but that flow is outside the scope of tap-google-analytics. This is useful when integrating tap-google-analytics into another system, as Stitch Data might do, so that users can configure their extracts themselves without a manual config setup. In that case the tap expects an access_token, refresh_token, client_id and client_secret, which it uses to authenticate as the user who granted the token and access their data.
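Google access tokens are short-lived, which is why the refresh_token, client_id and client_secret are needed alongside the access_token. The Python sketch below uses only the standard library to build, but not send, the standard Google OAuth 2.0 refresh-token request. It illustrates the protocol and is not how tap-google-analytics refreshes tokens internally, which may be delegated to a Google client library. build_refresh_request is a hypothetical name.

```python
import urllib.parse
import urllib.request

# Google's OAuth 2.0 token endpoint.
TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(refresh_token: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Hypothetical helper: build (without sending) a refresh-token grant request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` should return a JSON response that contains a fresh access_token.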

Required Analytics Reporting APIs & OAuth Scopes

In order for tap-google-analytics to access your Google Analytics account, both the Analytics Reporting API and the Analytics API (two different APIs) must be enabled. If you use a service account to authorize, they need to be enabled for a project inside the same organization as your Google Analytics account (see below). If you use an OAuth credential set, they need to be enabled for the project that the OAuth client ID and secret come from.

If you use the OAuth authorization method, the OAuth flow conducted elsewhere must request at least the analytics.readonly OAuth scope (https://www.googleapis.com/auth/analytics.readonly) to get an access_token authorized to call these APIs.

Creating service account credentials

If you already have a valid client_secrets.json for a service account, or if you are using OAuth based authorization, you can skip the rest of this section.

As a first step, you need to create or use an existing project in the Google Developers Console:

  1. Sign in to the Google Account you are using for managing Google Analytics (you must have Manage Users permission at the account, property, or view level).

  2. Open the Service accounts page. If prompted, select a project or create a new one to use for accessing Google Analytics.

  3. Click Create service account.

    In the Create service account window, type a name for the service account, and select Furnish a new private key. Then click Save and store it locally as client_secrets.json.

    If you already have a service account, you can generate a key by selecting 'Edit' for the account and then selecting the option to generate a key.

Your new public/private key pair is generated and downloaded to your machine; it serves as the only copy of this key. You are responsible for storing it securely.

Add service account to the Google Analytics account

The newly created service account will have an email address that looks similar to:

quickstart@PROJECT-ID.iam.gserviceaccount.com

Use this email address to add a user to the Google Analytics view you want to access via the API. tap-google-analytics only needs Read & Analyze permissions.
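The service account's address is stored in the client_email field of the downloaded client_secrets.json. The small Python helper below is hypothetical. It reads that field and checks that the file really is a service account key, meaning its type field is service_account.

```python
import json

# Fields every Google service account key file contains.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def service_account_email(path: str) -> str:
    """Hypothetical helper: return the client_email from a service account key file."""
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key (type={key['type']!r})")
    return key["client_email"]
```

For example, `service_account_email("client_secrets.json")` returns the address to add as a user to the view.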

Enable the APIs

  1. Visit the Google Analytics Reporting API dashboard and make sure that the project you used in the Create credentials step is selected.

    From this dashboard, you can enable/disable the API for your account, set quotas and check usage stats for the service account you are using with tap-google-analytics.

  2. Visit the Google Analytics API dashboard, make sure that the project you used in the Create credentials step is selected, and enable the API for your account.

Configuration Settings

A sample config for tap-google-analytics might look like this:

sample_config.json

{
  "key_file_location": "client_secrets.json",
  "view_id": "123456789",
  "reports": "reports.json",
  "start_date": "2019-05-01T00:00:00Z",
  "end_date": "2019-06-01T00:00:00Z"
}

Instead of key_file_location, you can also use oauth_credentials (see below).

Required configuration parameters:

Optional parameters:

{
  "oauth_credentials": {
      "access_token": "<ya29.GlxtB_access_token_gobbledegook>",
      "refresh_token": "<ya29.GlxtB_refresh_tokeN_gobbledegook>",
      "client_id": "<something.apps.googleusercontent.com>",
      "client_secret": "<some client secret string>"
  },
  "view_id": ...
}

If reports is not provided and the tap runs without a --catalog, tap-google-analytics/defaults/default_report_definition.json is used as the default report definition.
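A pre-flight check can catch configuration mistakes before the tap calls Google. The Python sketch below is hypothetical. It assumes, based only on the samples above, the following rules:

  - view_id is required.
  - Either key_file_location or oauth_credentials must be present.
  - oauth_credentials must contain all four token fields.
  - Dates are ISO 8601 timestamps.

It also mirrors the report-definition fallback just described. The actual validation inside the tap may differ.

```python
from datetime import datetime
from typing import Dict, Optional

# Repository path taken from this README; an installed package may store it elsewhere.
DEFAULT_REPORT_DEFINITION = "tap-google-analytics/defaults/default_report_definition.json"


def check_config(config: Dict) -> None:
    """Hypothetical pre-flight check based on the sample configs above."""
    if "view_id" not in config:
        raise ValueError("view_id is required")
    if "key_file_location" not in config and "oauth_credentials" not in config:
        raise ValueError("provide key_file_location or oauth_credentials")
    if "oauth_credentials" in config:
        expected = {"access_token", "refresh_token", "client_id", "client_secret"}
        missing = expected - set(config["oauth_credentials"])
        if missing:
            raise ValueError(f"oauth_credentials is missing: {sorted(missing)}")
    for key in ("start_date", "end_date"):
        if key in config:
            # fromisoformat() only accepts a trailing "Z" from Python 3.11, so swap it for +00:00.
            datetime.fromisoformat(config[key].replace("Z", "+00:00"))


def report_definition_path(config: Dict, catalog_given: bool) -> Optional[str]:
    """Return the report definition file to use, or None when a --catalog drives the run."""
    if catalog_given:
        return None
    return config.get("reports", DEFAULT_REPORT_DEFINITION)
```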

The reports.json file referenced by the reports config key has a simple structure:

reports.json

[
  { "name" : "name of stream to be used",
    "dimensions" :
    [
      "Google Analytics Dimension",
      "Another Google Analytics Dimension",
      ... up to 7 dimensions per stream ...
    ],
    "metrics" :
    [
      "Google Analytics Metric",
      "Another Google Analytics Metric",
      ... up to 10 metrics per stream ...
    ]
  },
  {
    ... another stream definition ...
  },
  ... as many streams / reports as the user wants ...
]
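The limits stated in the structure above, up to 7 dimensions and 10 metrics per stream, can be checked before a Catalog is generated. The function names in this minimal Python sketch are hypothetical. The unique-name rule is an assumption, since the names become Singer stream names. The at-least-one-metric rule reflects the Reporting API's requirement that every request include a metric.

```python
import json
from typing import Dict, List

MAX_DIMENSIONS = 7   # "up to 7 dimensions per stream"
MAX_METRICS = 10     # "up to 10 metrics per stream"


def validate_reports(reports: List[Dict]) -> List[str]:
    """Return human-readable problems found in a reports.json definition (empty if none)."""
    problems = []
    seen = set()
    for index, report in enumerate(reports):
        name = report.get("name")
        if not name:
            problems.append(f"report #{index} has no name")
        elif name in seen:
            # Assumption: stream names must be unique.
            problems.append(f"duplicate stream name {name!r}")
        else:
            seen.add(name)

        dimensions = report.get("dimensions", [])
        metrics = report.get("metrics", [])
        if len(dimensions) > MAX_DIMENSIONS:
            problems.append(f"{name!r}: {len(dimensions)} dimensions (max {MAX_DIMENSIONS})")
        if not metrics:
            problems.append(f"{name!r}: at least one metric is required")
        elif len(metrics) > MAX_METRICS:
            problems.append(f"{name!r}: {len(metrics)} metrics (max {MAX_METRICS})")
    return problems


def load_reports(path: str) -> List[Dict]:
    """Load reports.json and raise ValueError if it breaks any of the rules above."""
    with open(path, encoding="utf-8") as f:
        reports = json.load(f)
    problems = validate_reports(reports)
    if problems:
        raise ValueError("; ".join(problems))
    return reports
```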

For example, if you want to extract user stats per day in a users_per_day stream and session stats per day and country in a sessions_per_country_day stream:

[
  { "name" : "users_per_day",
    "dimensions" :
    [
      "ga:date"
    ],
    "metrics" :
    [
      "ga:users",
      "ga:newUsers"
    ]
  },
  { "name" : "sessions_per_country_day",
    "dimensions" :
    [
      "ga:date",
      "ga:country"
    ],
    "metrics" :
    [
      "ga:sessions",
      "ga:sessionsPerUser",
      "ga:avgSessionDuration"
    ]
  }
]

See tap-google-analytics/defaults/default_report_definition.json for a longer, more detailed example.
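Each stream definition ultimately turns into a request to the Google Analytics Reporting API v4. The hypothetical Python function below is for illustration only. It shows how a report such as users_per_day, combined with the dates from the sample config, could map onto a v4 batchGet request body made of reportRequests with viewId, dateRanges, dimensions and metrics. The tap's real requests, including paging and sampling options, may differ.

```python
from typing import Dict


def build_report_request(report: Dict, view_id: str, start_date: str, end_date: str,
                         page_size: int = 1000) -> Dict:
    """Hypothetical: map one reports.json entry to a Reporting API v4 batchGet body."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                # The API expects YYYY-MM-DD dates, so keep only the date part of the config values.
                "dateRanges": [{"startDate": start_date[:10], "endDate": end_date[:10]}],
                "dimensions": [{"name": d} for d in report["dimensions"]],
                "metrics": [{"expression": m} for m in report["metrics"]],
                "pageSize": page_size,
            }
        ]
    }
```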

Run

  tap-google-analytics -c config.json
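Like any Singer tap, tap-google-analytics writes Singer messages (SCHEMA, RECORD and typically STATE) as JSON lines to stdout. Its output can therefore be piped into a Singer target or saved to a file. One quick way to sanity-check a run is to count the RECORD messages per stream. The following is a minimal Python sketch in which count_records is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def count_records(lines: Iterable[str]) -> Counter:
    """Count Singer RECORD messages per stream in a tap's stdout."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return counts
```

For example, after running `tap-google-analytics -c config.json > output.jsonl`, open output.jsonl and pass the file object to count_records.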

Implementation details

This tap makes some explicit decisions:

Tap shortcomings (contributions are more than welcome):