Google Analytics Setup Guide

This guide provides step-by-step instructions for setting up the sample jobs located in the Google Analytics example folder.

Before continuing with this guide, ensure you have completed the following prerequisites:

Preparation

  1. Donwlaod or Clone the Quickstart Repository to your local machine.

  2. Navigate to the directory examples/google_analytics

  3. Open the setup.sh file in a text editor to update the following variables, or alternatively, set their values using environment variables:

    • CONN Specify the Snowsql connection for the Snowflake account, ensuring it points to database: SNOWEAVER and schema: LEARN_DEV. Verify that the role associated with this connection has the required privileges to execute the commands in setup.sql. Ideally, use the AccountAdmin role or an equivalent role in a sandbox environment.

    • USER The Snowflake user defined in the Snowsql connection.

    • GA_OAUTH_CLIENT_ID Your Google Analytics OAuth client ID.

    • GA_OAUTH_CLIENT_SECRET Your Google Analytics OAuth client secret.

    • GA_OAUTH_REFRESH_TOKEN Your Google Analytics OAuth refresh token.

    • GA_PROPERTY_ID Your Google Analytics property ID.

Set up the sample project

  1. On your client machine, open a command-line terminal (Linux, macOS) or Git Bash (Windows) and navigate to the same directory.

  2. Execute the following command to set up the example project:

    bash setup.sh
    

    Note

    If the LEARN_DEV project already exists, all its resources will be deleted, with the exception of the DATA stage and the RESULTS table.

    To retain your data, you can remove the -d flag from the ‘sw import all -c ${CONN} -d’’ command in the setup.sh script.

    Alternatively, use the Snoweaver CLI to create a backup of the project’s contents.

  3. After the execution completes, you can access the sample jobs in the LEARN_DEV project.

Sample jobs

ga_run_report

This job generates a report from Google Analytics using the Google Analytics Data API RunReport. You can use the following variables to control the report parameters:

  • start_date: The start date for the report (e.g., ‘2024-08-01’).

  • end_date: The end date for the report (e.g., ‘2024-08-30’).

  • dimensions: An array of dimensions to include in the report (e.g., [{ “name”: “country” }]).

  • metrics: An array of metrics to include in the report (e.g., [{“name”: “activeUsers”}, {“name”: “newUsers”}, {“name”: “totalRevenue”}]).

  • limit: The number of rows to return per API call (e.g., 10000).

  • offset: The number of rows to skip (e.g., 0).

  • get_all: A boolean indicating whether to retrieve all data (True) or just a single batch (False).

The job implements the following logic in the posthook:

  1. Error handling: If the API response status code is not in the range 200-299, an exception is raised with the error message from the response.

  2. Pagination: If get_all is True and there are more rows to fetch (i.e., offset + limit < total row count):

    • The offset is updated to skip the rows already retrieved.

    • The job calls itself recursively with the updated offset to fetch the next batch of data.

  3. If get_all is False, the job terminates after processing a single batch of data.

How to run the sample jobs

  1. Open a Snowflake worksheet and execute the following statements:

    USE ROLE LEARN_DEV_DEVELOPER;
    USE WAREHOUSE LEARN_SW_WH;
    USE SCHEMA SNOWEAVER.LEARN_DEV;
    
    -- Run the ga_run_report job
    -- The job is set with a limit of 5 rows per API call to demonstrate pagination . In a real-world scenario, you might want to use a larger limit for better performance.
    CALL GA_RUN_REPORT(
       '2024-08-01',  -- start_date
       '2024-08-30',  -- end_date
       [{ 'name': 'country' }],  -- dimensions
       [
          {'name': 'activeUsers'},
          {'name': 'newUsers'},
          {'name': 'totalRevenue'}
       ],  -- metrics
       5,  -- limit (number of rows per API call)
       0,  -- offset (starting point for data retrieval)
       True  -- get_all (retrieve all data)
    );
    
    -- View the results
    SELECT
      R.VALUE:dimensionValues[0].value::STRING AS country,
      R.VALUE:metricValues[0].value::NUMBER AS active_users,
      R.VALUE:metricValues[1].value::NUMBER AS new_users,
      R.VALUE:metricValues[2].value::NUMBER(38,2) AS total_revenue
    FROM
      RESULTS QR,
      LATERAL FLATTEN(input => QR.DATA:rows) R
    WHERE
      QR.SCOPE = 'ga_report: country | activeUsers, newUsers, totalRevenue'
      AND QR.JOB_VARIABLES:start_date = '2024-08-01'
      AND QR.JOB_VARIABLES:end_date = '2024-08-30'
    ORDER BY
      active_users DESC,
      country;