Google Analytics Setup Guide
This guide provides step-by-step instructions for setting up the sample jobs located in the Google Analytics example folder.
Before continuing with this guide, ensure you have completed the following prerequisites:
Obtain OAuth credentials and a refresh token by following this guide Enabling Google Analytics API and Obtaining OAuth Credentials.
Preparation
Donwlaod or Clone the Quickstart Repository to your local machine.
Navigate to the directory examples/google_analytics
Open the setup.sh file in a text editor to update the following variables, or alternatively, set their values using environment variables:
CONN Specify the Snowsql connection for the Snowflake account, ensuring it points to database: SNOWEAVER and schema: LEARN_DEV. Verify that the role associated with this connection has the required privileges to execute the commands in setup.sql. Ideally, use the AccountAdmin role or an equivalent role in a sandbox environment.
USER The Snowflake user defined in the Snowsql connection.
GA_OAUTH_CLIENT_ID Your Google Analytics OAuth client ID.
GA_OAUTH_CLIENT_SECRET Your Google Analytics OAuth client secret.
GA_OAUTH_REFRESH_TOKEN Your Google Analytics OAuth refresh token.
GA_PROPERTY_ID Your Google Analytics property ID.
Set up the sample project
On your client machine, open a command-line terminal (Linux, macOS) or Git Bash (Windows) and navigate to the same directory.
Execute the following command to set up the example project:
bash setup.sh
Note
If the LEARN_DEV project already exists, all its resources will be deleted, with the exception of the DATA stage and the RESULTS table.
To retain your data, you can remove the -d flag from the ‘sw import all -c ${CONN} -d’’ command in the setup.sh script.
Alternatively, use the Snoweaver CLI to create a backup of the project’s contents.
After the execution completes, you can access the sample jobs in the LEARN_DEV project.
Sample jobs
ga_run_report
This job generates a report from Google Analytics using the Google Analytics Data API RunReport. You can use the following variables to control the report parameters:
start_date: The start date for the report (e.g., ‘2024-08-01’).
end_date: The end date for the report (e.g., ‘2024-08-30’).
dimensions: An array of dimensions to include in the report (e.g., [{ “name”: “country” }]).
metrics: An array of metrics to include in the report (e.g., [{“name”: “activeUsers”}, {“name”: “newUsers”}, {“name”: “totalRevenue”}]).
limit: The number of rows to return per API call (e.g., 10000).
offset: The number of rows to skip (e.g., 0).
get_all: A boolean indicating whether to retrieve all data (True) or just a single batch (False).
The job implements the following logic in the posthook:
Error handling: If the API response status code is not in the range 200-299, an exception is raised with the error message from the response.
Pagination: If get_all is True and there are more rows to fetch (i.e., offset + limit < total row count):
The offset is updated to skip the rows already retrieved.
The job calls itself recursively with the updated offset to fetch the next batch of data.
If get_all is False, the job terminates after processing a single batch of data.
How to run the sample jobs
Open a Snowflake worksheet and execute the following statements:
USE ROLE LEARN_DEV_DEVELOPER; USE WAREHOUSE LEARN_SW_WH; USE SCHEMA SNOWEAVER.LEARN_DEV; -- Run the ga_run_report job -- The job is set with a limit of 5 rows per API call to demonstrate pagination . In a real-world scenario, you might want to use a larger limit for better performance. CALL GA_RUN_REPORT( '2024-08-01', -- start_date '2024-08-30', -- end_date [{ 'name': 'country' }], -- dimensions [ {'name': 'activeUsers'}, {'name': 'newUsers'}, {'name': 'totalRevenue'} ], -- metrics 5, -- limit (number of rows per API call) 0, -- offset (starting point for data retrieval) True -- get_all (retrieve all data) ); -- View the results SELECT R.VALUE:dimensionValues[0].value::STRING AS country, R.VALUE:metricValues[0].value::NUMBER AS active_users, R.VALUE:metricValues[1].value::NUMBER AS new_users, R.VALUE:metricValues[2].value::NUMBER(38,2) AS total_revenue FROM RESULTS QR, LATERAL FLATTEN(input => QR.DATA:rows) R WHERE QR.SCOPE = 'ga_report: country | activeUsers, newUsers, totalRevenue' AND QR.JOB_VARIABLES:start_date = '2024-08-01' AND QR.JOB_VARIABLES:end_date = '2024-08-30' ORDER BY active_users DESC, country;