5 min read

Custom Dimensions And Metrics

Having a strong concept for custom dimensions is what differentiates an okay Google Analytics setup from a good one. With GA360, we can specify up to 200 custom dimensions and 200 custom metrics. With all that space, we should generally overuse them instead of being too scarce with them. Especially when debugging tracking issues, it is convenient having a lot of metadata context and debugging information available.

Schema

In the Google Analytics export schema for BigQuery, a repeated RECORD field represents custom dimensions and metrics. The schema differs only slightly for dimensions and metrics in the type of the value.

{
    "name": "customDimensions",     // "customMetrics"
    "type": "RECORD",
    "mode": "REPEATED",
    "values": [
        {
            "name": "index",
            "type": "INTEGER",
            "mode": "NULLABLE"
        },
        {
            "name": "value",
            "type": "STRING",       // "INTEGER"
            "mode": "NULLABLE"
        }
    ]
}

In SQL the type for custom dimensions is defined as

-- customDimensions
ARRAY<STRUCT<index INT64, value STRING>>

--- customMetrics
ARRAY<STRUCT<index INT64, value INT64>>

Scope

As Google Analytics allows for custom dimensions and metrics in different scopes, the schema also has multiple places for this array of records. For dimensions, there are three places: globally as straightforward customDimensions for the user and session scope, within the hits field as hits.customDimensions for hit scope, and lastly on each product as hits.products.customDimensions. For custom metrics, there are only two places, because Google Analytics does not allow them on all scopes: Within the hits field as hits.customMetrics for hit scope and on each product as hits.products.customMetrics.

The scope is the context in which the value should be made available. While the only way to pass information to Google Analytics is via tracking hits, the scope setting makes it possible to make a value available beyond the context of the current hit. The Google Analytics processing handles this, and the result is made available in both the reporting engine and the BigQuery export.

You can sometimes notice that session- or user-scoped custom fields are also available on hit level hits.customDimensions. Those are the hits that sent the value initially, and it can be vastly useful when trying to debug from where a custom dimension value is coming.

Querying

To query individual custom fields on any scope, you have to get it out of its array structure. We take a look at three different approaches for doing that and discuss their pros and cons.

1. Unnesting and cross joining globally

Probably the first go-to when thinking about accessing custom fields for the first time. Within the main FROM statement, a simple UNNEST call on the repeated field and an implicit CROSS JOIN through the comma.

-- don't use this approach!
SELECT
  fullvisitorId,
  visitId,
  MAX(IF(customDimensions.index = 4, customDimensions.value, Null)) customDimension_4
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,
  UNNEST(customDimensions) customDimensions
GROUP BY
  1,
  2

In this example, we are using a GROUP BY and the MAX/IF combination to get to the custom dimension index we want. Here is another option with a WHERE filter:

-- don't use this approach!
SELECT
  fullvisitorId,
  visitId,
  customDimensions.value customDimension_4
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`,
  UNNEST(customDimensions) customDimensions
WHERE
  customDimensions.index = 4

2. Inline query

Much nicer already and probably for most the default, is the inline query. Instead of unnesting globally and cross joining, a tidy little subquery takes care of pulling the data out of the repeated field.

SELECT
  fullvisitorId,
  visitId,
  (SELECT
    value
   FROM
     UNNEST(customDimensions) customDimensions
   WHERE
     index = 4) customDimension_4
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`

3. User Defined Function

BigQuery standard SQL allows us to define custom functions! With a few extra lines of code, we can abstract away our length inline query and make it super reusable at the same time. Especially when querying multiple custom dimensions at the same time, this is super convenient. These user-defined functions (UDFs) should typically go at the very top of your query. I always keep the functions for custom dimensions and custom metrics in close reach and copy them into my queries as I use them a lot!

Custom Dimension Function

CREATE TEMP FUNCTION
  customDimensionByIndex(indx INT64,
    arr ARRAY<STRUCT<index INT64,
    value STRING>>) AS ( (
    SELECT
      x.value
    FROM
      UNNEST(arr) x
    WHERE
      indx=x.index) );

Custom Metric Function

CREATE TEMP FUNCTION
  customMetricByIndex(indx INT64,
    arr ARRAY<STRUCT<index INT64,
    value INT64>>) AS ( (
    SELECT
      x.value
    FROM
      UNNEST(arr) x
    WHERE
      indx=x.index) );

The syntax for defining UDFs looks complicated but is pretty simple. It starts with the keywords CREATE TEMP FUNCTION, followed by the name of the function customDimensionByIndex, the arguments the functions receives, the keywords AS and lastly SQL for the function body. In this custom dimension example, we are using a slightly modified version of the inline query from earlier as the function body. The only thing that is a little special here is the syntax for specifying the arguments with a combination of the argument name and type. For complex types like arrays and structs, SQL uses the < and > characters.

Once defined, you can use this function like any other SQL function in the same query string.

CREATE TEMP FUNCTION
  customDimensionByIndex(indx INT64,
    arr ARRAY<STRUCT<index INT64,
    value STRING>>) AS ( (
    SELECT
      x.value
    FROM
      UNNEST(arr) x
    WHERE
      indx=x.index) );

SELECT
  fullvisitorId,
  visitId,
  customDimensionByIndex(4, session.customDimensions) customDimension_4,
  customDimensionByIndex(4, hits.customDimensions) hits_customDimension_4
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170801` session,
  UNNEST(hits) hits

UDFs also support JavaScript. This can be very handy, for example, when you need more flexibility working with Strings or complex conditional logic with variable output sizes.

Unfortunately, UDFs are not compatible with all use cases. If you rely on third-party tools to run your queries, for example, Data Studio, it often is the case that these tools wrap your query in additional SQL. UDFs do not allow that, and the query engine throws an error.

Conclusion

Whenever touching customDimensions, make sure to unnest carefully. A global UNNEST call messes up your data and is a pain to get right. My go-to approach for working with custom dimensions/metrics are UDFs. They add a little bit of boilerplate code but make the actual SQL statement much more readable.