7 min read

Google Analytics Timings

Google Analytics' standard reporting features for custom timing or duration is minimal. In SQL on raw data, however, this is pretty easy and flexible. In this post, we take a look at how to work more flexibly with the session date, calculate exact timestamps of hits, and analyze the duration between specific hits in a session.

Session Dates

The date column in the GA360 export schema is of type string. To display it nicely, visualize it with a tool like DataStudio, or to use any of the SQL date functions on it, we have to parse it. The PARSE_DATE function takes a format string and a date string as arguments. If you have worked with Python date formatting in the past, the elements of the format string are very familiar to you. To parse a date string of format YYYYMMDD, we need a format string of %Y%m%d.

SELECT
  PARSE_DATE("%Y%m%d", date) native_date
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170129`

One example of the handy functions you can use with native date columns is the EXTRACT function. It takes a native date field and returns an integer for the corresponding part.

WITH data AS (
  SELECT
    PARSE_DATE("%Y%m%d", date) native_date
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170129`
)
SELECT
  -- Returns values in the range [1,7] with Sunday as the first day of the week.
  EXTRACT(DAYOFWEEK FROM native_date) dayofweek,
  EXTRACT(DAY FROM native_date) day,
  EXTRACT(DAYOFYEAR FROM native_date) dayofyear,
  -- Returns the week number of the date in the range [0, 53]. Weeks begin with Sunday, and dates prior to the first Sunday of the year are in week 0.
  EXTRACT(WEEK FROM native_date) week,
  -- Returns the week number of the date in the range [0, 53]. Weeks begin on WEEKDAY. Dates prior to the first WEEKDAY of the year are in week 0. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
  EXTRACT(WEEK(MONDAY) FROM native_date) week_monday,
  -- Returns the ISO 8601 week number of the date_expression. ISOWEEKs begin on Monday. Return values are in the range [1, 53]. The first ISOWEEK of each ISO year begins on the Monday before the first Thursday of the Gregorian calendar year.
  EXTRACT(ISOWEEK FROM native_date) isoweek,
  EXTRACT(MONTH FROM native_date) month,
  -- Returns values in the range [1,4].
  EXTRACT(QUARTER FROM native_date) quarter,
  EXTRACT(YEAR FROM native_date) year,
  -- Returns the ISO 8601 week-numbering year, which is the Gregorian calendar year containing the Thursday of the week to which date_expression belongs.
  EXTRACT(ISOYEAR FROM native_date) isoyear
FROM
  data

The docs list all the other date functions: Date Functions in Standard SQL

If you need information out of date that you can't find in the feature set of standard SQL, take a look at the utility table bigquery-public-data:utility_eu.date_greg. Sometimes it can be faster to just join this table on instead of calculating fields like the first day of year/month/quarter/half.

Exact Hit Timestamps

Every hit in the BigQuery export schema for Google Analytics has a time field. This time counts the milliseconds since the beginning of the session. The visitStartTime field is the seconds since January 1st, 1970, often called a UNIX timestamp. To get the exact hit time as a native TIMESTAMP field, we bring the visitStartTime to millisecond level, sum it with the hit time, and use the TIMESTAMP_MILLIS function for conversion:

SELECT
  TIMESTAMP_MILLIS( (visitStartTime * 1000) + hits.time) native_hit_timestamp
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`,
  UNNEST(hits) hits

One example of the handy functions you can use with native timestamp columns is the extract function. It takes a native timestamp field and returns an integer for the corresponding part. In this example, we are also specifying a timezone. The timezone is optional and defaults to UTC if you do not provide it. Timezones can either be the hour and minute difference, like described in the docs for Timezone definitions, or they can also be names from the list of tz database time zones.

WITH data AS (
  SELECT
    TIMESTAMP_MILLIS( (visitStartTime * 1000) + hits.time) native_hit_timestamp
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`,
    UNNEST(hits) hits
)
SELECT
  EXTRACT(MICROSECOND FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") microsecond,
  EXTRACT(MILLISECOND FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") millisecond,
  EXTRACT(SECOND FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") second,
  EXTRACT(MINUTE FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") minute,
  EXTRACT(HOUR FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") hour,
  EXTRACT(DAYOFWEEK FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") dayofweek,
  EXTRACT(DAY FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") day,
  EXTRACT(DAYOFYEAR FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") dayofyear,
  -- Returns the week number of the date in the range [0, 53]. Weeks begin with Sunday, and dates prior to the first Sunday of the year are in week 0.
  EXTRACT(WEEK FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") week,
  -- Returns the week number of timestamp_expression in the range [0, 53]. Weeks begin on WEEKDAY. datetimes prior to the first WEEKDAY of the year are in week 0. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
  EXTRACT(WEEK(MONDAY) FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") week_monday,
  -- Returns the ISO 8601 week number of the datetime_expression. ISOWEEKs begin on Monday. Return values are in the range [1, 53]. The first ISOWEEK of each ISO year begins on the Monday before the first Thursday of the Gregorian calendar year.
  EXTRACT(ISOWEEK FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") isoweek,
  EXTRACT(MONTH FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") month,
  EXTRACT(QUARTER FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") quarter,
  EXTRACT(YEAR FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") year,
  -- Returns the ISO 8601 week-numbering year, which is the Gregorian calendar year containing the Thursday of the week to which date_expression belongs.
  EXTRACT(ISOYEAR FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") isoyear,
  EXTRACT(DATE FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") date,
  EXTRACT(DATETIME FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") datetime,
  EXTRACT(TIME FROM native_hit_timestamp AT TIME ZONE "Europe/Berlin") time
FROM
  data

The docs list all the other timestamp functions: Timestamp Functions in Standard SQL

In-Session Durations

To calculate the duration between individual hits, all we have to do is find the right hits and pull the hits.time field into session-level. Traditionally, we might have used a JOIN, but with BigQuery, a simple inline query is sufficient and much more performant. The output is just a float or integer field. Standard SQL does not have a native data type for durations.

The following example query looks a little complicated at first but is very straight forward on the second view. The query has two levels: a temporary view and the main query. What we are querying for are durations on session-level. That's why the view is not unnesting globally but only in the inline queries. An added benefit is a much-reduced output size of the temporary view.

The two inline queries take the hits field from the outer query and unnest it locally. With a simple filter to an event action and an ecommerce action type, we drill down to the hits we want. By ordering ascending on the hit number and then limiting to one output row, we only get the very first hit in the chronology of the hit that passes the WHERE condition. From this first hit, we are grabbing the hits.time field. The output of the temporary view is two integer fields of milliseconds for every session. Both fields can easily be NULL if there was no such hit in the session.

In the main query, we subtract both millisecond fields and add some averages. By using aggregation functions on all columns, we are reducing the output size to one row.

WITH data AS (
  SELECT
    (SELECT hits.time FROM UNNEST(hits) hits WHERE hits.eventInfo.eventAction = "Quickview Click" ORDER BY hits.hitNumber LIMIT 1) first_quick_view_ms,
    (SELECT hits.time FROM UNNEST(hits) hits WHERE hits.eCommerceAction.action_type = "6" ORDER BY hits.hitNumber LIMIT 1) first_transaction_ms
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
)
SELECT
  AVG(first_quick_view_ms) avg_first_quick_view_ms,
  AVG(first_transaction_ms) avg_first_transaction_ms,
  AVG(first_transaction_ms - first_quick_view_ms) avg_quick_view_transaction_ms,
  COUNT(*) counts
FROM
  data

However, durations are still in milliseconds, and the AVG function has returned a float with many decimals. Such a level of detail is most-likely not necessary. To turn the columns into more readable seconds without decimals, we can divide by a thousand and cast to an integer by wrapping the AVG functions.

...
CAST(AVG(first_transaction_ms) / 1000 AS INT64) avg_first_transaction_sec,
...