When metrics fluctuate a lot, or a single row should take the bigger picture into account, cumulated and rolling calculations can be beneficial to get a clear view. The absolute classics are the rolling aggregation and the year-to-date cumulation, but you might come up with many more exciting metrics! We are using the third part of the window specification syntax, the window frame clause, to make all of this possible.
The rolling average takes a specified number or range of preceding and following values of the same column into account. For example, the average of the preceding 4, current, and following 4 rows is a typical rolling average.
AVG(visits) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND 4 FOLLOWING)
Besides the rolling window on
ROWS, we can also use the
RANGE keyword. The range requires a numeric expression and evaluates it against the column used for ordering. If we have a query in which we cannot predict the number and consistency of rows, the
RANGE keyword gives us a much more stable result.
Let's take a hypothetical example and look at individual GA360 sessions with the average session duration two hours before and after. Every hour will have a different amount of sessions. With the
ROWS window frame clause we can look at a fixed amount of sessions before and after. With
RANGE we can look at all rows that have a value within our range.
WITH data AS ( SELECT visitStartTime, totals.timeOnSite FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*` ) SELECT visitStartTime, AVG(timeOnSite) OVER (ORDER BY visitStartTime RANGE BETWEEN 10800 PRECEDING AND 10800 FOLLOWING) avg_row_visits FROM data ORDER BY visitStartTime
Cumulated Year-to-date Sum
The year-to-date sum cumulates all the values of a column up to the current row. It is typically used to keep an eye on the budget or show progress towards a goal value. Instead of defining an explicit boundary to the window clause, we keep the window between
UNBOUNDED PRECEDING and
WITH data AS ( SELECT date, COUNT(totals.visits) visits FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*` GROUP BY 1 ) SELECT date, visits, SUM(visits) OVER (ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ytd_sum FROM data
Once again, window functions proof to be powerful SQL features here. The BigQuery docs have a whole article just on the syntax under Analytic Functions Concepts in Standard SQL. If you are using multiple window functions in the same query, take a look at the named windows to improve readability!