2 min read

Cumulated And Rolling Calculations

When metrics fluctuate a lot, or a single row should take the bigger picture into account, cumulated and rolling calculations can be beneficial to get a clear view. The absolute classics are the rolling aggregation and the year-to-date cumulation, but you might come up with many more exciting metrics! We are using the third part of the window specification syntax, the window frame clause, to make all of this possible.

Rolling Average

The rolling average takes a specified number or range of preceding and following values of the same column into account. For example, the average of the preceding 4, current, and following 4 rows is a typical rolling average.

AVG(visits) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND 4 FOLLOWING)

Besides the rolling window on ROWS, we can also use the RANGE keyword. The range requires a numeric expression and evaluates it against the column used for ordering. If we have a query in which we cannot predict the number and consistency of rows, the RANGE keyword gives us a much more stable result.

Let's take a hypothetical example and look at individual GA360 sessions with the average session duration two hours before and after. Every hour will have a different amount of sessions. With the ROWS window frame clause we can look at a fixed amount of sessions before and after. With RANGE we can look at all rows that have a value within our range.

WITH data AS (
SELECT
  visitStartTime,
  totals.timeOnSite
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
)

SELECT
  visitStartTime,
  AVG(timeOnSite) OVER
    (ORDER BY visitStartTime RANGE BETWEEN 10800 PRECEDING AND 10800 FOLLOWING) avg_row_visits
FROM
  data
ORDER BY
  visitStartTime

Cumulated Year-to-date Sum

The year-to-date sum cumulates all the values of a column up to the current row. It is typically used to keep an eye on the budget or show progress towards a goal value. Instead of defining an explicit boundary to the window clause, we keep the window between UNBOUNDED PRECEDING and CURRENT ROW.

WITH data AS (
SELECT
  date,
  COUNT(totals.visits) visits
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
GROUP BY
  1
)

SELECT
  date,
  visits,
  SUM(visits) OVER (ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ytd_sum
FROM
  data

Conclusion

Once again, window functions proof to be powerful SQL features here. The BigQuery docs have a whole article just on the syntax under Analytic Functions Concepts in Standard SQL. If you are using multiple window functions in the same query, take a look at the named windows to improve readability!