5 min read

Funnels and Flows

Let's take a look at two approaches for creating a funnel or flow analysis: The typical window navigation query and a simple subquery example. Both approaches only create funnels within one user session how to get around that we review in the end.

Window navigation

A straightforward strategy to analyze funnels in SQL is to use window functions and navigate through a set of user interactions. The LAG function can grab the preceding or nth preceding row from a defined window. A simple example could be the following query:

WITH t1 AS (
  SELECT
    LAG(hits.page.pagePath, 2, "ENTRANCE") OVER session_hits step_1_path,
    LAG(hits.page.pagePath, 1, "ENTRANCE") OVER session_hits step_2_path,
    hits.page.pagePath step_3_path
  FROM
    `project.dataset.ga_sessions_20190713`,
    UNNEST(hits) hits
  WINDOW
    session_hits AS (PARTITION BY fullvisitorId, visitId ORDER BY hits.hitNumber)
)
SELECT
  *,
  COUNT(*) counts
FROM
  t1
WHERE
  step_3_path LIKE "/shop/thank-you%"
GROUP BY
  1, 2, 3

We first create a temporary view (l. 8) to select the first and second preceding page path of every hit (l. 10-11) together with every hits own page path. In the main SELECT statement of the query, we then pull out only the rows that start on the conversion page, group them and add a count aggregation.

We call the LAG function over the named window session_hits. Naming the window removes a lot of replicated code and makes adjustments easier.

Instead of grabbing just the plain page path out of the window with the LAG function, we can also use SELECT AS STRUCT to grab multiple fields.

...
  LAG(
    (SELECT AS STRUCT hits.eventInfo.eventAction, hits.page.pagePath),
    1,
    "ENTRANCE"
  ) OVER session_hits step_2_path
...

To filter the kinds of hits you want to span your window across, use a WHERE statement on the same level as the window. Here an example to include only pageviews:

WITH t1 AS (
  SELECT
    LAG(hits.page.pagePath, 2, "ENTRANCE") OVER session_hits step_1_path,
    LAG(hits.page.pagePath, 1, "ENTRANCE") OVER session_hits step_2_path,
    hits.page.pagePath step_3_path
  FROM
    `project.dataset.ga_sessions_20190713`,
    UNNEST(hits) hits
  WHERE
    hits.type = "PAGE"
  WINDOW
    session_hits AS (PARTITION BY fullvisitorId, visitId ORDER BY hits.hitNumber)
)
...

This approach is excellent to replicate the Google Analytics behavior flow reports. However, it has the limitation that it is rigorous on the succession of hits. If the user reloads a step or takes a detour in a separate tab, the funnel breaks. Also, the entire funnel calculation relies on the last step's hit. Multiple hits on the last step count as multiple funnels. To avoid this, let's try a different approach.

Session-based funnels

In cases where the strict succession of hits is not as important, or we evaluate funnel completion/abandonment the session level, we should stay away from window navigation. Instead, we use a series of in-line queries and then compare hit numbers to check our funnel criteria.

WITH t1 AS (
  SELECT
    (SELECT hits.hitNumber FROM UNNEST(hits) hits WHERE hits.page.pagePath LIKE "/shop/basket" ORDER BY hits.hitNumber LIMIT 1) step_1_hitNumber,
    (SELECT hits.hitNumber FROM UNNEST(hits) hits WHERE hits.page.pagePath LIKE "/shop/payment" ORDER BY hits.hitNumber LIMIT 1) step_2_hitNumber,
    (SELECT hits.hitNumber FROM UNNEST(hits) hits WHERE hits.page.pagePath LIKE "/shop/thank-you" ORDER BY hits.hitNumber LIMIT 1) step_3_hitNumber,
    channelGrouping
  FROM
    `project.dataset.ga_sessions_20190713`
)
SELECT
  channelGrouping,
  COUNT(step_1_hitNumber) step_1_completions,
  COUNT(step_2_hitNumber) step_2_completions,
  COUNT(step_3_hitNumber) step_3_completions
FROM
  t1
GROUP BY 1

You can easily add more criteria in the main SELECT statement. For example, to make sure the funnel steps are in the correct order, we could do a COUNTIF like this:

...
SELECT
  channelGrouping,
  COUNT(step_1_hitNumber) step_1_completions,
  COUNTIF(step_2_hitNumber > step_1_hitNumber) step_2_completions,
  COUNTIF(step_2_hitNumber > step_1_hitNumber 
          AND step_3_hitNumber > step_2_hitNumber) step_3_completions
FROM
...

This approach is very well suited to replicate the Google Analytics funnel visualizations as they are also on the session-level. It also uses relatively basic SQL features making it super easy to read and follow along.

What this query cannot do, however, is building funnels across sessions. Let's take a look at our third and last approach.

Funnels across multiple sessions

Sales funnels of micro and macro conversions are often not meant to complete in one session. They instead span multiple sessions and larger timeframes. For this kind of analysis, we reuse the concepts one and two with small alterations.

The LAG functions over a window of interactions can be quickly reused to go across sessions by dropping the visitId from the window definition:

WITH t1 AS (
  SELECT
    LAG(hits.page.pagePath, 2, "ENTRANCE") OVER user_hits step_1_path,
    LAG(hits.page.pagePath, 1, "ENTRANCE") OVER user_hits step_2_path,
    hits.page.pagePath step_3_path
  FROM
    `project.dataset.ga_sessions_20190713`,
    UNNEST(hits) hits
  WINDOW
    user_hits AS (PARTITION BY fullvisitorId ORDER BY hits.hitNumber)
)
SELECT
  *,
  COUNT(*) counts
FROM
  t1
WHERE
  step_3_path LIKE "/shop/thank-you%"
GROUP BY
  1, 2, 3

The session-based funnels need a few more changes as we have to craft our own hits array. We also use a window function for this, but this time with the ARRAY_AGG function.

WITH t1 AS (
  SELECT
    channelGrouping,
    ARRAY_AGG(hits) OVER (PARTITION BY fullvisitorId ORDER BY visitNumber, hits.hitNumber) hits
  FROM
    `project.dataset.ga_sessions_20190713`,
    UNNEST(hits) hits
), t2 AS (
  SELECT
    (SELECT hits.hitNumber FROM UNNEST(hits) hits WHERE hits.page.pagePath LIKE "/shop/basket" ORDER BY hits.hitNumber LIMIT 1) step_1_hitNumber,
    (SELECT hits.hitNumber FROM UNNEST(hits) hits WHERE hits.page.pagePath LIKE "/shop/payment" ORDER BY hits.hitNumber LIMIT 1) step_2_hitNumber,
    (SELECT hits.hitNumber FROM UNNEST(hits) hits WHERE hits.page.pagePath LIKE "/shop/thank-you" ORDER BY hits.hitNumber LIMIT 1) step_3_hitNumber,
    channelGrouping
  FROM
    t1
)
SELECT
  channelGrouping,
  COUNT(step_1_hitNumber) step_1_completions,
  COUNT(step_2_hitNumber) step_2_completions,
  COUNT(step_3_hitNumber) step_3_completions
FROM
  t2
GROUP BY 1

Conclusion

And there we have it: Three different approaches to analyzing user flows or funnels. Whatever approach you pick, make sure it fits your use case and clearly define your funnel before you get going.