SQL Tips and Tricks: Using LISTAGG for session data

By Charles Zhu - July 7, 2020

Here’s a problem every analyst faces: you track the activity on your website in terms of unique sessions, but your business cares about users. For example, your marketing stakeholder wants to know if those multi-channel ad dollars across many user sessions are resulting in conversions or not.

How can you transform the massive amount of sessions data you collect to the user-by-user data that your stakeholders care about? Introducing LISTAGG (or LIST_AGG in Redshift SQL), the dark horse contender for SQL aggregation function of the year.

Gone are the “first_click_utm” and “last_click_utm” fields of the past. In modern analytics, the practice is not to throw data away. With Sisu, you can comprehensively scan through every UTM code, every session action, and every key event to see how they impacted your sales, conversion, or unique business KPI.

Bonus points: it’s a lot less SQL. Let’s take a look at why.

The traditional approach using last-touch attribution

As we can see in our Swim-Equipment-as-a-Subscription table below, you’ve got rich, beautiful UTM source data for each session in terms of what brought that user in. However, each user only converts once. Your marketing stakeholder may be asking, “Which ads perform the best?” You both know it’s a complex question since users engage several times before buying, and no single ad may lead to a sale.

Even though the users engaged multiple times, you’ve got to come up with an answer, so you take a last-touch attribution approach and use only the UTM codes of a user’s most recent session:

The inner query selects for each user’s latest session, and the outer query gets the entire record, leading to the following table:

All that time and effort collecting past session data – poof, gone, just like that! It appears that the Facebook flipper sale has the same count of conversions as the pool filters.

An alternative approach with LISTAGG in SQL

Instead of losing all that valuable data, or worse  – having your marketing buddy make poorly informed decisions with only a subset of your data, let’s try a different approach.

The LISTAGG query in SQL enables you to aggregate and order without losing data. Where this really gets exciting is when you combine this new aggregated text field with Sisu’s new keyword transform capabilities. By building these long strings with LISTAGG, you get to keep and analyze all past sessions too in one fell swoop. Here’s an example using LIST_AGG in Redshift SQL: 

Sample LIST or LIST_AGG query in RedshiftThe ‘#’ concatenates all utm_campaigns into a string delimited by the ‘#’ character, and the handy ORDER BY enables you to order the string from most recent session to historical session.

If you don’t need to ORDER BY anything in particular, you can just use the CONCATENATE function instead. Either way, we get the following table:

Keep the sessions! Combining LISTAGG and Sisu for enhanced analysis

Using the new table you created with LISTAGG, in Sisu you can select the factor in the dropdown list, click the “…” to open the modal. To quickly search for it, enter  ‘#’ into the single-character delimiter box.

Next, you can enter 1 as the minimum keyword length to qualify as a keyword to be analyzed. Keyword length is just the minimum number of characters in between each ‘#’ in order for it to be picked up by Sisu for analysis.

Sisu pro tip
: Our new keyword transform feature also allows you to analyze comment fields or user reviews by using the empty space ‘ ‘ as the delimiter. In this case, the min keyword length should be set to be about 4 (to screen out very short common words like ‘the’ or ‘and’).

Voila! You now get facts like the one below, which showcases how Google-pool-filters performed with respect to conversions, as well as total landing page hits, across time.

Critically, you can see how a UTM code leads to conversions further down the line, even if that particular UTM code did not lead to any kind of conversion for its particular session.

You can even include the last_touch_utm_campaign code in the column list to help control for ordering. For example, if it’s primarily the last_touch_utm_campaign code that drives conversion, the last_touch_campaign code will show. But if it’s primarily the presence of a prior touch UTM code that leads to conversion, the latter will show.

Let’s Recap

Modern data analytics with Sisu enables you to use all your granular data, not just the first and/or most recent timestamp of engagement. Instead of just looking at specific points in time in a user journey, you can now look at holistic user engagement, conversion, and retention across multiple sessions.

Critical to unlocking this capability is using concatenating aggregate functions like LISTAGG or CONCATENATE in SQL. Whether its Google-pool-filters or FB-flipper-sales, you can rest easy knowing that Sisu can surface even a ‘hidden’ prior touch that led to most of your conversions. If you’d like to see these methods in action, join our next live product demo

Read more

Two new ways to answer why, faster: Text and segment analysis

Two new ways to answer why, faster: Text and segment analysis

Read more

Diagnosing Session-Level Data for a Streaming Service in Seconds

Learn how to go from the “what” to the “why” faster and more comprehensively. In this post, we’ll walk through how a data analyst at a large streaming services company can use Sisu to find the facts in session-level data.

Read more