SQL Tips and Tricks: Using LISTAGG for session data

By Charles Zhu - July 7, 2020

Here’s a problem every analyst faces: you track the activity on your website in terms of unique sessions, but your business cares about users. For example, your marketing stakeholder wants to know if those multi-channel ad dollars across many user sessions are resulting in conversions or not.

How can you transform the massive amount of sessions data you collect to the user-by-user data that your stakeholders care about? Introducing LISTAGG (or LIST_AGG in Redshift SQL), the dark horse contender for SQL aggregation function of the year.

Gone are the “first_click_utm” and “last_click_utm” fields of the past. In modern analytics, the practice is not to throw data away. With Sisu, you can comprehensively scan through every UTM code, every session action, and every key event to see how they impacted your sales, conversion, or unique business KPI.

Bonus points: it’s a lot less SQL. Let’s take a look at why.

The traditional approach using last-touch attribution

As we can see in our Swim-Equipment-as-a-Subscription table below, you’ve got rich, beautiful UTM source data for each session in terms of what brought that user in. However, each user only converts once. Your marketing stakeholder may be asking, “Which ads perform the best?” You both know it’s a complex question since users engage several times before buying, and no single ad may lead to a sale.

Even though the users engaged multiple times, you’ve got to come up with an answer, so you take a last-touch attribution approach and use only the UTM codes of a user’s most recent session:

The inner query selects for each user’s latest session, and the outer query gets the entire record, leading to the following table: