Designing Better Datasets for Diagnostic Analytics: A Sisu How-to Guide


For years, BI tools and legacy analytics warehouses have trained data teams to aggregate, simplify, and streamline complex datasets into narrow, normalized schemas that can work within their limited processing capabilities. However, the newest class of cloud-native data warehouses - Redshift, Snowflake, BigQuery, and Azure Synapse - have overcome many of the scale and speed challenges faced by their predecessors.

Unfortunately, because most business intelligence (BI) and exploratory analytics tools are tied to these legacy limitations, they're unable to work with the high-dimensional customer, transaction, and session-level data piped into these modern warehouses. This guide explores why diagnostic analytics processes benefit from more granular data and offers practical exercises for building more useful datasets.

Media and Entertainment
■ To diagnose customer conversion rates and content engagement, look at user- and session-level data.

Subscription and Retail
■ For performance metrics like cohort analysis, customer LTV, and average order value, use customer- and transaction-level data.

Gaming
■ To diagnose ARPU, retention, and LTV, look at user- and session-level data.

Aggregated data blocks effective diagnosis

While this aggregation makes it easier for data visualization and BI tools to query these datasets, it dramatically reduces the utility of the data for more critical diagnosis and root cause analysis workflows.

The unfortunate tradeoff: Root cause analyses suffer

For effective diagnosis and root cause analysis, the more features the better. When you're trying to understand which customer segments, feature engagements, acquisition channels, and sales activities influence customer retention, simply looking at customer size and tenure isn't going to be enough. Unfortunately, that's the downside of these aggregations: the features that were dropped are no longer available to explain a change in the metric.

Manual feature selection introduces bias

The other barrier to understanding that these transformations introduce is bias. When data engineering teams take these rich, raw datasets and condense them for BI purposes, they are making human decisions about which features matter and then codifying these selections into the transformation logic. These choices may have been well-informed when the transformations were designed, but business moves faster than data management, and these biases stay ingrained in the decision-making process long past their original utility. Moreover, as new features are added to the original transaction records, these rich signals are ignored or discarded by the incumbent data processing patterns. As a result, the value of the data collected is diminished even as more information becomes available.
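To make the cost of aggregation concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the records, the field names (size, tenure, channel, churned), and the helper churn_rate_by are all hypothetical and do not come from the guide. It compares churn rates computed on a narrow schema (customer size and tenure only) with churn rates computed on a feature that a typical aggregation might have dropped (acquisition channel).

```python
from collections import defaultdict

# Hypothetical customer-level records. Every field name and value is
# invented for illustration; none of it comes from the guide.
customers = [
    {"size": "smb", "tenure": "new", "channel": "paid_search", "churned": True},
    {"size": "smb", "tenure": "new", "channel": "referral", "churned": False},
    {"size": "smb", "tenure": "established", "channel": "paid_search", "churned": True},
    {"size": "smb", "tenure": "established", "channel": "referral", "churned": False},
    {"size": "enterprise", "tenure": "new", "channel": "paid_search", "churned": True},
    {"size": "enterprise", "tenure": "new", "channel": "referral", "churned": False},
    {"size": "enterprise", "tenure": "established", "channel": "paid_search", "churned": False},
    {"size": "enterprise", "tenure": "established", "channel": "referral", "churned": True},
]


def churn_rate_by(rows, *keys):
    """Return {group tuple: churn rate} for the given grouping columns."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        churned[group] += int(row["churned"])
    return {group: churned[group] / totals[group] for group in totals}


# Narrow, BI-style view: only customer size and tenure survive aggregation.
by_segment = churn_rate_by(customers, "size", "tenure")

# Granular view: the acquisition channel is still available as a feature.
by_channel = churn_rate_by(customers, "channel")

print("by size and tenure:", by_segment)
print("by channel:", by_channel)
```

In this made-up data, every size and tenure segment has the same churn rate, so the narrow view offers no explanation for churn, while grouping by channel separates the customers clearly. Real datasets will rarely be this clean; the point is only that a feature removed during aggregation cannot be recovered at diagnosis time.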
