Solving the Data Analyst’s Dilemma:
Fast answers to more questions in a sea of data

By Callum McCann - January 26, 2021

If you’ve worked with data at any point in the past ten years, you’re probably familiar with two key parts of the analyst life.

1. The Buzzword of ‘Big Data.’ Everyone has their own definition of what it is, but we all know what it means: every company has more data than ever before. The term is so overused that this industrious author has started a movement to kill it. If everyone has Big Data, then no one has Big Data, and we can all go back to just calling it data.

2. The Endless List of Questions. Now that all data is Big Data, we’ve stumbled onto a problem that didn’t exist before. In the past, we were constrained by the number of questions we could ask. Nowadays, we’re constrained by the number of questions we can answer, and every answer spawns more questions in return. Business users, executives, customers: we’ve become a data-driven society, and people want answers now. What is causing my sales to drop week over week? What is driving my conversion rate up? Why are new users spending less time in our app? There aren’t enough analysts in the world to answer all the questions people are asking.

But answering the why is tricky. Let’s say you’re an analyst at an e-commerce company, and your company rolls out a new “Best Sellers” section on the homepage. Traffic to the featured products suggests it’s a huge hit: two million events a day. Way to go, team!

But that’s not the end of the story. Now your boss is messaging you over Slack (or Teams, but probably not Hangouts) and wants to know whether the Best Sellers section is increasing conversions, which you both define as a successful sale. And to throw another wrench in the works, he wants to know whether the new section is increasing conversions across ALL products, not just those listed as Best Sellers. Welp – goodbye weekend, hello endless SQL queries.

And even after you’ve run hundreds of grouping queries and tracked down what you think the answer is, can you really be sure you checked every possibility? We humans are notoriously prone to our own biases, even subtle ones we might not notice when looking at data, and they can lead us to incorrect conclusions.

Worry not, dear sleep-deprived analyst; there are solutions to this problem that don’t require burning the midnight oil. In fact, this is one of the reasons that we built Sisu. We let the machines do what they’re great at (computation, aggregation, statistics) and allow humans to focus on doing what we do best — interpreting the results, adding the business context, and building the story to communicate.

Now you can set Sisu up, make yourself a beautiful cup of coffee (or tea, but hey, years of being an analyst have probably built up that caffeine addiction), and come back to find all the results analyzed, prioritized, and ready for you to figure out what’s actually driving change.

To show you what I mean, let’s continue with the previous example. Imagine you’re looking at the following dataset (simplified to three rows). To answer your boss’s request, you’re most likely going to spend hours combing through the dataset, trying to determine which factors are driving conversions. Is it all mobile customers whose final event was Dino_Jump? Users from Google who looked at Slip-N-Slides? And what if these two combinations overlap? Which one of them is actually driving change?
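To make the manual approach concrete, here is a minimal Python sketch of the brute-force search an analyst might script by hand: compute the conversion rate of every subgroup defined by one or two attribute=value filters, then rank the subgroups. The rows, the column names (`device`, `source`, `final_event`, `converted`), the helper `subgroup_conversion`, and the "excess conversions" score are all hypothetical, chosen for illustration. This is not how Sisu works internally.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical event rows for illustration only; column names and values
# are assumptions, not the dataset from this post.
ROWS = [
    {"device": "mobile", "source": "google", "final_event": "Dino_Jump", "converted": 1},
    {"device": "mobile", "source": "facebook", "final_event": "Dino_Jump", "converted": 1},
    {"device": "desktop", "source": "google", "final_event": "Slip-N-Slides", "converted": 1},
    {"device": "desktop", "source": "email", "final_event": "Slip-N-Slides", "converted": 0},
    {"device": "mobile", "source": "email", "final_event": "Slip-N-Slides", "converted": 0},
    {"device": "desktop", "source": "facebook", "final_event": "Dino_Jump", "converted": 0},
]


def subgroup_conversion(rows, factors, max_order=2):
    """Score every subgroup defined by 1..max_order factor=value filters.

    Returns (filters, size, rate, excess) tuples sorted by |excess|, where
    excess = size * (rate - overall_rate): roughly how many more (or fewer)
    conversions the subgroup has than the overall rate would predict.
    """
    overall = sum(r["converted"] for r in rows) / len(rows)
    counts = defaultdict(lambda: [0, 0])  # filters -> [size, conversions]
    for r in rows:
        # One "GROUP BY" per combination of factors, done in a single pass.
        for k in range(1, max_order + 1):
            for combo in combinations(factors, k):
                key = tuple((f, r[f]) for f in combo)
                counts[key][0] += 1
                counts[key][1] += r["converted"]
    results = []
    for key, (size, conv) in counts.items():
        rate = conv / size
        results.append((key, size, rate, size * (rate - overall)))
    results.sort(key=lambda t: abs(t[3]), reverse=True)
    return results


# Print the three highest-impact subgroups.
for filters, size, rate, excess in subgroup_conversion(ROWS, ["device", "source", "final_event"])[:3]:
    print(filters, size, f"{rate:.2f}", f"{excess:+.2f}")
```

Even with three columns and six rows this yields 21 subgroups, and the count grows combinatorially with more columns and values. Overlapping subgroups (the first row belongs to both `source=google` and `device=mobile, final_event=Dino_Jump`) are scored independently, so a ranking like this still can't tell you which combination is actually driving change.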