Data Analyst 3.0: The next evolution of data workflows

By Sid Sharma - July 16, 2020

As in any good story arc, we’ve come a long way since the origins of data analytics. The first phase of BI started with rigid, IT-owned systems. The second phase followed with a wave of more flexible, business-oriented tools that enabled a more business-facing Data Analyst mindset, along with a tsunami of pretty, easy-to-filter, but often static dashboards.

Today, with the rise of cloud-native data warehouses and advancements in scalable inference methods, we’re on the cusp of a third phase that not only affords better, faster data processing but also lets operational data analysts impact business decisions like never before. I call this phase Data Analyst 3.0.

A brief history of Data Analytics

Before we look at the factors ushering in Data Analyst 3.0, let’s take a look at how far we’ve come. It used to be that a single person within the IT team could gain all the relevant domain and technology skills necessary to become a “data expert.” Data wasn’t big or wide, which meant that people could pick up new data skills (Excel, lightweight SQL, SAS, etc.) as problems arose, and the process of sending over a CSV to answer questions worked just fine.

But from the organization’s perspective, most data requests failed in the handoff between IT and the business because technologists didn’t know how to make their data infrastructure consumable to an everyday Excel user. The queries that IT teams could deliver typically answered only a single question about a specific KPI. This had two major issues:

  • One question, one answer. The single-answer nature of these inquiries prevented the kind of iterative questioning most business users depended on before taking action. In the 1.0 IT model, this meant every new question required getting back in line and waiting, to the point where people stopped asking questions altogether.
  • A chasm between data and decision-making. The people with business knowledge could not get involved in the data exploration process, which is where all of the discoveries happen. So all you got were rolled-up KPIs, but no “aha” moments.

Fortunately, this system has largely disappeared over the last ten years alongside the rise of more business-centric data modeling, BI, and visualization tools. These modern tools define the second wave of BI and help make Data Analyst 2.0 a more agile member of the team.

Beneath these end-user tools, this second wave is supported by several platforms that make it easier to derive value from the vast amounts of data we’re storing. Collectively, these tools make up a modern analytics stack.

[Figure: Typical analytics stack of BI’s second wave]

The exact evolution of this analytics stack is a fascinating topic, but I’ll save it for another post.

To navigate and maintain this stack efficiently, businesses needed more than just the IT team, so a few common roles emerged:

  • Data Engineers who are responsible for preparing data. This means loading data from disparate sources into a data warehouse and then transforming that raw data into modeled tables that Analysts and Data Scientists can use for analysis.
  • Data Analysts who are responsible for answering expected (reporting) and unexpected (diagnostic) business questions.
  • Data Scientists who use statistical algorithms and ML techniques to solve focused business problems (“what if”).

One way to think about the distinction between these roles is whether they act before or after the data is collected. Data Engineers are responsible for operations before the data is collected (and transformed), while Analysts and Data Scientists are responsible for operations after the data is collected.

How did the Data Analyst become a second-class citizen?

As Google’s Cassie Kozyrkov notes in one of her insightful posts, if your primary skill falls closest to that of a Data Analyst, chances are you feel left behind in “technical” expertise by your Data Science counterparts. Even the job market views the data scientist role as a step up from yours. Few people realize that these two roles are entirely different from one another.

Data Scientists provide high-effort solutions to specific problems. If the problems they tackle aren’t worth solving, the business ends up wasting their time. They are narrow-and-deep workers, so it’s imperative to point them at problems that deserve the effort. To make good use of their time, you need to be sure you already have the right problem, or you need a wide-and-shallow approach to finding one.

This is where a Data Analyst can help the business. A Data Analyst’s primary goal is to surf vast datasets quickly, liaise with business stakeholders, and surface potential insights. Speed is their highest virtue. The result: the company keeps a finger on its pulse and gets eyes on previously unknown unknowns. This gives decision-makers the inspiration to select the most valuable quests for Data Scientists.

Today’s Data Analyst Toolkit: Digging the Grand Canyon with a spoon

Unfortunately, many Data Analysts today are stuck in a quandary. They’re sitting on a treasure trove of rich, wide data, but they’re often torn between two roles: summarizing data to report key metrics, and the deep, comprehensive work of diagnosing those metrics.

These second-wave BI tools are well-equipped to create rich, rolled-up dashboards to answer ‘what has happened.’ However, in our experience, these dashboards often fall short in the precise moments when businesses need Analysts to add the most value – when something goes wrong (like those Monday morning meetings when the VP of Sales asks, “Why did sales drop 50% in EMEA last month?”).

Frequently, this is because these dashboards are built on simplified, aggregated views of data that put handcuffs on exploration and diagnosis. If you only operate on aggregates, you can’t explore in detail. Creating aggregates means presupposing which questions people are going to ask, as if those questions were set in stone. So a Data Analyst commits up front to what they will show in the finite real estate of a dashboard.

Sure, Analysts can add ‘filters’ or enable ‘drill-downs’ on dashboards, but as the number of columns and unique values per column grows, the number of ways to slice the data explodes (50 stores × 100 SKUs × 5 coupon codes × 20 cities is already 500,000 combinations … you get the point). This is why the dream of self-serve analytics morphed into death by a thousand filters.
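
To make the arithmetic concrete, here is a small Python sketch that counts the slices a dashboard would have to expose for the illustrative cardinalities above. The first function pins every dimension to exactly one value; the second assumes, as a hypothetical model of how filters behave, that each dimension can also be left unfiltered (“all”).

```python
from math import prod

# Illustrative cardinalities taken from the example in the text.
cardinalities = {"store": 50, "sku": 100, "coupon_code": 5, "city": 20}


def fully_pinned_slices(cards: dict[str, int]) -> int:
    """Slices where every dimension is filtered to exactly one value."""
    return prod(cards.values())


def all_filter_combinations(cards: dict[str, int]) -> int:
    """Slices where each dimension is either filtered to one value or left as 'all'."""
    return prod(n + 1 for n in cards.values())


print(fully_pinned_slices(cardinalities))      # 500000
print(all_filter_combinations(cardinalities))  # 649026
```

Adding a fifth dimension with just 10 values would multiply the first count by 10 and the second by 11, which is why clicking through filters by hand stops scaling long before the data does.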

When an ad-hoc question comes in, a Data Analyst often starts the diagnosis from scratch. It’s a manual process: writing SQL to fetch the granular data, adding relevant dimensions, and finally using Python/R to dig up the insight. The process is reactive, repeats with every new question, and slows down decision-making. The result: businesses end up in a situation similar to the first BI wave, with business stakeholders queuing up tickets, this time to get answers to their “why” questions.

Data Analyst 3.0 – Automating the path to ‘why’

What Data Analysts need are faster, easier, and more comprehensive ways to build, monitor, and diagnose granular, high-dimensional datasets: a new paradigm that can quickly answer “Why does the data look the way it does?” and “What changed from last week?” This new paradigm could piggyback on two recent technological advancements:

1. Cloud-native storage and compute: Today, not only is it cheaper than ever to store data in cloud warehouses, but it is also 25%-50% faster* to query big, flat, denormalized tables than star schemas, thanks to advancements in massively parallel processing. Given this new reality, Analysts need to rethink how they model their data for diagnosis versus reporting.

For quick data diagnosis, each modeled table should tie to a unit of value. For example, if the business cares about “Average Transaction Value” and “Subscription Conversions,” you’d want to create modeled tables at the transaction grain and the user grain, respectively. Another thing to keep in mind is the diversity of the data. Instead of cherry-picking dimensions, which introduces human bias and removes potentially rich signals from the data, you should include every available dimension in these granular tables. As a Data Analyst, you don’t know all the follow-up questions you’ll be asked to investigate, and you want to be able to follow the trail wherever it leads. Even if an aggregated export could answer today’s top-level questions, it’s unlikely that the same export can answer the follow-up questions.
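
As a minimal sketch of what tying each table to a unit of value can look like, the Python below rolls hypothetical line-item events up to one row per transaction (for Average Transaction Value) and one row per user (for Subscription Conversions), carrying the dimensions along instead of cherry-picking them. The field names and sample data are invented for illustration; in practice this modeling would usually live in SQL inside the warehouse.

```python
# Hypothetical raw line items; field names and values are illustrative only.
line_items = [
    {"txn_id": "t1", "user_id": "u1", "amount": 30.0, "store": "S1", "coupon": "NONE", "city": "Berlin"},
    {"txn_id": "t1", "user_id": "u1", "amount": 20.0, "store": "S1", "coupon": "NONE", "city": "Berlin"},
    {"txn_id": "t2", "user_id": "u2", "amount": 15.0, "store": "S2", "coupon": "SPRING", "city": "Paris"},
]
converted_users = {"u1"}  # hypothetical set of users who subscribed


def transaction_grain(items):
    """One row per transaction: sum the measure and keep every dimension.

    Dimensions are copied from the first line item seen for each transaction.
    """
    rows = {}
    for item in items:
        if item["txn_id"] not in rows:
            row = {k: v for k, v in item.items() if k != "amount"}
            row["txn_value"] = 0.0
            rows[item["txn_id"]] = row
        rows[item["txn_id"]]["txn_value"] += item["amount"]
    return list(rows.values())


def user_grain(items, converted):
    """One row per user with a conversion flag and a distinct-transaction count."""
    txns_by_user = {}
    city_by_user = {}
    for item in items:
        txns_by_user.setdefault(item["user_id"], set()).add(item["txn_id"])
        city_by_user.setdefault(item["user_id"], item["city"])  # first city seen
    return [
        {
            "user_id": user,
            "city": city_by_user[user],
            "n_transactions": len(txns),
            "converted": user in converted,
        }
        for user, txns in txns_by_user.items()
    ]


txns = transaction_grain(line_items)
users = user_grain(line_items, converted_users)
avg_txn_value = sum(r["txn_value"] for r in txns) / len(txns)
conversion_rate = sum(u["converted"] for u in users) / len(users)
print(avg_txn_value, conversion_rate)  # 32.5 0.5
```

Because each transaction row keeps store, coupon, and city, any of them can later be used to break down a change in the metric without going back for a new export.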

2. Advancements in proactive analytics: Rather than manually using dashboard filters and drill-downs to mine big tables for insights, Analysts can now rely on recent algorithmic developments in scalable inference methods, available in tools like Sisu, to explore billions of combinations and identify the most impactful facts. It is then up to the Data Analyst to pick the best answer based on the additional context they have (like company strategy, business context, and market conditions).
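
To give a feel for what exploring combinations to find the most impactful facts involves, here is a deliberately naive Python sketch: it enumerates every segment defined by one or two dimension values, compares a metric between two periods within each segment, and ranks segments by absolute change. This brute-force enumeration is an illustration only, not how Sisu or any particular tool works; real systems need far more scalable search plus statistical checks. The data, dimension names, and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transaction-grain rows covering two periods.
rows = [
    {"period": "last_week", "region": "EMEA", "channel": "web", "sales": 100.0},
    {"period": "last_week", "region": "EMEA", "channel": "store", "sales": 80.0},
    {"period": "last_week", "region": "AMER", "channel": "web", "sales": 120.0},
    {"period": "this_week", "region": "EMEA", "channel": "web", "sales": 40.0},
    {"period": "this_week", "region": "EMEA", "channel": "store", "sales": 78.0},
    {"period": "this_week", "region": "AMER", "channel": "web", "sales": 125.0},
]
DIMENSIONS = ["region", "channel"]


def segment_changes(rows, dims, before, after, measure="sales", max_order=2):
    """Rank segments (conjunctions of dimension=value filters) by the absolute
    change in the summed measure between the `before` and `after` periods."""
    totals = defaultdict(lambda: {before: 0.0, after: 0.0})
    for row in rows:
        if row["period"] not in (before, after):
            continue  # ignore periods we are not comparing
        for order in range(1, max_order + 1):
            for combo in combinations(dims, order):
                segment = tuple((d, row[d]) for d in combo)
                totals[segment][row["period"]] += row[measure]
    changes = [(seg, t[after] - t[before]) for seg, t in totals.items()]
    return sorted(changes, key=lambda sc: abs(sc[1]), reverse=True)


for segment, delta in segment_changes(rows, DIMENSIONS, "last_week", "this_week")[:3]:
    print(segment, delta)
```

On this toy data the ranking surfaces the EMEA drop and shows that it is concentrated in the web channel, the kind of lead an Analyst would then vet against business context. The number of segments grows combinatorially with the number of dimensions and with `max_order`, which is why the text points to scalable inference methods rather than plain enumeration.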

[Figure: The workflow of Data Analyst 3.0]

Adding this proactive, automated inference on top of the big, feature-rich tables discussed above allows a Data Analyst to:

  • Quickly answer ad-hoc questions that arise from their aggregated BI numbers, and answer 10x more questions with the same effort.
  • Get machine-monitored smart alerts on autopilot, giving them the peace of mind that they have their finger on the pulse of the business and never snooze past potentially useful gems.
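
The smart-alert idea can be approximated, very roughly, by a rule that flags a metric when its latest value sits far outside its recent history. The Python sketch below uses a simple z-score against a trailing window; the function name, the 3-standard-deviation threshold, and the sample numbers are assumptions for illustration, and production monitoring would typically also account for seasonality and trend.

```python
from statistics import mean, stdev


def check_metric(history, latest, z_threshold=3.0):
    """Return an alert string if `latest` is more than `z_threshold` standard
    deviations from the mean of `history`; otherwise return None."""
    if len(history) < 2:
        return None  # stdev needs at least two points
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all is unusual.
        return None if latest == mu else f"alert: metric moved from a constant {mu} to {latest}"
    z = (latest - mu) / sigma
    if abs(z) > z_threshold:
        return f"alert: latest value {latest} is {z:.1f} standard deviations from the mean {mu:.1f}"
    return None


weekly_sales = [100, 102, 98, 101, 99]  # hypothetical trailing weeks
print(check_metric(weekly_sales, 101))  # None
print(check_metric(weekly_sales, 50))
```

A rule like this only tells you that something moved; pairing it with a segment-level breakdown is what turns the alert into a starting point for the “why.”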

It’s important to keep in mind that the goal of this diagnostic data layer is not to replace the BI tool but to augment it. The diagnostic data layer is the smart “check engine” light that complements the “speedometer and gas gauge” of your BI dashboard.

A peek into the future of a Data Analyst

These advancements are fundamentally changing how a Data Analyst functions within the company. Analysts now attend weekly business meetings not only to present what happened (‘business as usual’) but also to share proactive solutions for the business. And at the center of this newfound authority is an AI-powered workflow.

At Sisu, we are leveraging these advancements in data warehousing, data modeling, and AI to supercharge the Analyst workflow at companies like Samsung, Upwork, and Housecall Pro. The transition can be a little scary at first, but ask the companies that are winning how it’s working out for them. They’ll tell you they feel fine.

If you’d like to share your thoughts on this piece or would like to learn more, write to me directly at [email protected]

*Fivetran, 2020. Retrieved June 30, 2020.
