By Charles Zhu - August 4, 2020
As the COVID-19 pandemic redefines day-to-day life for many, it’s also casting a spotlight on the inequities and flaws in U.S. systems — especially along racial lines. In fact, a recent New York Times report analyzed CDC data and found that Black and Latino sub-populations are disproportionately affected by the virus.
With COVID-19 cases rising across the U.S. as states reopen, this report had our team wondering what other trends we could observe between the first spike in cases in March and the current spike through July.
Using readily accessible data up to July 15th (courtesy of the Snowflake share of Demyst and Starschema data), we found critical differences in just a few clicks.
Important note: While experts in public health have reviewed these findings, they do not constitute policy recommendations in any way.
Before we could dive into the data to see trends, we had to make sure we had a full view of the factors contributing to this surge.
Using Snowflake’s cross-database joins and a simple SQL query, we quickly joined three datasets. The first, from Demyst, is a current list of nationwide COVID-19 cases from the New York Times, refreshed every 24 hours. We joined this with Starschema’s dataset, which tracks U.S. policy actions by state, giving us context on which measures and restrictions were lifted. Finally, we joined all of this with a census dataset from the American Community Survey to understand each county’s demographic breakdown. A screenshot of these databases in action is below.
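For readers without a Snowflake account, the shape of that three-way join can be sketched in pandas. This is only an illustration under stated assumptions, not the query we ran: the rows are toy values, and the column names (`county_id`, `state`, `date`, `new_cases`, `policy`, `pct_latino`) are hypothetical stand-ins for whatever the real shares expose.

```python
import pandas as pd

# Toy stand-ins for the three sources. Every name and value here is hypothetical.
cases = pd.DataFrame({
    "county_id": ["00001", "00001", "00002"],
    "state": ["AA", "AA", "BB"],
    "date": pd.to_datetime(["2020-03-20", "2020-06-20", "2020-03-20"]),
    "new_cases": [10, 900, 1200],
})
policies = pd.DataFrame({
    "state": ["AA", "BB"],
    "policy": ["stay-at-home lifted", "stay-at-home in effect"],
})
census = pd.DataFrame({
    "county_id": ["00001", "00002"],
    "pct_latino": [0.30, 0.20],
})

# Case rows pick up policy actions by state and demographics by county.
# Left joins keep every case row even if a lookup table has no match.
joined = (
    cases
    .merge(policies, on="state", how="left")
    .merge(census, on="county_id", how="left")
)
print(joined)
```

The result has one row per county per day, carrying the state-level policy and county-level demographics alongside the case count, which is the flat table shape an analysis tool can then slice by subpopulation.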
With these datasets joined in our Snowflake database, we pointed Sisu at the table and set up a time-based objective comparing New_Cases in the four-week period beginning March 17th and ending April 13th to New_Cases in the four-week period beginning June 16th and ending July 13th.
We see that compared to the first four-week surge in March, when much of the country began adopting lock-down and social distancing measures, the number of new cases doubled (2x) from 593,000 to 1.2 million.
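The comparison itself is simple arithmetic: sum new cases inside each 28-day window and take the ratio. Below is a minimal pandas sketch of that idea, assuming a hypothetical table with `date` and `new_cases` columns. Only the window dates come from the analysis; the case counts are toy values chosen so the ratio comes out to 2x.

```python
import pandas as pd

# Hypothetical daily rows. The 2020-05-01 row falls outside both windows.
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-03-17", "2020-04-13", "2020-05-01", "2020-06-16", "2020-07-13"]
    ),
    "new_cases": [100, 200, 999, 250, 350],
})

def period_total(df, start, end):
    """Sum new_cases for rows whose date falls in [start, end], inclusive."""
    in_window = df["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return int(df.loc[in_window, "new_cases"].sum())

first = period_total(daily, "2020-03-17", "2020-04-13")
second = period_total(daily, "2020-06-16", "2020-07-13")
ratio = second / first

print(first, second, ratio)
```

Both windows include their end dates, so each covers exactly 28 days and the two totals are directly comparable.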
But this aggregate 2x change obscures the myriad of subpopulations that reported far fewer new cases or far more.
When we look at the subpopulations in Sisu, we immediately see that certain populations bore the brunt of this second surge in cases.
For example, counties with a higher Black population saw slower growth in cases than the national average, while predominantly Latino counties saw a greater rise in new cases than the national average.
Counties with a greater proportion of Latinos than the average county saw a 6.5x increase in total new cases in the June/July period, compared to the first surge in March/April. In other words, while this subpopulation makes up roughly 27% of all counties in the United States, it accounted for 57% of the national 2x increase in coronavirus cases between 3/17-4/13 and 6/16-7/13.
From 3/17-4/13, the total number of new cases in more Latino counties was 65K. From 6/16-7/13, the total number of new cases in more Latino counties was 421K, a 6.5x increase.
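To see how a subpopulation's share of the national increase is derived, the rounded figures quoted above can be plugged into a few lines of Python. Because these inputs are rounded (65K, 421K, 593K, 1.2 million), the share comes out slightly higher than the 57% reported from the underlying counts. The variable names are ours, chosen for illustration.

```python
# Rounded figures quoted in the text, in thousands of new cases.
latino_first, latino_second = 65, 421
national_first, national_second = 593, 1200

# Relative growth within the subpopulation.
growth = latino_second / latino_first

# Absolute increases, and the subpopulation's share of the national increase.
latino_increase = latino_second - latino_first
national_increase = national_second - national_first
share = latino_increase / national_increase

print(f"growth: {growth:.1f}x")
print(f"share of national increase: {share:.0%}")
```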
But excluding the predominantly Latino counties already discussed, the data shows that counties with 1) a minority population above 50% and 2) a larger share of residents below the poverty line than average actually saw a smaller increase in new cases in 6/16-7/13 relative to 3/17-4/13.
Specifically, while nationally there was a 100% (2x) increase in new cases in the June/July surge over the March/April surge:
While we acknowledge that testing rates may differ among counties, it appears that this second surge of cases is afflicting more Latino and White populations than the first surge in March.
While the first wave of cases in March was primarily in coastal states and metropolitan areas, in the most recent spike rural areas and sparsely populated states are becoming hotspots. Cases in these areas increased a shocking 4.3x to 4.7x, more than double the national increase.
New cases in rural counties increased 4.3x in the period 6/16-7/13 (129,400) over the period 3/17-4/13 (30,100). This is an absolute increase of 99,300.
Rural and sparsely populated areas account for just 14% of the U.S. population, and yet in June, rural counties accounted for 16.1% of the increase in COVID-19 cases. In the first wave, rural counties were relatively unaffected by the virus, but in this new wave these counties have now caught up with other counties. This is especially concerning, as most rural health systems are ill-equipped to handle the influx of critical patients. In fact, the Pew Center reported last year that 128 rural hospitals have closed since 2010, including a record 18 hospitals last year, and many existing hospitals are underfunded and at risk of closure.
Since each state and city has had a different response to containing the coronavirus, we wanted to look at where the biggest hotspots have been and how they’ve shifted between the March surge and the current rise in cases. Using the filtering capabilities in Sisu, we quickly filtered the dataset to look at states and understand the change.
Change in sum reflects the relative increase in new cases for a specific subpopulation from March/April to June/July. Impact reflects the absolute change in new cases between the two periods.
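The two measures can diverge, which is why it helps to rank by both. The sketch below computes them for three made-up groups. It follows only the description in the caption above and is not Sisu's implementation; the group labels and totals are hypothetical.

```python
import pandas as pd

# Hypothetical per-group totals of new cases for the two periods.
totals = pd.DataFrame({
    "group": ["A", "B", "C"],
    "first_period": [1000, 400, 50],
    "second_period": [3000, 700, 500],
}).set_index("group")

# "Change in sum": relative change. "Impact": absolute change.
totals["change_in_sum"] = totals["second_period"] / totals["first_period"]
totals["impact"] = totals["second_period"] - totals["first_period"]

largest_relative = totals["change_in_sum"].idxmax()
largest_absolute = totals["impact"].idxmax()

print(totals.sort_values("impact", ascending=False))
print(largest_relative, largest_absolute)
```

Here group C has the sharpest relative growth while group A adds the most cases in absolute terms, so the group that tops one ranking does not top the other.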
As we can see, in this most recent surge, Florida and Texas have had the greatest number of new cases between the last peak and the current one. They are also two states that had more lax shelter-in-place restrictions and lifted restrictions the earliest. On the other hand, Arizona had the sharpest increase in new cases, with a 19.6x increase between the last surge in March and the current one.
In contrast, Northeastern and New England states have maintained a low caseload between the first surge of cases in March and this new surge of cases in June. These states were hit hard in March, so their mixture of stern shelter-in-place policies and a cautious approach to reopening could be curbing the spread.
The only state where the idea of “one spike is enough” is not holding true is California. Despite being a hotspot in March, cases in the state increased 6.6x between March and July. We drilled down into California’s data to see why.
While LA County has contributed the largest absolute number of new cases between the last surge in March/April and the current one in June/July, we’ve seen significant spikes in the more rural and smaller-population counties like those in the Central Valley and Inland Empire. And as we’ve seen elsewhere in this June surge of cases, counties with a larger Latino population than average have been hit particularly hard, with a 7.1x increase in cases.
We know that this data is only part of the picture. The U.S. is still struggling to make testing widely available, and in June the CDC estimated that the true tally of COVID-19 cases is likely 10 times the number reported. While the CDC is not able to determine if the unreported cases have similar racial and ethnic inequities, they say it is clear that there have been significant disparities in the number of both deaths and cases.
But the data we do have paints a clear picture: the U.S. is nowhere close to putting this pandemic behind us. As every public health expert predicted, we can see a clear correlation between the counties that lifted stay-at-home orders and mandatory quarantines (or, in some cases, never had mandatory quarantines to begin with) and the counties that saw an exponential increase in new cases in the June surge. In most cases, even as states reopen and relax stay-at-home orders, there are more cases in every county and city than when states first issued stay-at-home orders in March.
While so much is unprecedented, what’s encouraging to us is the possibility for states and businesses to make informed decisions by marrying these datasets with their own data. And with more data becoming available, it’s more important than ever for journalists and activists to be able to quickly dive into the data to bring light to these inequities and hold decision-makers to account.