No, Healthcare Isn’t 30% of the World’s Data
The healthcare industry is always buzzing about the next new way to harness the data that it generates. Topics like interoperability and the applications of artificial intelligence are discussed ad nauseam as methods for accessing more data and retrieving insights from it, respectively. The industry is considered ripe for data analytics projects because healthcare is widely reported to produce more data than any other industry.
The currently accepted figure is that healthcare represents 30% of the world’s data, but it’s difficult to tell whether the figure is accurate or has simply been repeated often enough to become the norm. A wide variety of news articles, blogs, and other sources cite the figure, pointing back to this article published by RBC Capital Markets.
The entire genesis of the statistic appears to be from this single paragraph:
The 30% volume and 36% CAGR are repeated in tandem across the internet:
The report from RBC doesn’t have a citation for the 30% figure but includes this attribution under an image showing the compound annual growth rates of several industries.
Oddly, the first source listed is just a fairly tame look at the future of patient engagement and contains no references to the volume of healthcare data. The second source, referenced by several articles, links to a 2018 report from IDC examining the rapid increase in digitization and data storage, with some industry breakdowns.
The report does confirm one of the frequently repeated numbers, that the compound annual growth rate for healthcare data was estimated to be 36% through 2025. Given the age of the report and acceleration of technological changes, it’s probably time for this figure to be reexamined.
Regardless, it’s unsurprising and even reasonable that healthcare would be the industry with the largest data growth rate as files have become digitized and advancements in areas like imaging have ballooned data storage requirements. Outdated, but realistic.
But what about the 30%?
The same report, on the very same page, shows how much data eight different industries store while lumping the rest into an ‘Other’ category. Healthcare, the graph shows, had a massive 1,218 exabytes (EB) of data in 2018 (an exabyte is a billion gigabytes), which represented just over seven percent of global data.
Compound growth rates are exponential, and healthcare’s data is the fastest growing category, but seven years of growth isn’t enough for healthcare to reach 30% of existing or even newly created data. Applying IDC’s compound growth figures out to 2025 shows healthcare with a predicted existing data share of around 11%, producing 14% of the world’s new data.
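The projection above can be sketched in a few lines of Python. This is a rough reconstruction, not IDC’s own method: the 1,218 EB, the roughly 7% share, and the 36% healthcare CAGR come from the figures quoted in this post, while the 27% growth rate for global data as a whole is an assumption that does not appear here and should be swapped for whatever overall rate you trust.

```python
# Rough sketch of the compound growth projection described above.
HEALTHCARE_2018_EB = 1218      # healthcare data in 2018, in exabytes
HEALTHCARE_SHARE_2018 = 0.07   # "just over seven percent" of global data
HEALTHCARE_CAGR = 0.36         # compound annual growth rate through 2025
GLOBAL_CAGR = 0.27             # ASSUMPTION: overall growth rate, not stated in this post

# Back out the 2018 global total implied by the share.
GLOBAL_2018_EB = HEALTHCARE_2018_EB / HEALTHCARE_SHARE_2018


def project(volume_eb, cagr, years):
    """Compound a starting volume forward at a fixed annual growth rate."""
    return volume_eb * (1 + cagr) ** years


def existing_share(years):
    """Healthcare's share of all stored data after the given number of years."""
    healthcare = project(HEALTHCARE_2018_EB, HEALTHCARE_CAGR, years)
    total = project(GLOBAL_2018_EB, GLOBAL_CAGR, years)
    return healthcare / total


def new_data_share(years):
    """Healthcare's share of the data added during the final year only."""
    healthcare_new = (project(HEALTHCARE_2018_EB, HEALTHCARE_CAGR, years)
                      - project(HEALTHCARE_2018_EB, HEALTHCARE_CAGR, years - 1))
    total_new = (project(GLOBAL_2018_EB, GLOBAL_CAGR, years)
                 - project(GLOBAL_2018_EB, GLOBAL_CAGR, years - 1))
    return healthcare_new / total_new


if __name__ == "__main__":
    years = 2025 - 2018
    print(f"Share of existing data in 2025: {existing_share(years):.1%}")
    print(f"Share of new data in 2025: {new_data_share(years):.1%}")
```

Under that assumed 27% overall rate, the sketch lands close to the 11% and 14% figures above. Even if you lower the assumed global rate considerably, seven years of compounding from a 7% starting share still falls well short of 30%.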
Two other figures for healthcare’s data totals can be found online, both coming from IDC studies. One article cited a 2013-2020 CAGR of 48% for healthcare data, while a more recent study (2021) predicted a total data volume of 10 zettabytes (ZB) by 2025.
Both articles reference IDC studies, although the links to the actual studies appear inaccessible in each case. This doesn’t matter much because all three studies arrive at about the same conclusion and are due for a refresh. The 2013 study did a good job predicting healthcare’s data growth through 2018, and the two subsequent reports agreed on a target of about 10 ZB for 2025.
Just as the 2013 study would be off the mark for today (had a 48% CAGR continued to 2025, healthcare would have an estimated 17 ZB of data), the 2018 and 2021 studies are at about the end of their lifespans as well.
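As a quick sanity check on the roughly 10 ZB target for 2025, here is a minimal sketch that compounds the 2018 healthcare figure of 1,218 EB at the 36% CAGR. The function name and the 1,000 EB per ZB conversion are my own choices for illustration; the inputs are the figures quoted earlier in this post.

```python
EB_PER_ZB = 1000  # 1 zettabyte = 1,000 exabytes


def project_zb(start_eb, cagr, start_year, end_year):
    """Compound a starting volume in EB to end_year and return it in ZB."""
    years = end_year - start_year
    return start_eb * (1 + cagr) ** years / EB_PER_ZB


# 2018 healthcare volume (EB) grown at the 36% CAGR through 2025.
healthcare_2025_zb = project_zb(1218, 0.36, 2018, 2025)

if __name__ == "__main__":
    print(f"Projected healthcare data in 2025: {healthcare_2025_zb:.1f} ZB")
```

The result comes out a little above 10 ZB, which is consistent with the 2018 and 2021 reports agreeing on about the same 2025 target.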
Whatever the actual percentage is, healthcare data is growing extremely quickly, and faster than data in other industries, yet organizations are doing very little with it. The reasons range from errors in the data to it being trapped in PDFs or silos. This leads to Microsoft’s assertion that only 3% of healthcare data is being used. Some healthcare executives put the figure at 5% and others agree with 3%, while some studies say that hospitals use 57% of their data when it’s for a business decision.
Again, it’s likely that these numbers don’t reflect reality, but as with the snowballing data figures, the underlying sentiment rings true: healthcare data is underutilized.
So how much of the world’s data is healthcare data? Probably between 11 and 14 percent, but what really matters isn’t who has the most data, but that it’s accurate and useful.
Extract offers software to ensure that your incoming documents (think referrals, outside labs, patient access, financials) are automatically classified, indexed, sorted, and converted to usable discrete data delivered to your EMR and/or anywhere else you’d like to inform decision making. If you’re interested in getting more usable data from your incoming documents, please reach out and we’d be happy to show you how we can help.
If you’ve seen more recent figures or think I’ve made an error, you can email me directly here!