How De-Identified Is Your Healthcare Data?

As a company that automates data and document entry into an EMR, we see just how much data a patient can generate. Multiply that by the size of an entire hospital or healthcare network and you get a truly massive dataset. At such large volumes, this data becomes extremely valuable, whether it’s for public health purposes, clinical trial identification, or less savory purposes like marketing. Either way, it’s big money; you may remember that we previously wrote that Cerner believes they have a billion-dollar business in data.

Normally, this is the type of information that would fall under HIPAA, with the privacy legislation dictating how the data can be used.  The regulation no longer applies, though, once the information has been de-identified or redacted.

The problem isn’t necessarily with the de-identified data per se, because on its own, there shouldn’t be enough information to identify anyone. It’s when the information is combined with another source that it becomes easier to start reconstructing identities. A study in 2019 showed that 99.98% of Americans can be re-identified using an incomplete dataset of 15 attributes. And all the way back in 2000, it was known that 87% of the population can be identified with just a zip code, gender, and date of birth (https://www.forbes.com/sites/forbestechcouncil/2019/08/27/medical-data-de-identification-is-under-attack/).
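To make the idea of combining sources concrete, here is a minimal Python sketch of a linkage check. Everything in it is made up for illustration: the records, the names, and the helper functions `unique_fraction` and `link` are assumptions, not taken from the studies mentioned above. It simply counts how many "de-identified" records are unique on zip code, gender, and date of birth, then joins the unique ones to a hypothetical outside list that still carries names.

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
deidentified = [
    {"zip": "53703", "gender": "F", "dob": "1984-02-11", "diagnosis": "asthma"},
    {"zip": "53703", "gender": "M", "dob": "1990-07-30", "diagnosis": "diabetes"},
    {"zip": "53711", "gender": "F", "dob": "1975-12-02", "diagnosis": "hypertension"},
    {"zip": "53711", "gender": "F", "dob": "1975-12-02", "diagnosis": "migraine"},
]

# Hypothetical outside source that still contains names.
public_list = [
    {"name": "Alice Example", "zip": "53703", "gender": "F", "dob": "1984-02-11"},
    {"name": "Bob Example", "zip": "53703", "gender": "M", "dob": "1990-07-30"},
    {"name": "Carol Example", "zip": "53711", "gender": "F", "dob": "1975-12-02"},
]

QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def key(record):
    """Build the combined quasi-identifier for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(records):
    """Share of records whose quasi-identifier combination appears only once."""
    counts = Counter(key(r) for r in records)
    unique = sum(1 for r in records if counts[key(r)] == 1)
    return unique / len(records)


def link(deid, outside):
    """Attach a name to each record that matches exactly one outside entry."""
    counts = Counter(key(r) for r in deid)
    names = {}
    for person in outside:
        names.setdefault(key(person), []).append(person["name"])
    matches = []
    for r in deid:
        k = key(r)
        # Only unambiguous matches count: one record, one candidate name.
        if counts[k] == 1 and len(names.get(k, [])) == 1:
            matches.append((names[k][0], r["diagnosis"]))
    return matches


print(unique_fraction(deidentified))
print(link(deidentified, public_list))
```

In this toy data, the two records that share a zip code, gender, and date of birth cannot be told apart, which is the intuition behind grouping or generalizing those fields before releasing data.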

While organizations will differ in the amount of data they have about you, there are many ways to use all sorts of data to learn about someone. As an example, think about your location data. By seeing how someone travels throughout the week, it’s easy to identify key information about them, like their home and place of employment. In some cases, these locations don’t even need to be gleaned from the data, because users offer them up themselves.
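As a rough illustration of how little it takes, the Python sketch below guesses a home and a workplace from a handful of timestamped location pings. The pings, the coordinates, the time windows, and the function names are all assumptions chosen for the example; real location data is noisier and would need more care.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pings: (timestamp, coordinates rounded to two decimals).
pings = [
    ("2024-03-04 02:15", (43.07, -89.40)),  # Monday, overnight
    ("2024-03-04 10:30", (43.08, -89.38)),  # Monday, working hours
    ("2024-03-04 14:00", (43.08, -89.38)),
    ("2024-03-04 23:40", (43.07, -89.40)),
    ("2024-03-05 11:00", (43.08, -89.38)),
    ("2024-03-05 19:00", (43.10, -89.35)),  # Tuesday evening errand
    ("2024-03-06 03:00", (43.07, -89.40)),
]


def is_night(t):
    """Assumed 'at home' window: 10 p.m. to 6 a.m."""
    return t.hour >= 22 or t.hour < 6


def is_working_hours(t):
    """Assumed 'at work' window: weekdays, 9 a.m. to 5 p.m."""
    return t.weekday() < 5 and 9 <= t.hour < 17


def most_common_place(pings, keep):
    """Most frequent location among pings whose timestamp passes `keep`."""
    counts = Counter(
        place
        for stamp, place in pings
        if keep(datetime.strptime(stamp, "%Y-%m-%d %H:%M"))
    )
    return counts.most_common(1)[0][0] if counts else None


likely_home = most_common_place(pings, is_night)
likely_work = most_common_place(pings, is_working_hours)
print("Likely home:", likely_home)
print("Likely work:", likely_work)
```

Once a likely home and workplace are known, matching them against an address or employer directory is the same kind of linkage that makes other "anonymous" data identifiable.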

Recently, two doctors with experience in digital health information published an article in The New England Journal of Medicine with their suggestions as to how to ensure patient privacy.  Their recommendations are to:

  • Treat de-identified data the same as information protected by HIPAA

  • Enact contractual controls with third parties regarding data sharing limitations

  • Allow outside parties to analyze data while it’s still at the healthcare institution rather than sending it to them

  • Urge lawmakers to enact more expansive data protection laws

It’s important to act on recommendations like these now, as the amount of data produced, collected, and stored on individuals increases exponentially. And while new protections are important, it’s also key to use best practices when de-identifying or redacting the information we have now. This means either keeping a keen eye on the details or using automated software like Extract’s that can do that for you.


About the Author: Chris Mack

Chris is a Marketing Manager at Extract with experience in product development, data analysis, and both traditional and digital marketing. Chris received his bachelor’s degree in English from Bucknell University and has an MBA from the University of Notre Dame. A passionate marketer, Chris strives to make complex ideas more accessible to those around him in a compelling way.