Since redaction went digital, the number of ways it can fail has grown, giving bad actors access to data that was meant to be obscured. The high-profile cases that have made the news in recent years have tended to involve the most basic errors.

People will draw a black box over text or change the text’s color, leaving the information vulnerable to the simplest of hacks: copying and pasting the text into a new document. As basic as that sounds, it has happened to law firms and political campaigns, usually when documents are released hastily. The redaction failures that reach the news are only the ones that have been uncovered; it’s hard to tell how many poor redactions exist until they are tested.

Researchers at the University of Illinois set out to get a better handle on issues like this, including just how vulnerable some popular redaction methods and tools are. Wired’s summary of the research puts it bluntly, “The flaws aren’t just theoretical. After examining millions of publicly available documents with blacked-out redactions—including from the US court system, the US Office of the Inspector General, and Freedom of Information Act requests—the researchers found thousands of documents that exposed people’s names and other sensitive details.” The research unearthed hundreds of court documents with redactions that could be cracked with a simple copy and paste.

The report referenced guidelines published by the NSA as the current set of best practices for redaction. They include several helpful tips, such as changing the underlying text that you’re redacting to different characters, obscuring metadata, and handling images that contain sensitive information.

All the tips and discussion so far have concerned uncovering text that is still contained within the document, but the University of Illinois researchers were also able to break through redactions in much the same way a hacker might obtain your password: by brute-forcing thousands of combinations. This would be a near-impossible task without additional information, but when trying to uncover a redacted last name, the team was able to eliminate 80,000 guesses per second just from the size of a redaction.

While the size of a redaction doesn’t necessarily reveal specifics like the number of characters, it significantly reduces the possibilities of what could be obscured. If a redaction is about this size [|||||||||] there are only so many potential words that could be hidden. The University of Illinois model takes into account things like the font and the size of its characters, because in a proportional font the redaction above could just as easily hide the 6-letter word “lilies” as the 3-letter word “maw.”
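To make the width argument concrete, here is a minimal Python sketch of filtering candidate words by the width of a redaction box. It is not the researchers’ model: the character widths, the word list, and the function names are all invented for illustration, and real widths would come from the metrics of the actual font.

```python
# Hypothetical per-character widths in arbitrary units. These values are
# invented for illustration; they are NOT real metrics from any font.
CHAR_WIDTHS = {
    "a": 4, "e": 5, "i": 2, "l": 2, "m": 7,
    "n": 5, "o": 5, "s": 4, "t": 3, "w": 6,
}

# A tiny stand-in for a dictionary of possible hidden words (also an assumption).
WORDS = ["lilies", "maw", "tile", "moon", "seat", "stale"]


def text_width(word):
    """Total rendered width of a word under the hypothetical width table."""
    return sum(CHAR_WIDTHS[ch] for ch in word)


def candidates_for(redaction_width, words, tolerance=0):
    """Return the words whose width is within `tolerance` of the redaction box."""
    return [w for w in words if abs(text_width(w) - redaction_width) <= tolerance]


# Both a 6-letter and a 3-letter word fit a box that is 17 units wide,
# while every other word in the list is ruled out.
print(candidates_for(17, WORDS))  # ['lilies', 'maw']
```

Even in this toy version, the width alone discards most of the word list without revealing how many characters are hidden, which is the point the paragraph above makes.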

Now, this technique for deciphering what’s been redacted certainly isn’t a skeleton key for unlocking redactions, but the figures are alarming, particularly since these results can be obtained against documents redacted with generally prudent techniques:

“We found, for example, that redacting a surname from a PDF generated by Microsoft Word set using 10-point Calibri leaves enough residual information to uniquely identify the name in 14 percent of all cases.”
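One way to read “uniquely identify” is that no other name in the candidate list renders at the same width. The Python sketch below counts how many names in a list have a width that nobody else shares. The surnames and their widths are made up for illustration, and this is a simplification under that assumption, not the method used in the study.

```python
from collections import Counter

# Hypothetical rendered widths (arbitrary integer units) for a few surnames.
# The numbers are invented for illustration, not measured from a real font.
WIDTH_BY_NAME = {
    "Lee": 152,
    "Ito": 152,
    "Kim": 179,
    "Wu": 141,
    "Ng": 126,
    "Ali": 126,
}


def uniquely_identified(width_by_name):
    """Return the names whose width is shared with no other name in the list."""
    counts = Counter(width_by_name.values())
    return [name for name, width in width_by_name.items() if counts[width] == 1]


unique_names = uniquely_identified(WIDTH_BY_NAME)
fraction = len(unique_names) / len(WIDTH_BY_NAME)

print(unique_names)       # ['Kim', 'Wu']
print(f"{fraction:.0%}")  # 33%
```

In this made-up list, a redaction box matching the width of “Kim” or “Wu” gives the name away outright, while the other widths still leave two possibilities each.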

When redacting individual documents, this means that changing the underlying text to something like ‘XXXXX’ is crucial, as it also obscures the character widths of the original text. It does mean, though, that you’ll have to ensure that your text changes don’t break the formatting and layout of your document.
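As a rough illustration of the placeholder idea, the Python sketch below swaps each sensitive term in a plain-text string for the same fixed-length placeholder, so the replacement says nothing about the length of the original. The function name, the sample sentence, and the term list are assumptions made for this example; a real workflow would apply the change in the source document before producing the final file.

```python
import re

PLACEHOLDER = "XXXXX"  # same fixed-length string for every term (illustrative choice)


def redact_terms(text, terms, placeholder=PLACEHOLDER):
    """Replace every occurrence of each term with one fixed-length placeholder."""
    if not terms:
        return text
    # Try longer terms first so a short term never matches inside a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    # A function is used as the replacement so the placeholder is inserted literally.
    return pattern.sub(lambda _match: placeholder, text)


original = "Payment approved by Smith and reviewed by Wojciechowski."
redacted = redact_terms(original, ["Smith", "Wojciechowski"])

print(redacted)  # Payment approved by XXXXX and reviewed by XXXXX.
```

Both names come out the same length, which removes the width clue. The sentence also gets shorter, which is the kind of layout shift you would need to check for in a formatted document.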

If you’re using a professional redaction vendor like Extract, you can be assured that metadata is erased and that an unremovable redaction is in place. Our ID Shield software, in particular, can extend redactions to cover the context surrounding specific data types, eliminating the ability to guess what’s underneath.

For small redaction projects, we’d highly encourage you to keep in mind everything we’ve talked about here, and for larger-scale projects, we’d be happy to show you how our automation platform works.

About the Author: Chris Mack

Chris is a Marketing Manager at Extract with experience in product development, data analysis, and both traditional and digital marketing. Chris received his bachelor’s degree in English from Bucknell University and has an MBA from the University of Notre Dame. A passionate marketer, Chris strives to make complex ideas more accessible to those around him in a compelling way.

Are Redactions Safe?
