Why Are So Many People So Bad at Redaction?

For a company named “Extract,” we sure find ourselves talking about how to get rid of data like a company that would be named “Redact.”  That isn’t because we’re new to redaction; in fact, it’s something we’ve been doing expertly for more than 20 years.  The difference between our redaction solution and many of the others on the market is that ours is based on understanding and identifying the information in documents.

This means that our software uses the information it gleans from the data extraction process to complete redactions.  During that process, which is essentially instantaneous, Extract’s IDShield software identifies the type of document it’s looking at, flags clues in the text that indicate where sensitive information might be found, and recognizes any potential instance of sensitive information.
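To make the idea of “clues” concrete, here is a minimal Python sketch of clue-plus-pattern detection.  It is an illustration only, not how IDShield works internally: the clue phrases, the 20-character window, and the function names are all assumptions made up for this example, and it skips the document classification step entirely.

```python
import re

# Hypothetical clue phrases that suggest a Social Security number is nearby.
CLUES = ("ssn", "social security")

# Nine digits, optionally grouped 3-2-4 with hyphens or spaces.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


def find_ssn_candidates(text, window=20):
    """Return (start, end, has_clue) for each SSN-shaped match in text."""
    lowered = text.lower()
    results = []
    for match in SSN_PATTERN.finditer(text):
        # Look only at the characters just before the match for a clue phrase.
        context = lowered[max(0, match.start() - window):match.start()]
        has_clue = any(clue in context for clue in CLUES)
        results.append((match.start(), match.end(), has_clue))
    return results


def redact(text, window=20):
    """Black out every SSN-shaped match that has a clue phrase before it."""
    chars = list(text)
    for start, end, has_clue in find_ssn_candidates(text, window):
        if has_clue:
            # Same-length replacement keeps later match offsets valid.
            chars[start:end] = "█" * (end - start)
    return "".join(chars)


sample = "Grantor SSN: 123-45-6789. Parcel 987654321."
print(redact(sample))
```

In the sample, the number that follows “SSN:” is blacked out, while the nine-digit parcel number is left alone because no clue phrase appears just before it.  A pattern on its own would have flagged both, which is why context matters.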

Our software also learns from its users, analyzing inputs to constantly improve not only how it identifies sensitive information, but also how best to redact it.  For older images, such as when states are removing discriminatory covenants from land records that span centuries, we use a suite of image enhancement tools for despeckling, deskewing, and removing things like scanned hole punches so that we can best identify the text of a document.

AI, image enhancement, document classification, and rulesets sound like a lot, and they are.  That’s why automated redaction technology is often best suited to huge historical digitization projects like the ones we do with county land records offices or day-forward contract redaction projects like the ones we do for Pfizer.

A missed or incorrect redaction on a hundred-year-old document that was scanned twenty years ago can probably be forgiven and is unlikely to result in identity theft or controversy.  What’s been much more common lately, though, are redaction errors on critical documents that have exposed people’s Social Security numbers or inadvertently left sensitive information in legal or official documents.

The state of North Dakota had to remove access to records after it was discovered that redaction requirements weren’t implemented consistently.  The Broward County school board released a supposedly redacted document where full details about the Parkland shooter could be read.  This isn’t just a government thing, either.  Facebook (now Meta) wasn’t able to redact emails about the company selling your data.  Paul Manafort couldn’t get it right for a legal filing.

So what’s going on here?  And how do we fix it?

The inevitable conclusion of a blog like this might be that the answer is “use our software.”  And by all means, if you’ve got a big redaction need and you’re throwing employees at it and expecting good accuracy, IDShield would be a great fit.

Where the breakdown seems to occur, though, is not in a redaction project that comes with its own set of policies, procedures, principles, and guidelines, but in the one-offs: a portion of an email that needs sharing, a submitted legal document, and so on.  So for most people who run into trouble, the first issue is frequency.

Redaction isn’t something that these people are doing every day.  It’s difficult to say that Meta isn’t good at what they do, but it’s probably a fair knock to say that they’re not redaction experts.  On top of that, it’s not something they should aspire to be, either.  For the most part, redaction isn’t going to be a part of the company’s day-to-day work, but when it is, they need to assess the second issue: impact.

Botched redactions get so much attention because they’re often so public.  Before trying to figure out how you’re going to remove sensitive information from a document, it’s helpful to know how much it matters that you get it right.  An individual leaving their birthdate or SSN on a legal document might be able to get away with sloppy redaction or no redaction at all simply because the document will rarely be seen.

But if you’re in a high-profile trial, sending information to news outlets, or releasing information that will be pored over online?  Buckle up, because if you’ve made a mistake, someone will find it, and then everyone will find it.  So if exposing the underlying information you’re trying to hide would be a big deal, you’ll need the third piece: knowledge.

Since most people don’t redact often, they’re not familiar with it, which is why we’ve gathered three ways to redact your documents to help you get up to speed.  We also have guides to help you figure out exactly what needs to be redacted.  For something small, you can probably get away with using consumer software like Adobe Acrobat or even a redaction pen.  For a larger or more complicated project, please reach out so we can jump on the phone or show you a demonstration of our software.

Remember, before you start, assess your frequency, impact, and knowledge, and you’ll be on the right track.


About the Author: Chris Mack

Chris is a Marketing Manager at Extract with experience in product development, data analysis, and both traditional and digital marketing. Chris received his bachelor’s degree in English from Bucknell University and has an MBA from the University of Notre Dame. A passionate marketer, Chris strives to make complex ideas more accessible to those around him in a compelling way.