Leverage OCR to improve your workflows

It’s easy to mistake Optical Character Recognition (OCR) for a one-trick pony.

After all, pulling text out of an image to make it usable in other applications is an impressive trick. Don’t assume that’s all OCR can do for you, though. By combining OCR output with other technologies, it’s possible to make substantial improvements to workflows throughout an organization. Incoming document workflows are the first and most obvious place where OCR can make a major impact.

In general, getting documents into your downstream workflows requires a lot of up-front work to separate and classify them.

Without automation, the work of separating incoming documents and making sure they end up in the right place falls to full-time employees. By leveraging OCR and machine learning, it’s now possible to classify documents with a high degree of accuracy. In many workflows, classification and separation can be done entirely without human intervention and with a very low margin of error. Classification isn’t the only workflow that benefits from OCR, though. Protecting sensitive information is another place it can make a big difference.
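To make the idea of classifying documents from OCR text concrete, here is a minimal sketch in Python. It is not a machine learning model and not how any particular product works: it swaps in simple keyword scoring as a stand-in, and the labels, keyword lists, and the `min_hits` threshold are all hypothetical values chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical document types and keywords, for illustration only.
KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "remit"},
    "lab_report": {"specimen", "result", "reference", "range"},
}


def classify(ocr_text, min_hits=2):
    """Return the best-matching label, or "needs_review" if the match is weak."""
    # Count lowercase alphabetic words in the OCR output.
    words = Counter(re.findall(r"[a-z]+", ocr_text.lower()))
    # Score each label by how many times its keywords appear.
    scores = {
        label: sum(words[keyword] for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Weak matches are routed to a person instead of being filed automatically.
    return best if scores[best] >= min_hits else "needs_review"


print(classify("INVOICE No. 1001 Amount due: $50"))
print(classify("Hello world"))
```

The "needs_review" fallback reflects the point above: documents the rules are unsure about still go to a person, while confident matches skip manual sorting.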

There are many types of information that organizations need to redact from documents, and many reasons the information needs to be protected. 

In the payment card industry, account numbers and other personally identifiable information are important to protect. In health care, anything designated by HIPAA as PHI (protected health information) also needs to be protected. Without OCR, users are left to hunt for this information manually in order to redact it. With OCR alone, users still need to use Find tools and hope the OCR data is good enough to match their specific search terms (scan quality can only be so good). By combining OCR data with custom rules in an application like IDShield, organizations can have redactions proposed automatically for a user to review. In some cases, IDShield can make all the redactions automatically without a user ever looking at the documents. In either case, the user workflow is improved and staff have more time to dedicate to other responsibilities.
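As a rough illustration of rule-based redaction proposals over OCR text, the sketch below looks for payment card numbers. It is not IDShield's implementation or API. The pattern, the function names, and the choice to confirm candidates with a Luhn checksum are all assumptions made for this example.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def propose_redactions(ocr_text):
    """Return (start, end, matched_text) for each likely card number."""
    proposals = []
    for match in CARD_PATTERN.finditer(ocr_text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            proposals.append((match.start(), match.end(), match.group()))
    return proposals


def apply_redactions(ocr_text, proposals):
    """Mask every proposed span with X characters."""
    chars = list(ocr_text)
    for start, end, _ in proposals:
        for i in range(start, end):
            chars[i] = "X"
    return "".join(chars)


text = "Card 4111 1111 1111 1111 on file, ref 1234 5678 9012 3456."
found = propose_redactions(text)
print(found)
print(apply_redactions(text, found))
```

Keeping `propose_redactions` separate from `apply_redactions` mirrors the two modes described above: a reviewer can approve the proposed spans first, or the proposals can be applied directly when the rules are trusted.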

These two workflows represent only a piece of what OCR and other technologies can do to improve workflows at your organization.  Document indexing, automatic document annotation, and discrete data extraction are all possible as well.  As technologies like machine learning continue to improve, it’s likely that we’ll continue to see OCR data leveraged more and more to help automate day-to-day tasks.  Next time you’re looking at a manual process that started with a paper document or PDF, ask yourself whether OCR could help automate it. 


Kevin Tschopik works in Professional Services at Extract and has over 10 years of experience deploying and supporting IT systems. He holds a BS in Computer Science and is currently pursuing certification in Project Management. Kevin specializes in workflow planning and management and is involved with customers in every step of the process, from project kickoff to post-go-live support. He resides in Madison, WI, where he enjoys the vibrant restaurant and craft brewing scene.