Being able to transform a PDF or image into a searchable text document is an incredibly powerful capability all by itself. The real power, though, comes from putting the text in those new documents to work. OCR on its own lets an individual search and copy/paste the text within an image, but advanced OCR makes it possible to do much more.
For example, OCR data can be used to quickly and accurately sort thousands of documents per day. This is accomplished by applying machine learning algorithms to the OCR output in order to determine what ‘type’ of document any particular image or PDF is. The benefit here is almost immediately obvious. In a workflow where incoming faxes are a key business process, advanced OCR can be used to automatically sort documents into their downstream workflows. That’s only one example of what can be done by leveraging OCR data.
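The sorting idea described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual method: it assumes the OCR text is already available as a plain string, uses a small hand-written Naive Bayes classifier as a stand-in for whatever machine learning model a real product uses, and the document types and training snippets are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the OCR text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDocumentClassifier:
    """Tiny multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training documents
        self.vocabulary = set()

    def train(self, labeled_texts):
        """Count word frequencies per document type from (label, text) pairs."""
        for label, text in labeled_texts:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the document type with the highest log-probability score."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Start from the log prior, then add one smoothed log likelihood per token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets; a real deployment would train on far more OCR text.
TRAINING_DATA = [
    ("invoice", "invoice number amount due remit payment to vendor"),
    ("invoice", "invoice date purchase order total amount due"),
    ("statement", "account number statement balance statement date last payment"),
    ("statement", "credit card statement minimum payment due new balance"),
    ("referral", "patient referral physician diagnosis please schedule appointment"),
    ("referral", "referring physician patient name date of birth reason for referral"),
]

classifier = NaiveBayesDocumentClassifier()
classifier.train(TRAINING_DATA)

# Simulated OCR output from an incoming fax.
incoming_fax = "Statement Date 01/31 Account Number 1234 Statement Balance 250.00"
predicted_type = classifier.classify(incoming_fax)
print(predicted_type)
```

Once a document has a predicted type, routing it to a downstream workflow is a simple lookup from type to destination queue.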
To take this one step further, advanced OCR can then read specific data out of the document. Constructing rulesets allows advanced OCR to pinpoint standard fields within the document. For example, if you are working with a stack of credit card statements, you may find the following standard fields:
- Account Number
- Account Owner
- Statement Balance
- Statement Date
- Last Payment Date
Advanced OCR rulesets can find these fields and index the information into spreadsheets, or route it to any other destination, for easy reporting and trend analysis.
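As a rough sketch of what such a ruleset might look like, the example below uses one regular expression per field and writes the result as a spreadsheet-friendly CSV row. The label wording, value formats, and sample statement text are all assumptions made for illustration; real statements vary, and production rulesets are usually far more tolerant of OCR noise and layout differences.

```python
import csv
import io
import re

# Hypothetical ruleset: one regular expression per standard field.
# The "Label: value" layout is an assumption about how the OCR text reads.
FIELD_RULES = {
    "Account Number": re.compile(r"Account Number:\s*([\d-]+)"),
    "Account Owner": re.compile(r"Account Owner:\s*(.+)"),
    "Statement Balance": re.compile(r"Statement Balance:\s*\$?([\d,]+\.\d{2})"),
    "Statement Date": re.compile(r"Statement Date:\s*(\d{2}/\d{2}/\d{4})"),
    "Last Payment Date": re.compile(r"Last Payment Date:\s*(\d{2}/\d{2}/\d{4})"),
}


def extract_fields(ocr_text):
    """Apply each rule to the OCR text; fields that are not found stay empty."""
    record = {}
    for field_name, pattern in FIELD_RULES.items():
        match = pattern.search(ocr_text)
        record[field_name] = match.group(1).strip() if match else ""
    return record


# Invented OCR output for a single credit card statement.
SAMPLE_OCR_TEXT = """\
Account Number: 4000-1234-5678
Account Owner: Jane Example
Statement Balance: $1,204.56
Statement Date: 03/31/2024
Last Payment Date: 03/02/2024
"""

record = extract_fields(SAMPLE_OCR_TEXT)

# Write the record as CSV so it can be opened directly in a spreadsheet.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(FIELD_RULES))
writer.writeheader()
writer.writerow(record)
print(output.getvalue())
```

Leaving missing fields empty rather than raising an error makes it easy to flag incomplete records for manual review instead of halting the whole batch.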
Similar processes can be used to extract or redact key pieces of information from documents. Say you have a large number of documents that need personally identifying information redacted before they can be released to the public. There are tools available that leverage OCR to automatically find and redact that sensitive information. Extracting data from documents quickly is another powerful way to take advantage of OCR output. Manual data entry is time-consuming and tedious, but by leveraging OCR it’s possible to automate large portions of the data entry process.
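To make the redaction idea concrete, here is a minimal sketch that masks a few kinds of sensitive values in OCR text. The patterns (US-style Social Security numbers, phone numbers, and email addresses) and the sample sentence are illustrative assumptions only. This sketch operates purely on the extracted text; a real redaction tool would presumably also need to cover the matching region of the page image itself, which is not shown here.

```python
import re

# Illustrative patterns only; real redaction rules are broader and tuned for OCR errors.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(ocr_text):
    """Replace every match of each pattern with a labeled redaction marker."""
    redacted = ocr_text
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[REDACTED {label}]", redacted)
    return redacted


# Invented OCR output containing sensitive values.
sample_text = "Patient SSN 123-45-6789, phone 608-555-0100, email jane@example.com."
redacted_text = redact(sample_text)
print(redacted_text)
```

Labeling each marker with the kind of value that was removed keeps the released document readable while still hiding the sensitive details.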
These examples are achieved by applying custom rules and business processes to the data OCR provides. Doing so unlocks that data and allows you to get more value from it. Raw OCR output is powerful and can be used to great effect, but without further processing it stays locked in the document. Unlocking the full potential of OCR requires a look at how the data can be harnessed to improve productivity and business processes.
About the Author: Kevin Tschopik
Kevin Tschopik works in Professional Services at Extract and has over 10 years of experience deploying and supporting IT systems. He holds a BS in Computer Science and is currently pursuing certification in Project Management. Kevin specializes in workflow planning and management and is involved with customers in every step of the process, from project kickoff to post-live support. He resides in Madison, WI, where he enjoys the vibrant restaurant and craft brewing scene.