
Top 3 Optical Character Recognition (OCR) Misconceptions

September 12, 2017

In previous blog articles, you may have learned a bit more about what OCR is and how your organization may benefit from using an OCR engine. Optical Character Recognition can be an extremely powerful tool, but there are many things an OCR engine can't actually handle, and these are often overlooked. Below are the three most common misconceptions about OCR engines.

1. OCR Is Hardware…NOT!

OCR is actually software. Hardware is often required, but OCR itself is a software process. The most common hardware used with OCR is a scanner or multifunction device, because physical paper needs to be converted into an electronic file before OCR software can do its job. Higher-quality hardware can give you better OCR results, but the process of OCR'ing a document is performed by software.

2. OCR Is a Document Management Solution (DMS)…NOT!

OCR is just one part of an overall document management solution, although it can play an important role. Typically, OCR software is installed in your DMS as a plugin that converts every non-searchable PDF file into a searchable format, so that the DMS can read the text inside the file.

3. OCR Is 100% Accurate…NOT!

This is probably the most common misconception. Even under ideal conditions, where the paper documents are typewritten and the form data is flawless, errors can still occur during the process. A manual review step after OCR processing is almost mandatory to verify the results and correct any mistakes. Also, it is often not necessary to OCR the entire document. Simply capturing an ID number or a few other key pieces of information can be enough to "tag" the document.
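As a rough illustration of tagging a document from a single key value, here is a minimal Python sketch that searches OCR output for a case number. The case-number format, the `CASE_ID_PATTERN` regular expression, and the `tag_document` function are hypothetical and invented for this example; real documents would need their own patterns. It also hints at why a review step matters: one misread character is enough to miss the ID.

```python
import re

# Hypothetical ID format for illustration only: "Case No." followed by
# four digits, a hyphen, and six digits, e.g. "Case No. 2017-004512".
CASE_ID_PATTERN = re.compile(r"Case\s+No\.?\s*(\d{4}-\d{6})", re.IGNORECASE)


def tag_document(ocr_text):
    """Return a tag built from the first case ID found in the OCR text,
    or None so the document can be routed to manual review."""
    match = CASE_ID_PATTERN.search(ocr_text)
    if match is None:
        return None
    return {"case_id": match.group(1)}


sample = "COUNTY CLERK\nCase No. 2017-004512\nFiled: 09/12/2017"
print(tag_document(sample))                   # {'case_id': '2017-004512'}
print(tag_document("Case N0. 2017-004512"))   # None: OCR read the letter "o" as a zero
```

Returning `None` rather than guessing keeps uncertain documents in front of a person, which matches the manual review step described above.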

OCR engines can do a lot of the legwork to reduce the time it takes organizations to process documents, but there are still many tasks that a standalone OCR engine can't accomplish by itself. Some of those tasks include automating document handling, indexing, and extracting or redacting data. The good news is that there are third-party platforms that can integrate with an OCR engine, as well as with a DMS/CMS, to reduce document processing time even further and eliminate manual data entry. Just think about how much time your organization would save by letting software do most of the legwork.

Is Extract an OCR engine?

Extract is NOT an OCR engine. Our software uses a well-respected third-party OCR engine to obtain the text and spatial data from a given document. Sitting on top of that is our extensive library of document processing and reading algorithms that facilitate reading the document like a human would. We use machine learning, pattern matching, spatial recognition, and many more methods to pull the important information that you need from the document, regardless of its structure.


Meet The Author

Tera Madigan
