As I travel around the country talking to elected officials and government employees, one of the most common topics is the challenge of managing their data. Many of them still do a lot of manual data entry, while others struggle to get accurate indexing data from electronic recording/filing submissions. This often leads to:
- Inability for Users to Locate Documents
In 2015, the Property Records Industry Association (PRIA) released best practices to address some of these issues. Visit the PRIA Resource Directory for more information.
What is data capture?
Data capture refers to collecting data electronically instead of using manual data entry. This can be accomplished with:
- Receiving XML or Text Files as Electronic Submissions
- Using OCR/ICR Technologies
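For the first route, capturing data from an XML submission can be as simple as reading the index fields out of the file. The sketch below uses Python's standard `xml.etree.ElementTree`. The element names (`Submission`, `Document`, `Grantor`, and so on) and the `extract_index_fields` helper are hypothetical and do not follow any particular eRecording schema. A real implementation would map them to whatever standard the submitters actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical submission layout, for illustration only; real eRecording
# packages follow whatever schema the recording office and submitter agree on.
SAMPLE_SUBMISSION = """
<Submission>
  <Document>
    <DocumentType>DEED</DocumentType>
    <Grantor>SMITH, JOHN A</Grantor>
    <Grantee>DOE, JANE</Grantee>
    <ParcelId>12-345-678</ParcelId>
  </Document>
</Submission>
"""


def extract_index_fields(xml_text, fields):
    """Return one dict of index fields per <Document> element in the submission."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for doc in root.iter("Document"):
        record = {}
        for name in fields:
            element = doc.find(name)
            text = element.text.strip() if element is not None and element.text else ""
            # Missing or empty fields become None so they can be flagged for review.
            record[name] = text or None
        records.append(record)
    return records


records = extract_index_fields(
    SAMPLE_SUBMISSION, ["DocumentType", "Grantor", "Grantee", "ParcelId", "RecordingDate"]
)
print(records)
```

Returning `None` for a missing field, rather than raising an error, lets an incomplete submission be routed for review instead of being rejected outright.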
What obstacles do people face when using OCR/ICR for data capture?
Poor Quality Images – the lower the quality of the images, the less accurate data capture will be. Scanning images at 300 DPI or higher will alleviate most issues. If the images have already been scanned at a lower resolution, some vendors offer image clean-up tools that can thicken, thin, or smooth characters, deskew or despeckle documents, remove hole punches or borders, and more. The cleaner the documents, the better the results.
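Effective resolution is simply pixel dimensions divided by page dimensions. A letter-size page (8.5 by 11 inches) therefore needs at least 2550 by 3300 pixels to reach 300 DPI. The sketch below checks that. It also shows one of the simplest clean-up steps, despeckling, which here means clearing black pixels that have no black neighbors. It assumes the page has already been binarized into a hypothetical list-of-lists format (1 = black, 0 = white). The function names are illustrative, and commercial clean-up tools use far more sophisticated filters.

```python
def effective_dpi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return the lower of the horizontal and vertical resolutions, in dots per inch."""
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


def despeckle(image):
    """Clear isolated black pixels (1s) that have no black 8-neighbors.

    `image` is a hypothetical binarized page: a list of rows of 0 (white) / 1 (black).
    A new image is returned; the input is left unchanged.
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the up-to-8 surrounding pixels, staying inside the page.
            has_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbor:
                cleaned[y][x] = 0
    return cleaned


print(effective_dpi(2550, 3300, 8.5, 11))  # 300.0
print(effective_dpi(1700, 2200, 8.5, 11))  # 200.0, below the 300 DPI guideline

page = [
    [1, 0, 0, 0],  # the lone pixel at the top left is a speck
    [0, 0, 1, 1],  # these three touching pixels form part of a character
    [0, 0, 1, 0],
]
print(despeckle(page))
```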
Incorrect Data Captured – if you’ve done everything possible to generate the best-quality images and you are still not capturing information correctly, there are a couple of solutions. First, fuzzy text search returns approximate matches for a word, name, or number pattern despite spelling mistakes, number transpositions, or unclear OCR results. Second, you can validate captured data against an existing database to ensure accurate results. If all else fails, documents that don’t meet a certain OCR character confidence threshold can be sent for human review.
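A minimal sketch of the last two ideas follows: validating against known values, and routing low-confidence results to a person. It uses Python's standard `difflib` similarity ratio as a simple stand-in for a fuzzy matcher. Commercial fuzzy search engines use their own algorithms. `KNOWN_PARTIES` stands in for an existing database. The 0.90 confidence threshold and 0.8 similarity cutoff are hypothetical values that would need tuning.

```python
import difflib

# Hypothetical stand-in for an existing database of known party names.
KNOWN_PARTIES = ["JOHNSON, ROBERT", "JOHNSTON, ROBERTA", "SMITH, JOHN A", "DOE, JANE"]

# Hypothetical threshold; a real deployment would tune this per field and document type.
CONFIDENCE_THRESHOLD = 0.90


def validate_field(ocr_text, confidence, known_values, cutoff=0.8):
    """Return (best_value, needs_review) for a single OCR'd field.

    The closest known value (by difflib's similarity ratio) is suggested when one
    scores at or above `cutoff`; otherwise the raw OCR text is kept as-is.
    """
    matches = difflib.get_close_matches(ocr_text, known_values, n=1, cutoff=cutoff)
    best_value = matches[0] if matches else ocr_text
    # Send to a person if OCR confidence is low or nothing in the database is close.
    needs_review = confidence < CONFIDENCE_THRESHOLD or not matches
    return best_value, needs_review


print(validate_field("SMlTH, JOHN A", 0.95, KNOWN_PARTIES))  # ('SMITH, JOHN A', False)
print(validate_field("SMITH, JOHN A", 0.62, KNOWN_PARTIES))  # ('SMITH, JOHN A', True)
print(validate_field("UNKNOWN PARTY", 0.99, KNOWN_PARTIES))  # ('UNKNOWN PARTY', True)
```

In the first call, OCR read a lowercase "l" in place of "I". The database match corrects it without a person having to look at it.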
Changing Indexing Standards – systems and indexing standards can change over time. PRIA’s recommendation is to adopt a “key it as you see it” approach: by eliminating data manipulation and translation tables, vendors simply return the data exactly as it appears on the document.
Unstructured Documents – form documents make it easy for solutions to collect the required data because it always falls in the same location. The challenge is finding information in unstructured documents, where the desired text can appear anywhere. A rules-based solution doesn’t rely on the information always being in the same location; instead, it identifies keywords or clues to capture text.
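A rules-based extractor can be sketched with regular expressions. Each pattern anchors on keyword clues such as "Grantor", "sum of", and "Parcel No." rather than on fixed positions on the page. The rules, field names, and sample deed text below are hypothetical and exist only to show the idea. Production rule sets are far larger and are tuned to the document types an office actually records.

```python
import re

# Hypothetical rules: each pattern keys off a clue word near the value
# instead of a fixed position on the page.
RULES = {
    "grantor": re.compile(r"\bbetween\s+(.+?),\s*grantor\b", re.IGNORECASE),
    "grantee": re.compile(r"\band\s+(.+?),\s*grantee\b", re.IGNORECASE),
    "consideration": re.compile(r"\bsum of\s+\$?([\d,]+(?:\.\d{2})?)", re.IGNORECASE),
    "parcel_id": re.compile(r"\bparcel\s+(?:no\.?|number|#)?\s*([\d-]+)", re.IGNORECASE),
}

SAMPLE_DEED = """THIS DEED, made this 3rd day of May,
between John A. Smith, Grantor, and Jane Doe, Grantee,
for the sum of $125,000.00, conveys the land known as
Parcel No. 12-345-678 in the County of Example."""


def extract_fields(text, rules):
    """Apply each rule to the whole text; fields with no match come back as None."""
    found = {}
    for field, pattern in rules.items():
        match = pattern.search(text)
        found[field] = match.group(1).strip() if match else None
    return found


print(extract_fields(SAMPLE_DEED, RULES))
```

Each pattern searches the whole text, so a field is found wherever it appears. The captured value is also returned exactly as written, which fits the “key it as you see it” approach.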
Document Classification – classifying documents before searching for the desired index data can greatly improve data capture. Once the document type is known, type-specific logic can be applied based on the required fields for that type. Machine learning can be used to train software to recognize document types automatically, ultimately improving data capture results.
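The sketch below gives a rough illustration of classify-then-extract. It trains a tiny multinomial naive Bayes classifier on a handful of labeled text snippets. Naive Bayes is one simple machine-learning technique, chosen here for brevity. The sketch then looks up which fields to capture for the predicted type. The training snippets, document types, and `REQUIRED_FIELDS` table are hypothetical. A real classifier would be trained on many OCR'd documents from an office's own records.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training snippets; real training data would be full OCR'd documents.
TRAINING = [
    ("warranty deed grantor hereby conveys and warrants to grantee the real property", "DEED"),
    ("quitclaim deed grantor remises releases and quitclaims to grantee", "DEED"),
    ("mortgage borrower grants to lender a security interest in the property to secure the note", "MORTGAGE"),
    ("deed of trust borrower trustee lender secures repayment of the loan", "MORTGAGE"),
    ("satisfaction of mortgage lender acknowledges full payment and releases the lien", "RELEASE"),
    ("release of lien the lienholder releases and discharges the lien of record", "RELEASE"),
]

# Hypothetical per-type field lists that drive the type-specific capture logic.
REQUIRED_FIELDS = {
    "DEED": ["grantor", "grantee", "parcel_id"],
    "MORTGAGE": ["borrower", "lender", "loan_amount"],
    "RELEASE": ["releasing_party", "original_instrument_number"],
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus log likelihood of each token under this label.
            score = math.log(doc_count / self.total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = NaiveBayesClassifier().fit([t for t, _ in TRAINING], [l for _, l in TRAINING])
doc_type = classifier.predict("This quitclaim deed is made by the grantor to the grantee")
print(doc_type, REQUIRED_FIELDS[doc_type])  # DEED ['grantor', 'grantee', 'parcel_id']
```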
We recommend that our customers use a combination of automated data capture followed by human review. Once the data is captured, you need to decide what to do with the information. We’ll save that topic for a future discussion.
Interested in learning more? Reach out to us to see our products in action and ask how we can customize our 'rules' to capture discrete data and streamline your workflow.
About the Author: Troy Burke
With 20+ years’ experience providing clients with stellar service and strategic solutions for growth and development, Troy is committed to ensuring his customers receive the highest-quality solution, training, and support with every implementation. He frequently speaks on the topic of redaction and is actively involved with the National Association for Court Management, the Property Records Industry Association, and several other government organizations. His specialties include redaction software and services, automated data capture technology, identity theft prevention, and legislative compliance.