By Stephanie Kiefer, ECM Solutions Consultant
Document capture: is this a buzzword we are hearing a lot today or what? Sure, it’s an incredible technology, but it’s not smart enough on its own to do what many of my clients expect it to do. Like any implementation, document capture needs a bit of prep work to deliver the desired results.
That’s where my three biggest capture tips for document preparation come into play. It’s important to prepare, format, and clean up your documents in a way that will give your capture application the best chance to succeed.
The way that documents are prepped, organized and separated before they are loaded onto a scanner needs to be well thought out (this is a huge step in any capture application). It’s not reasonable to expect data to be fully and accurately captured from documents that were spilled on the mailroom floor, gathered up and loaded on the scanner. Trust me, this is an expectation I see all the time.
Document preparation methods include separator pages, barcodes that carry data, and a fixed ordering of pages. Some of these methods may seem traditional, but the better organized the paper batch is, the better results you’ll get from your capture automations. The amount of document preparation required depends on the variety and format of the documents. If the documents are structured and have a known number of pages, less upfront work may be required.
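To make the separator page idea concrete, here is a minimal sketch of how a batch might be split into documents once each scanned page has been classified. The `SEPARATOR` marker, the page labels and the `split_batch` function are all made up for illustration; they are not taken from any particular capture product.

```python
from __future__ import annotations

# Hypothetical label a capture tool might assign to a recognized separator sheet.
SEPARATOR = "SEPARATOR"


def split_batch(pages: list[str]) -> list[list[str]]:
    """Group a scanned batch into documents, starting a new one at each separator."""
    documents: list[list[str]] = []
    current: list[str] = []
    for page in pages:
        if page == SEPARATOR:
            # A separator closes the document in progress; the sheet itself is dropped.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Keep whatever pages follow the last separator.
    if current:
        documents.append(current)
    return documents


batch = [SEPARATOR, "invoice-p1", "invoice-p2", SEPARATOR, "claim-p1"]
print(split_batch(batch))
```

If the batch hits the scanner with no separators at all, everything lands in one big document, which is exactly the "spilled on the mailroom floor" scenario.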
A little side note here: the above discussion applies not only to paper, but also to electronic files that contain more than one document. Advanced capture solutions are able to ingest electronic files and ‘remember’ where those pages came from.
A document’s format is the biggest variable in achieving capture success. It goes without saying (but I’ll say it anyways…) that the more structured and formatted the documents are, the better (and easier) the capture extraction results will be.
In recent years, capture vendors have made unstructured extraction more attainable with advances in cognitive capture. Tools such as IBM Datacap allow extraction algorithms to be defined. This technology doesn’t care what the document looks like and facilitates the process of extracting data. Most recently, IBM has released Business Automation Content Analyzer, which eases this process even more. At the beginning of any capture project, time should be spent analyzing the documents and understanding the technology available. That way, an attainable strategy can be set from the beginning.
Every extraction rule, no matter what technology is being used, depends on the results of the document OCR. The cleaner the document, the more confident the OCR will be, and the better your extraction results will be.
Every capture tool includes image cleanup or enhancement functions; don’t underestimate the importance of using these tools. Things like straightening, removing speckles, shading or lines, or inverse text correction can greatly improve the OCR results.
Pro Tip: Don’t forget, if you are using barcodes, find them before removing lines, since line removal can damage the bars. Yep, I always forget this!
False positives can be the downfall of any capture application, so once you’ve given your OCR engine the best chance to succeed, you’ll also need to check and validate the success. In other words, use the OCR confidence levels and business validation rules to make decisions on how to move forward. The only thing worse than extracting the wrong characters is not realizing the engine has extracted the wrong characters.
Be sure to use an appropriate required OCR confidence level in conjunction with business validation rules to validate the accuracy of any extracted fields. If you are not positive of the value in a field, let a business user manually verify it. Of course, this also depends on the importance of accuracy at capture time. If you plan on having users access the data once it hits the repository and they can make changes to metadata, it could be a valid business decision to push the validations out as they are. It just depends on your business process.
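Here is a rough sketch of what combining a confidence threshold with a business rule can look like. Everything in it is an assumption for illustration: the `ExtractedField` shape, the 0.90 threshold, the `INV-` plus six digits invoice format and the `route` function are invented, and a real capture tool will expose confidence and validation in its own way.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical OCR engine


MIN_CONFIDENCE = 0.90  # illustrative threshold; tune it to your own documents


def is_valid_invoice_number(value: str) -> bool:
    """Business rule (assumed format): 'INV-' followed by exactly six digits."""
    return re.fullmatch(r"INV-\d{6}", value) is not None


def route(field: ExtractedField, min_confidence: float = MIN_CONFIDENCE) -> str:
    """Send a field to manual review unless OCR confidence AND the business rule both pass."""
    if field.confidence < min_confidence:
        return "manual_review"
    if not is_valid_invoice_number(field.value):
        # High confidence but the wrong shape: the false positive case.
        return "manual_review"
    return "auto_accept"


print(route(ExtractedField("invoice_number", "INV-004213", 0.97)))
print(route(ExtractedField("invoice_number", "INV-0O4213", 0.95)))
print(route(ExtractedField("invoice_number", "INV-004213", 0.62)))
```

The second field is the interesting one: the engine is confident, but it read a letter O where a zero belongs, and only the business rule catches it. If your process lets users fix metadata later in the repository, you could just as reasonably route that case straight through.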
Don’t get me wrong: there are some pretty slick techniques available with capture technology, and there is nothing wrong with challenging the technology to do more. But make sure your documents are organized, prepped and cleaned in a way that will give your capture application the best chance to succeed.
To learn more about capture, check out my latest on-demand webinar 5 Common Capture Mistakes and How to Avoid Them and stay tuned for more tips.
About the Author: Stephanie has spent over 20 years working with clients to understand pain points and craft document capture solutions to meet their needs. She loves renovating homes in her free time and is an award-winning engineer who is always up for a challenge.