Does master data handling belong in a document extraction platform?

In the past, master data was commonly integrated into the extraction platform. This has disadvantages that an intelligent solution avoids. Modern OCR solutions can recognize whether a number is a VAT number, a postal code or an IBAN, for example. As an analogy, imagine handing an invoice to a random passer-by on the street and asking for an interpretation. Anyone who can read will very likely be able to point out the supplier's address, IBAN and postal code. Conventional OCR solutions, which rely on master data matching, cannot do this on their own. The disadvantages of these solutions, and why master data handling does not belong in an extraction platform, are explained in more detail below.

How the classical approach works
Data is recognized by an OCR solution, which is mostly designed for documents from e-mails, scans and PDFs, and is then forwarded to the ERP system or the DMS, depending on where the digital workflow is implemented. Once the workflow has released the data, it is archived by the DMS.
Classic OCR systems often need master data to find the supplier in this process. They analyze all text blocks of a document and compare each of them against the master data to see whether there is a match. Depending on how much agrees, the match is rated as either high or low. For example, if both the VAT number and the postal code on the scanned document match a supplier's master data in the database, the match is rated as high.
Classically, then, master data sits in the OCR system not because it belongs there, but because the system would otherwise be unable to recognize and extract supplier and sender data properly. The classic approach of master data matching is therefore by no means intelligent; it merely compares cleverly. It works relatively well and is reasonably reliable, but it has its weak points.
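The matching step described in this section can be illustrated with a minimal Python sketch. It is not taken from any specific product: the `Supplier` record, its field names and the `match_supplier` function are hypothetical, and the rule that a single matching field yields a "low" rating is an assumption made for illustration (the text only states that VAT number plus postal code yields "high").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supplier:
    """Hypothetical master data record; the field names are assumptions."""
    name: str
    vat_number: str
    postal_code: str


def match_supplier(text_blocks, master_data):
    """Compare every text block of a document against each supplier record.

    Returns (supplier, rating). The rating is "high" when both the VAT number
    and the postal code occur in the document, "low" when only one of them
    does (an assumed rule), and (None, "none") when nothing matches.
    """
    tokens = {block.strip().upper() for block in text_blocks}
    best = (None, "none")
    for supplier in master_data:
        vat_hit = supplier.vat_number.upper() in tokens
        zip_hit = supplier.postal_code.upper() in tokens
        if vat_hit and zip_hit:
            return supplier, "high"
        # Remember the first partial match, but keep looking for a full one.
        if (vat_hit or zip_hit) and best[0] is None:
            best = (supplier, "low")
    return best
```

With a made-up record such as `Supplier("Example AG", "DE123456789", "80331")`, a document containing both values is rated "high". The same document from a supplier that is missing from `master_data` returns `(None, "none")`, even though every field is perfectly legible.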
Disadvantages of OCR solutions with integrated master data handling
Classic OCR solutions work smoothly as long as the master data is well maintained and constantly updated. But even with perfect maintenance of master data, problems can occur.
If an invoice comes from a supplier that has not yet been entered in the master data, matching will not work. This means that the system will either not find a matching result or will output an incorrect result.
Changes to the master data can also lead to complications. If, for example, a supplier has changed address and the master data has not been updated, it is very likely that no match will be found.
When it comes to matching individual line items, the item numbers that appear on the invoice must be entered into the system. If the article has a different number internally, the vendor’s article number must be stored alongside the internal one. If the vendor then changes its article numbers, the system cannot tell that the new numbers correspond to the old stored ones and will not find a match. As you can see, the whole approach is extremely maintenance-intensive.
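The article number problem can be made concrete with a small Python sketch. The cross-reference table, the vendor and article identifiers, and the function `resolve_internal_article` are all invented for illustration; real systems store this mapping in their own way.

```python
# Hypothetical cross-reference table:
# (vendor id, vendor article number) -> internal article number.
article_map = {
    ("V-1001", "A-500"): "INT-0042",
    ("V-1001", "A-501"): "INT-0043",
}


def resolve_internal_article(vendor_id, vendor_article_no, mapping):
    """Return the stored internal article number, or None if none is stored."""
    return mapping.get((vendor_id, vendor_article_no))


# Works as long as the vendor keeps its numbering.
before = resolve_internal_article("V-1001", "A-500", article_map)

# Assume the vendor renames A-500 to B-500. The lookup fails until someone
# updates the table by hand, which is the maintenance effort described above.
after = resolve_internal_article("V-1001", "B-500", article_map)
```

Here `before` is the stored internal number and `after` is `None`, because the table has no way of knowing that the new number refers to the same article.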
What makes intelligent OCR solutions stand out
Back to the analogy from the beginning. A complete stranger may be able to identify the address on an invoice, but does not know whether the supplier is approved or whether it is actually based at the address listed. Modern, intelligent OCR solutions read documents in the same way that person on the street does. Judging whether the data is good or bad is beyond a complete stranger, and the Parashift Platform cannot do this either. This is because business logic and rules are not built into the OCR solution: unlike with traditional OCR, they are not needed for extraction. Even without stored logic and business rules, our platform recognizes whether a number is an IBAN or a VAT number, for example, and outputs the data in the appropriate structured format. In other words, Parashift actively searches for the address and interprets it.
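That the type of a value can be read from the value itself, without any master data, can be shown with a deliberately simple Python sketch. This is a rule-based stand-in and not how a machine learning-based platform such as Parashift works internally. The IBAN check uses the standard mod-97 checksum, while the VAT and postal code patterns are rough assumptions that cover only some national formats. The function names are hypothetical.

```python
import re


def is_valid_iban(value):
    """Check the basic IBAN structure and the mod-97 checksum."""
    iban = re.sub(r"\s+", "", value).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    # Move country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def classify_field(value):
    """Guess the field type from the form of the value alone."""
    compact = re.sub(r"[\s.\-]", "", value).upper()
    if is_valid_iban(compact):
        return "iban"
    # Simplified assumption: country prefix plus 8 to 12 digits.
    if re.fullmatch(r"[A-Z]{2}\d{8,12}", compact):
        return "vat_number"
    # Simplified assumption: 4- or 5-digit postal codes only.
    if re.fullmatch(r"\d{4,5}", compact):
        return "postal_code"
    return "unknown"
```

For instance, `classify_field("GB82 WEST 1234 5698 7654 32")` yields `"iban"` for this widely published example IBAN, regardless of whether the account holder is known to any database. Whether that supplier is one the business actually wants to pay remains a question for the downstream ERP or workflow system.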
In the case of invoices from new suppliers, everything from the address and the individual line item data to the VAT number can be recognized and reliably read out without any master data. Machine learning-based OCR solutions likewise provide usable metadata when the address or IBAN changes. Consequently, matching problems do not arise when master data changes or invoices arrive from new suppliers. And this is exactly why master data does not belong in OCR software. It is simply not needed and only increases transaction costs.
If you use classic OCR solutions with master data matching yourself and are tired of the extra effort, register for a 14-day test account via the banner below and see for yourself how effective modern extraction solutions are.