Why we use artificial intelligence for document extraction
When we started building a cluster for autonomous accounting tasks two years ago, we thought that extracting accounting records was something the industry had already solved. As we have since discovered, that is certainly not the case.
When we looked at the first solutions in detail, we quickly found that really good extraction quality can only be achieved in a cumbersome and tedious way, and that many software solutions are expensive and time-consuming to acquire and maintain.
One template per document type
With conventional technology, excellent extraction quality can only be achieved if a template is created manually for each document type. Unfortunately, this prevents autonomous processing of documents.
For this reason, we started very early on to develop a machine learning cluster that can process documents with as little intervention as possible. Over the first stretch of this development, it turned out to be much easier than we had initially imagined. However, as with many complex tasks, it became harder the further we got.
Seeing documents the way humans do
Our approach differs fundamentally from the well-known OCR technologies. Essentially, our technology looks at documents the way humans do.
If, for example, we humans are presented with an invoice issued in a language that is foreign to us, we are still able to recognize that it is an invoice. We also recognize almost intuitively which data points are the essential ones.
This is possible because we usually have a lot of experience in dealing with such documents (unless we have spent our whole working life as, say, gardeners). In a first step, we capture the document visually and use its rough layout to classify it. We do not need to read the document in detail to know what is on it.
Then we look for clues as to which values on the document belong together, and only then do we read the individual values in context with one another.
Our technology basically does exactly that.
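The three steps described above (classify by rough layout, find values that belong together, then read them in context) can be sketched in a few lines of Python. This is only an illustration, not the actual system: the `Token` input format, all function names and the toy heuristics are assumptions made for this example, and a real system would use trained models in their place.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A piece of text with its position on the page (hypothetical input format)."""
    text: str
    x: float  # horizontal position, 0.0 (left) to 1.0 (right)
    y: float  # vertical position, 0.0 (top) to 1.0 (bottom)


def classify_by_layout(tokens: list[Token]) -> str:
    """Step 1: guess the document type from positions only, without reading any text."""
    # Toy heuristic (assumption): invoices usually have a block of
    # tokens in the lower right, where the totals are printed.
    lower_right = [t for t in tokens if t.x > 0.5 and t.y > 0.6]
    return "invoice" if len(lower_right) >= 2 else "unknown"


def group_by_row(tokens: list[Token], tolerance: float = 0.01) -> list[list[Token]]:
    """Step 2: find tokens that belong together, here simply those on the same row."""
    rows: list[list[Token]] = []
    for token in sorted(tokens, key=lambda t: t.y):
        if rows and abs(rows[-1][0].y - token.y) <= tolerance:
            rows[-1].append(token)
        else:
            rows.append([token])
    for row in rows:
        row.sort(key=lambda t: t.x)  # left to right within a row
    return rows


def read_in_context(rows: list[list[Token]]) -> dict[str, str]:
    """Step 3: only now read the text, pairing each label with the value next to it."""
    fields: dict[str, str] = {}
    for row in rows:
        if len(row) == 2:
            label, value = row
            fields[label.text.rstrip(":")] = value.text
    return fields


def extract(tokens: list[Token]) -> dict:
    """Run the three steps in the order described in the text."""
    doc_type = classify_by_layout(tokens)
    rows = group_by_row(tokens)
    return {"type": doc_type, "fields": read_in_context(rows)}


# An invoice in a language the reader may not know: layout still gives it away.
sample = [
    Token("Faktura", 0.10, 0.05),
    Token("Netto:", 0.60, 0.80),
    Token("100,00", 0.85, 0.80),
    Token("Summa:", 0.60, 0.85),
    Token("125,00", 0.85, 0.85),
]
result = extract(sample)
```

Note that the classification step never looks at `Token.text`. That mirrors the point made above: the document type can be recognized from the rough layout before anything is read.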
Learning from people
A machine learns well in the same way we help people learn. In difficult cases, you show exactly where each item is and explain why it is there. This is no different for accounting documents.
So we create reliable data representations of the documents, which we manually validate and correct. We use this set of perfect data to train the machine quickly. It is important that even the proverbial last comma is correct. We quickly realized that it makes a lot of sense to capture documents at an extremely high level of detail.
Many people in machine learning make the mistake of collecting a lot of data without paying enough attention to whether this data is of high quality.
They are then often surprised that the quality of the output is insufficient.
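To make the "quality before quantity" idea concrete, here is a minimal sketch of a filter that only lets fully validated documents into the training set. The data structures and the `validated` flag are hypothetical and exist only for this example. The rule shown (every single field must have been checked by a person) is one possible reading of "the proverbial last comma".

```python
from dataclasses import dataclass


@dataclass
class LabeledField:
    """One captured value of a document (hypothetical structure)."""
    name: str
    value: str
    validated: bool  # True once a person has checked and, if needed, corrected it


@dataclass
class LabeledDocument:
    doc_id: str
    fields: list[LabeledField]


def is_fully_validated(doc: LabeledDocument) -> bool:
    """A document counts only if it has fields and every one of them was checked."""
    return bool(doc.fields) and all(f.validated for f in doc.fields)


def select_training_set(docs: list[LabeledDocument]) -> list[LabeledDocument]:
    """Keep only documents whose labels are validated down to the last field."""
    return [doc for doc in docs if is_fully_validated(doc)]


docs = [
    LabeledDocument("a", [LabeledField("total", "125.00", True),
                          LabeledField("date", "2019-03-01", True)]),
    LabeledDocument("b", [LabeledField("total", "80.00", True),
                          LabeledField("date", "2019-03-02", False)]),
]
training_set = select_training_set(docs)  # contains only document "a"
```

A stricter or looser rule is of course possible. The point is only that the check happens before training, not after the output disappoints.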
Solving the extraction of accounting documents is the first significant step toward autonomous accounting systems
We deal with the extraction of accounting documents for a simple reason: there is no way around it if we want to implement an autonomous accounting engine. Automatically posting bank account transactions and reconciling this stream with the document data is relatively simple, but impossible without the document data.
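Why this matching step is simple once the document data exists, and impossible without it, can be seen in a small sketch. It pairs bank transactions with extracted invoices by amount and invoice number. The class names, the matching rule and the sample data are assumptions for illustration only, and real reconciliation has to handle partial payments, discounts and missing references.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    """Data extracted from a document (hypothetical structure)."""
    number: str
    total: Decimal


@dataclass(frozen=True)
class Transaction:
    """One line of a bank statement; outgoing payments are negative."""
    reference: str
    amount: Decimal


def reconcile(
    transactions: list[Transaction], invoices: list[Invoice]
) -> list[tuple[Transaction, Optional[Invoice]]]:
    """Pair each transaction with the invoice it settles, or None if nothing matches."""
    open_invoices = list(invoices)
    pairs: list[tuple[Transaction, Optional[Invoice]]] = []
    for tx in transactions:
        match = next(
            (
                inv
                for inv in open_invoices
                if inv.total == abs(tx.amount) and inv.number in tx.reference
            ),
            None,
        )
        if match is not None:
            open_invoices.remove(match)  # an invoice can only be settled once
        pairs.append((tx, match))
    return pairs


invoices = [Invoice("RE-1001", Decimal("125.00"))]
transactions = [
    Transaction("Payment RE-1001", Decimal("-125.00")),
    Transaction("Card fee", Decimal("-4.90")),
]
pairs = reconcile(transactions, invoices)
```

With an empty `invoices` list, every transaction ends up unmatched, which is exactly the situation described above: without document data there is nothing to reconcile against.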
Unfortunately, it is not foreseeable that electronic invoice formats will eliminate this step within a reasonable period of time. None of the upcoming formats makes the line items, that is, the individual billing items, mandatory. This means that although such a format gives us the standard information, the details relevant for accounting do not (yet) have to be provided in a structured manner.
It’s time to radically reduce manual work in accounting. That has been clear for a long time, and it seems the industry is slowly but surely waking up and working more in that direction.
However, cost and efficiency gains are only the first step. More importantly, we are laying the foundation for accounting that runs in the background and provides a radically improved data basis for fundamentally new decision making.
This is also the context in which we see Robo-Accounting: breaking new ground in accounting in the long term. For this to become possible, supposedly trivial problems, such as the autonomous extraction of data from paper documents, have to be solved first. That’s why we’re here.