How we went from capturing invoices to processing any kind of document
“We help you automate your accounting.” – In the beginning, Parashift was all about accounting, and therefore largely about processing invoices. A few machine learning (ML) models were enough to achieve good results. However, additional customer requirements eventually led to a fundamentally different, innovative technological approach.
In the beginning, the capabilities of the Parashift Platform were different from what they are today
A universal, horizontal solution that could handle any type of document out of the box required a far more powerful technology to be built.
More data fields, please! More document types, please!
Initially, the entire engine was driven by three individual ML models for the following predictions:
- ML model 1: Prediction of all header data, i.e. invoice and order numbers, invoice and delivery dates, and the total amount
- ML model 2: Prediction of line items
- ML model 3: Prediction of the sender of the invoice
For a quick start, the three ML models delivered good results. Over time, however, customers’ requirements for Parashift’s solution became more extensive, as the following problems and requests show:
- “We need to extract additional data fields, we also need fields X and Y.”
- “We need to extract such and such reference.”
- “We want to extract more specific line items.”
- “We want to extract other document types besides invoices, such as delivery bills and purchase orders.”
- “We need the exact data from document types like delivery bills and purchase orders to make our invoice process work.”
Interest in the principle of “throwing any type of document at a cloud platform and extracting the information from it” was immense.
The big challenge of building a powerful solution
Parashift faced the big challenge of building the right solution. Following the same principle as with the original three ML models would have meant building three new ML models for every new document type. On top of that, every data field would have had to be trained from scratch, again for each new document type. The effort would have been enormous, and the approach would never have scaled.
ML models not by document type, but by data field
Parashift came up with the idea of sharing and overlaying as much learned information as possible (but never the customers’ actual data, of course) in order to create and train ML models not by document type, but by field unit. Swarm Learning technology was born.
With this breakthrough methodology, a field such as the invoice number can be trained once and then reused in any other document type. For example, if a supplier receives a letter from a customer complaining about an invoice, the supplier can simply use the pre-trained field to extract the invoice number from the letter automatically.
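To make the per-field idea more concrete, here is a minimal Python sketch. It is not Parashift’s implementation: the `FieldRegistry` class, the `regex_field` helper, the field names and the document-type definitions are all hypothetical, and a simple regular expression stands in for a trained ML model. What it illustrates is the structural difference: a document type is just a list of field names, so a new type such as a complaint letter reuses the already trained `invoice_number` field instead of needing models of its own.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a trained field-level model:
# a function from document text to an extracted value (or None).
Extractor = Callable[[str], Optional[str]]


@dataclass
class FieldRegistry:
    """Hypothetical registry holding one model per data field, shared by all document types."""
    models: Dict[str, Extractor]

    def extract(self, text: str, fields: List[str]) -> Dict[str, Optional[str]]:
        # Each requested field uses the same model, whichever document type asks for it.
        return {name: self.models[name](text) for name in fields}


def regex_field(pattern: str) -> Extractor:
    # Toy placeholder for an ML model: returns the first capture group, if any.
    compiled = re.compile(pattern)

    def extract(text: str) -> Optional[str]:
        match = compiled.search(text)
        return match.group(1) if match else None

    return extract


registry = FieldRegistry(models={
    "invoice_number": regex_field(r"[Ii]nvoice (?:no\.|number)[:\s]+(\S+)"),
    "invoice_date": regex_field(r"(\d{4}-\d{2}-\d{2})"),
})

# Document types are defined by the fields they need, not by separate models.
DOCUMENT_TYPES = {
    "invoice": ["invoice_number", "invoice_date"],
    "complaint_letter": ["invoice_number"],
}

letter = "Dear team, we dispute invoice number: INV-1042 sent last month."
print(registry.extract(letter, DOCUMENT_TYPES["complaint_letter"]))
# → {'invoice_number': 'INV-1042'}
```

In this sketch, adding a document type only means adding an entry to `DOCUMENT_TYPES`; new training would be needed only for fields that no existing model covers.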
A universal, horizontal solution for all document types
Thanks to Swarm Learning and the powerful AI technologies behind it, new document types can be set up quickly, existing document types can be changed easily, and new fields can be added, all with just a few clicks. And this works not just for invoices, as was the case at the beginning, but for any type of document.