How Parashift will solve universal intelligent document extraction – a plan in 4 steps
When I started Accounto (accounto.ch) a few years ago, I assumed that for document data extraction there would simply be an API to which I could send any document. The API would tell me what kind of document I had sent and return the most important information on it, structured in such a way that a machine can work with it.
“Document extraction is solved, isn’t it?!”
At the time, I had no idea whatsoever about document management, so I was all the more astonished that such an API apparently did not exist. I then talked to the relevant vendors and received offers and project proposals that neither fit our budget nor matched our idea of “document extraction is solved” at all. Too expensive, too tedious, not flexible enough.
Discrepancy between demand and industry experts
Whenever I talked about my vision of the document extraction API with business leaders, they all thought it was a brilliant idea. When I spoke to document industry experts, on the other hand, I was told rather quickly that I was a fantasist and that such an API would never be possible under any circumstances. This prompted me to take a closer look at the problem with our team.
What is the technological problem that needs to be solved?
The core of the problem is being able to offer as many document types as possible “out-of-the-box”, that is, without any setup effort. Conventional document extraction systems usually use a separate set of methods and models for each document type to extract the document’s data. The effort per document type is around 2-4 weeks for a four-person team, depending on the scope and the team’s tools. Depending on how you calculate it, the cost is in the range of 15-40k EUR. Costs usually multiply with further dimensions such as new geographic regions and different business contexts.
With conventional concepts, there will never be a universal document processing API
My calculation was simple: if I want to develop an API that can be used in half of all countries in the world, and if this API only has to cover the 50 most important document types per country “out-of-the-box”, this results in a need for 4,800 document types and costs, nota bene for the document type processing alone, of about 170M USD. Maybe I can reduce these costs a little, let’s say by 20%, but in the end it remains a case that does not pay off in any business model.
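To make the arithmetic explicit, here is a rough back-of-the-envelope sketch. The figure of 96 countries is my reading of “half of all countries” (it is what 4,800 / 50 implies), and the per-type cost range is the 15-40k from above, treated loosely as the same currency as the total. The function and variable names are purely illustrative.

```python
# Back-of-the-envelope cost of the conventional, per-document-type approach.
# Assumption: "half of all countries" is taken as 96 (4,800 / 50).
COUNTRIES = 96
TYPES_PER_COUNTRY = 50                 # the 50 most important document types
COST_LOW, COST_HIGH = 15_000, 40_000   # cost range per document type
QUOTED_TOTAL = 170_000_000             # total quoted in the text


def total_document_types(countries: int, types_per_country: int) -> int:
    """Each country needs its own variant of each document type."""
    return countries * types_per_country


def cost_range(n_types: int, low: int, high: int) -> tuple:
    """Lower and upper bound of the total cost for n_types document types."""
    return n_types * low, n_types * high


n_types = total_document_types(COUNTRIES, TYPES_PER_COUNTRY)
low, high = cost_range(n_types, COST_LOW, COST_HIGH)

# Per-type cost implied by the quoted total, and the total after a 20% saving.
implied_cost_per_type = QUOTED_TOTAL / n_types
reduced_total = QUOTED_TOTAL * 80 // 100

print(f"document types:        {n_types:,}")
print(f"total cost range:      {low:,} to {high:,}")
print(f"implied cost per type: {implied_cost_per_type:,.0f}")
print(f"after a 20% reduction: {reduced_total:,}")
```

The quoted total sits near the upper end of the per-type range, and even a 20% saving leaves a nine-figure sum.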
Another problem is the procurement of the training documents. While I can easily procure large volumes of documents quite cheaply, they are always super one-dimensional. It doesn’t do much good for learning capabilities if I have thousands of copies of very similar documents. Within a document type, you need the broadest possible representation of the possible documents and structures to be read. Such document sets, according to my experience, are de facto (rightly!) not for sale.
The path to universal, intelligent document extraction
So to solve the problem, entirely different concepts and methods are needed. The answer to the enormous cost of building out thousands of document types is Parashift’s proprietary “Document Swarm Learning”. Conceptually, the whole thing is actually quite simple: instead of creating, training, and optimizing one model per document type, we tie all learning to the underlying layer, the data point extractors.
This is for the simple reason that many document types share these data point extractors in a purely logical way. For example, a date occurs in many different document types. Instead of always implementing it individually in the context of each document type, we decouple it from the document type and let a set of models learn only for this data point extractor.
The document type itself is then in each case just a collection of data point extractors trained and used together. This has many, massive advantages:
- It reduces the cost of document type creation to a fraction of that of traditional methods
- It allows customers to create new document use cases on our platform with significantly less time and cost: instead of doing some “learning”, they click existing data point extractors together
- It produces a massive data network that is unique in the document industry
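The idea of a document type as a collection of shared data point extractors can be illustrated with a small sketch. This is not Parashift’s implementation: the class names, the `learn` counter, and the regex-based extractors are hypothetical stand-ins for what would in practice be trained models.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataPointExtractor:
    """One reusable extractor (hypothetical stand-in for a set of models)."""
    name: str
    extract: Callable[[str], Optional[str]]
    training_examples: int = 0  # shared by every document type that uses it

    def learn(self, n_examples: int) -> None:
        # Placeholder for real training: just count the feedback received.
        self.training_examples += n_examples


def regex_extractor(name: str, pattern: str) -> DataPointExtractor:
    """Build a toy extractor that returns the first regex group it finds."""
    compiled = re.compile(pattern)

    def extract(text: str) -> Optional[str]:
        match = compiled.search(text)
        return match.group(1) if match else None

    return DataPointExtractor(name, extract)


@dataclass
class DocumentType:
    """A document type is only a named collection of extractors."""
    name: str
    extractors: List[DataPointExtractor]

    def process(self, text: str) -> Dict[str, Optional[str]]:
        return {e.name: e.extract(text) for e in self.extractors}


# Extractors are defined once ...
date_extractor = regex_extractor("date", r"(\d{4}-\d{2}-\d{2})")
total_extractor = regex_extractor("total_amount", r"Total:\s*([\d.]+)")
iban_extractor = regex_extractor("iban", r"\b([A-Z]{2}\d{2}[A-Z0-9]{10,30})\b")

# ... and "clicked together" into document types.
invoice = DocumentType("invoice", [date_extractor, total_extractor, iban_extractor])
receipt = DocumentType("receipt", [date_extractor, total_extractor])

# Feedback gathered on invoices improves the same date extractor receipts use.
date_extractor.learn(100)

result = receipt.process("Receipt 2021-03-14 Total: 42.50")
print(result)
```

Because `invoice` and `receipt` hold references to the same extractor objects, any improvement to the date extractor benefits both document types at once, which is the cost lever described in the list above.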
“Learn and improve in a swarm”
To reduce training effort as much as possible, we aggregate learning from all customers and all document types in a fully GDPR-compliant way and use this learning set to improve all capabilities on the platform. That’s where the name comes from: because the entire swarm of users and machine learning components works together, everyone on the platform benefits in turn.
Automate what can be automated
The third important point is that this Swarm Learning should be as fully automated and continuous as possible. Separate training intervals that keep the user busy, data scientists who manually update models: all of this is expensive and time-consuming, and it prevents rapid development of capabilities.
Business model that enables a data network
To solve the problem of document extraction, there is only one way: the platform has to learn its way through thousands of use cases. On the one hand, this takes time, and on the other, it requires many customers from as many different industries as possible. If we offer customers a platform that allows them to implement use cases as easily as possible and without major hurdles, the chances are high that they will do so often and quickly.
From a theoretical perspective, this is all quite trivial. In reality, as is often the case, the world looks completely different. Each of these three components is quite difficult to solve in itself. In addition, to enable a rapidly scaling model, we need to have platform capacity ready. Building a system for 30k transactions per day is completely different from building one for 300k transactions per day. And unfortunately, just because something works as a prototype doesn’t mean it will pay off in production.
A long-term vision and plan in 4 steps
Step 1: Developing the basic technology
In the first step, we worked out the concept and the basic technology to run “Document Swarm Learning” in an automated way. We created countless innovations, not all of which are in today’s product. Other components were created almost incidentally and do not have much to do with our mission. One example is a system that uses machine learning to assign bookings to business cases and generate booking records in large companies with hundreds of bank accounts. Another is a component that turns machine-generated documents into poor-quality documents (I know that sounds absurd, but it is actually very helpful).
Step 2: Working out platform and ecosystem, enabling revenue streams
In the second step, we put together Swarm Learning, automation, and the business model into Parashift as we currently know it. The offering focuses on:
- Large customers
- Integrators and software vendors
We rely on partner companies as multipliers in sales and integration. This enables us to scale the actual subscriptions at a very high pace, considering this type of product.
Within a short time, mostly while still in beta, we were able to win many customers and generate significant recurring revenues. The product is not yet a smash hit with customers, but in most tenders and evaluations we win comparatively easily with this first product, against long-established system providers as well as against the many vertically oriented start-ups that are currently emerging.
With this ecosystem of product, technology, partners, and customers, we can quickly learn the necessary document cases and offer more and more standard document types “out-of-the-box”. At the moment, we have worked out around 350 of them, of which we are now continuously publishing some on the platform. In the near future, we will be the first and only system on the market to offer more than 500 standard document types without the need for configuration or training via a single API.
The key point of this phase is commercialization. Our data shows that while we need to continue to invest significantly in R&D and platform development, we can at the same time generate significant revenue with the business model. We need to do so in order to remain comfortably “bankable.” This phase alone has the potential to make Parashift a globally successful company. That’s what we’re focusing on right now, and we’re trying not to worry about problems that will only arise further down the road.
What follows are the next strategic steps, which have no operational or commercial relevance for the company’s current business.
Step 3: Launching an SME offering
The next logical step is to launch an SME offering. This will give us another huge boost of additional learning cases to push us forward. The exact business model for this product is still largely in the dark. Thilo Rossa, our Chief Product Officer, and I have various ideas on this. What is clear, however, is that it has to happen in order to move from 1,500 standard document types in the direction of 25,000 and beyond.
Step 4: Scaling the Universal Document Processing API
Once the 25,000 standard document types are reached, there will be virtually no serious alternatives to using the Parashift API for most document automation cases. This also has to do with the fact that it will be possible to keep the pricing for using the API just above the actual transaction costs. I expect that we will be able to reduce the total cost per transaction (document) by more than 90%. This means that what prompted me to start the journey at the very beginning is actually becoming a reality.
From product to infrastructure company
Parashift is constantly transforming itself on this journey. The first two years have been about developing the concept, the technology, and the strategy. It’s not as if we had the whole master plan in place from the beginning. On the contrary, we often had to accept setbacks and pay dearly. Step by step, we have worked out the solution to the challenge. Sooner or later, we will become an infrastructure company. Something like a mixture of Stripe and DeepL, just for the document industry.