FormExtractor
Make AI document extraction easy for everyone
Problem
FormExtractor has to stand out in the crowded no-code document extractor market. Our customers want to extract complex data from many more types of documents.
Solution
By empowering everyone to build their own custom extractors, we have scaled our product to address any extraction use case and put ourselves in a unique position in the market.
Outcome
3.7x higher lifetime value for customers building custom extractors, and a 70% cut in customer support time.
My Role
Led the lean UX process with our PM to validate three of our biggest assumptions. Designed and shipped the first release of this feature.
Project Team
Me, Lead Designer
Frank, Design Manager
Fung, Product Manager
Jason, Senior ML Developer
Ben, Junior Developer
Challenge and Research
How to create product differentiation & stickiness
FormExtractor falls on the no-code side of the document extractor market, which is crowded with receipt and invoice scanners.
While we support more types of documents, we have heard repeatedly from large enterprise customers and prospects that they want to extract data from documents in their own formats. Because of this, we lost a HK$3M opportunity to automate the application and claim system of the largest insurance company in Hong Kong.
Early Explorations
HMW help customers extract from any document?
We looked at a few existing text recognition methods. Based on our engineering resources and after some discussion, our product team narrowed the options down to two diverging approaches: provide templates, or let customers create their own custom extractors.
We were excited to find that there was indeed a need for custom extractors, because of the diverse range of data and document formats that our users wanted to extract.
Create templates
Can immediately respond to customer needs
Easy to build templates for each document type
No backend refactoring needed
Custom extractor builder
Built once, can scale infinitely to customers’ needs
New processes and refactoring are needed
Can become one of our UVPs
Build Proof of Concept to validate the direction
I understood that our engineers did not want to take on the huge engineering challenge of creating a new tool. But a custom extractor builder would help us scale our offering and free up engineers' time from customizing models for individual customers, so that they could focus on product innovation.
To prove that a custom extractor builder was the more scalable and time-saving direction, I asked our engineers to help me build a simple proof of concept to see how many different types of documents our customers wanted to extract from.
We shipped it to some existing customers in 2 weeks. Within 1 week, they had requested extraction from many document and data types. That was enough to prove to the team that creating templates for every document was just not practical, and that a builder would help us scale our service to more customers.
PoC Result After 1 Week
34
Custom Requests
10
Document Types
234
Data Types To Extract
Research and Deeper Insight
Customers need a sense of control in the extraction
Once the direction was set, I went back to the customers who had tried our PoC and asked about their experiences. Surprisingly, some customers didn’t want to use it anymore.
Some of them thought that we were manually inputting data for them, and did not believe a manual process could handle their daily document extraction load. Others simply did not get accurate extraction results.
Customer Interview Stats
5
PoC Customers
8
Interview Sessions
Customers did not trust our system
To me, this revealed a deeper issue: our customers did not trust our system to extract information reliably because 1) they were confused about how the extractor extracts data, and 2) the extracted data was not accurate.
I identified uploading and labeling as the two key steps for fostering trust in our system. The goal was to get users to upload more high-quality samples for our system to learn from, and to give users the sense that they are the ones instructing the system on what to extract.
Improve Uploading & Labelling
Nudging users to upload more high-quality samples
In our MVP, we saw that users often uploaded only 1-2 samples, which is not enough to train the ML model for a custom extractor. The quality of some samples was also low, making the extractor inaccurate.
The number of samples is critical to the accuracy of the extractor because more samples give the system more examples to learn from. I designed a number of nudges and decided to require users to upload some samples before they could start. I also made it easy to upload many samples in one batch using drag-to-upload. An indicator showed how many more samples the user should upload to get a more accurate extractor.
After this new sample upload flow was implemented, users uploaded twice as many samples on average.
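To illustrate the logic behind the required minimum and the indicator, here is a minimal Python sketch. It is not our production code: the names `MINIMUM_SAMPLES`, `RECOMMENDED_SAMPLES`, `upload_progress`, and `indicator_message`, as well as the threshold values, are assumptions made up for this example.

```python
# Hypothetical thresholds; the real values are not stated in this case study.
MINIMUM_SAMPLES = 3       # uploads required before the user can continue
RECOMMENDED_SAMPLES = 10  # uploads suggested for a more accurate extractor


def upload_progress(uploaded: int) -> dict:
    """Return the state an upload indicator could display."""
    if uploaded < 0:
        raise ValueError("uploaded must be non-negative")
    return {
        "can_start": uploaded >= MINIMUM_SAMPLES,
        "remaining": max(RECOMMENDED_SAMPLES - uploaded, 0),
        "percent": min(100, round(100 * uploaded / RECOMMENDED_SAMPLES)),
    }


def indicator_message(uploaded: int) -> str:
    """Turn the progress state into the nudge shown next to the upload area."""
    state = upload_progress(uploaded)
    if not state["can_start"]:
        needed = MINIMUM_SAMPLES - uploaded
        return f"Upload {needed} more sample(s) to start."
    if state["remaining"] > 0:
        return f"Upload {state['remaining']} more sample(s) for a more accurate extractor."
    return "You have enough samples."
```

The hard minimum blocks progress, while the recommended count only nudges, which mirrors the split between forcing a few samples and encouraging more.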
Improving quality of samples uploaded
It is also important that the samples users upload are of high quality. At first, we gave users simple instructions on what to look out for when uploading their samples. But in our proof of concept, some uploaded samples were still not up to standard.
After talking with our engineers, I learned that they processed uploaded samples afterwards to correct skew, contrast, and accidental crops.
I suggested processing the samples as users upload them. Coupled with a few simple tests for contrast and legibility, this would let us give users feedback on the quality of their samples in real time.
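As a rough idea of what such a simple test could look like, the Python sketch below flags low-contrast or undersized samples at upload time. It assumes the image has already been decoded into a flat list of 8-bit grayscale values; `MIN_CONTRAST`, `MIN_SHORT_SIDE`, `rms_contrast`, and `check_sample` are hypothetical names and thresholds, not the checks our engineers actually shipped.

```python
from statistics import pstdev

# Hypothetical thresholds chosen for illustration only.
MIN_CONTRAST = 0.15    # RMS contrast on a 0-1 scale
MIN_SHORT_SIDE = 600   # shortest image side, in pixels


def rms_contrast(pixels):
    """RMS contrast: standard deviation of grayscale values scaled to 0-1."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    return pstdev([p / 255 for p in pixels])


def check_sample(pixels, width, height):
    """Return a list of warnings for the user; an empty list means the sample passed."""
    warnings = []
    if min(width, height) < MIN_SHORT_SIDE:
        warnings.append("Image is too small to read reliably.")
    if rms_contrast(pixels) < MIN_CONTRAST:
        warnings.append("Contrast is too low; try a clearer scan.")
    return warnings


# A flat gray, small image fails both checks.
print(check_sample([128] * 100, width=300, height=400))
```

Returning a list of messages rather than a single pass/fail flag lets the interface tell the user exactly what to fix before they upload the next sample.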
Improve Uploading & Labelling
Making labeling easy for everyone
Labeling involves telling the system where the data is located on the sample and what it looks like, usually by drawing a rectangle around the data. The precision of the labeling determines the accuracy of the extractor.
Iteration 1: using an open-source labeler
We first embedded an open-source labeling tool into our product for users to label data. But many of its functions were irrelevant, which confused our users.
Iteration 2: Add tutorial to open-source labeler
I then designed tutorial steps that were overlaid on the embedded labeling tool to walk the user, step by step, through labeling one data field on one sample.
However, that also proved too complicated for most users. The open-source labeling tool was built for more sophisticated labeling tasks, and its plethora of features and granular controls were irrelevant to our users.
Iteration 3: building our own labeler
I proposed building our own labeling tool that stripped away most of the complex functions the open-source one has. This was the biggest engineering commitment we would make. But I argued that a simplified design would reduce the time our account manager and engineers spent tutoring users or labeling data for them.
Our engineers figured out a way to build our own labeling front-end and feed the labeling data, such as the positions of the rectangles, back to the open-source labeling tool, so that we did not need to build our own backend processor. This was a win for both design and engineering.
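To make the hand-off concrete, here is a minimal Python sketch of how a front-end might describe labeled rectangles before passing them on. The `LabelBox` structure, its field names, the normalized 0-1 coordinates, and the JSON layout are all assumptions for illustration; the actual format expected by the open-source tool is not described here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LabelBox:
    """One rectangle drawn by the user (hypothetical schema)."""
    field_name: str  # which piece of data the rectangle marks
    page: int        # 1-based page number within the sample
    x: float         # left edge, as a fraction of page width
    y: float         # top edge, as a fraction of page height
    width: float     # as a fraction of page width
    height: float    # as a fraction of page height

    def __post_init__(self):
        for name in ("x", "y", "width", "height"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        # Small tolerance avoids rejecting boxes over float rounding.
        if self.x + self.width > 1.0 + 1e-9 or self.y + self.height > 1.0 + 1e-9:
            raise ValueError("box extends past the page")


def to_payload(sample_id, boxes):
    """Serialize the labels for one sample as a JSON string."""
    body = {"sample_id": sample_id, "labels": [asdict(b) for b in boxes]}
    return json.dumps(body, sort_keys=True)


box = LabelBox("invoice_total", page=1, x=0.6, y=0.8, width=0.2, height=0.05)
print(to_payload("sample-001", [box]))
```

Storing positions as fractions of the page, rather than pixels, is one way to keep labels valid when the same sample is displayed at different zoom levels.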
Building our first release
Designing lean and agile
With few dedicated engineering resources for developing this feature, it was my duty to figure out our assumptions, build incremental MVPs to test them, and progressively polish our product for release.
Thanks to our PM, who worked with me to scope the product, this lean design process let us run three iterations of MVPs to test various assumptions before we built our release version.
By the end of May, we had a clear idea of what our first release would look like. While most of the features in the first release had been designed and tested, there was still much polishing to be done. Since our development team ran on an agile process, I had to polish my designs in sprints over the two months of development.
It was challenging at first because it was hard to maintain consistency when designs were not done sequentially. There were also many changes to the design during the development sprints.
So I drew wireframes for all features yet to be developed, and created the final design before each sprint only after the requirements were finalized, to limit major changes. Changes were still unavoidable, but at least it was easier to change major flows on wireframes.
Outcome and Impact
Design to increase bottom line
FormExtractor became one of the few products in the market that empower users to create their own custom document extractors. Custom extractors became the building blocks for solving more nuanced and complex extraction tasks, without our engineers having to build custom features on top of our current suite.
3.7x
higher LTV for customers with access to custom extractor
65%
of data extraction is done using a custom extractor