FormExtractor

Make AI document extraction easy for everyone

This Project Is Live

FormExtractor turns physical documents into structured data automatically. To accommodate the ever-growing use cases of our customers and to differentiate our product, we empowered users to customize and train their own extractors with no code.

My Role

Led the lean UX process with our PM to validate three of our biggest assumptions. Designed and shipped the first release of this feature.

Project Team

Me, Lead Designer
Frank, Design Manager
Fung, Product Manager
Jason, Senior ML Developer
Ben, Junior Developer

Users label the data on their samples so the custom extractor can learn from it

Users upload samples to create the custom extractor

Users can customize what data to extract automatically

We take users step by step through creating their custom document extractor

Main Challenge

Product Differentiation and Stickiness

Document extractors automate data entry by leveraging AI to extract data from scanned copies of documents. As AI becomes mainstream, more document extractors are entering the market. FormExtractor, a late entrant, faces a core challenge:

How might FormExtractor differentiate itself from the competition while increasing existing customers' lifetime value?

 

Discovery

Few no-code products let users create custom extractors

FormExtractor falls on the no-code side of the document extractor market. Most no-code products, including FormExtractor, only provide templates of documents to extract (e.g. IDs, receipts).

 

We want to differentiate ourselves from a crowded market of no-code, preset extractors

 


 

Our current preset document and information types are not enough for our customers’ nuanced needs

Process

Using Lean Methodology to Test Our Assumptions

We had a small product team with three engineers, who were occupied with a large backlog of custom features for our existing enterprise customers. So we had little bandwidth to explore this new feature.

Our PM decided to use lean methodology to explore the feasibility and product-market fit of this feature. It was up to me to decide how to implement this strategy.

 
 

I started by listing all the assumptions related to our hypothesis, then identified the fundamental assumption the hypothesis hinged on. After discussing with our PM and account managers, we landed on this one: customers have many types of data and many document formats to extract.

 

Two fundamental value assumptions to prove there is product-market fit

Testing our first assumption

Customers need to extract various data from many document types

To test our first hypothesis, I focused on understanding the document formats users would upload and the data types they would like to extract. So we built only a document uploader and a form to collect the types of data each customer wanted to extract.

 

235 unique data types were requested, compared with the 15 we previously extracted automatically

7 more major document types were commonly uploaded, in addition to the 3 we previously offered

We were excited that there was indeed a need for custom extractors, given the diverse range of data and document formats our users wanted to extract.

 

Deeper Insight

Uploading & labeling will make or break the experience

Once the value hypothesis was validated, we focused on making the creation of a custom document extractor as self-serve as possible, to reduce the work falling on our account managers and engineers.

 
 

From interviews with customers who tried our MVP, and with our account managers who work with enterprise customers, we discovered that the upload and labeling steps were the most challenging because:

  1. Customers could seldom set up labels and label the samples by themselves

  2. Customers were suspicious that we were manually inputting data because they did not know how data was labeled and extracted

  3. Customers uploaded very few high-quality samples for the ML model to learn from

 

Customers struggled most with uploading and labeling

Therefore, after validating our fundamental value hypothesis, I turned my attention to making uploading and labeling simple yet sufficient for creating a custom extractor.

 

Improve Uploading & Labeling

Nudging users to upload more high-quality samples

In our MVP, we saw that users often uploaded only 1-2 samples, which is not enough to train the ML model for a custom extractor. The quality of some samples was also low, making the extractor inaccurate.

The number of samples is critical to the extractor's accuracy because more samples give the model more examples to learn from. I designed a number of nudges and required users to upload a minimum number of samples to start. I also made batch uploading easy with drag-to-upload, and an indicator showed how many more samples the user should upload to get a more accurate extractor.
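To make the nudge concrete, here is a minimal sketch of the gating and indicator logic. The thresholds are hypothetical placeholders, not our production values:

```python
# Hypothetical thresholds for the upload nudges; the shipped values were tuned by the team.
MIN_SAMPLES_TO_START = 3    # users must upload at least this many to create the extractor
RECOMMENDED_SAMPLES = 20    # indicator nudges users toward this count for better accuracy


def upload_indicator(uploaded: int) -> str:
    """Return the message shown next to the sample uploader."""
    if uploaded < MIN_SAMPLES_TO_START:
        return f"Upload at least {MIN_SAMPLES_TO_START - uploaded} more sample(s) to start training."
    if uploaded < RECOMMENDED_SAMPLES:
        return f"Add {RECOMMENDED_SAMPLES - uploaded} more sample(s) for a more accurate extractor."
    return "You have enough samples for a solid first extractor."
```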

 

Users have to upload a few samples to start creating their extractor

Nudging users to upload more samples to create a better extractor

After these nudges were implemented, users uploaded on average twice as many samples through the new upload flow.

 
 

Improving the quality of uploaded samples

It is also important that the samples users upload are of high quality. At first, we gave users simple instructions on what to look out for when uploading their samples. But in our proof of concept, some uploaded samples were still not up to standard.

 

Examples of low-quality samples users uploaded that would affect the extractor’s accuracy

 

After discussing with our engineers, I learned that they already post-processed uploaded samples to correct for skew, contrast, and accidental crops.

I suggested processing the samples as users upload them. Coupled with a few simple tests for contrast and legibility, this would let us give users real-time feedback on the quality of their samples.
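As a rough illustration of what such a real-time check could look like, here is a minimal sketch assuming Pillow and NumPy are available. The resolution and contrast thresholds are illustrative, and the checks our engineers actually ran were more involved:

```python
# A minimal sketch of a real-time sample quality check; thresholds are placeholders.
from PIL import Image
import numpy as np

MIN_RESOLUTION = (600, 600)   # below this, text is usually too small to read
MIN_CONTRAST_STD = 30.0       # low pixel-intensity spread suggests a washed-out scan


def check_sample_quality(path: str) -> list[str]:
    """Return a list of warnings to surface to the user right after upload."""
    warnings = []
    image = Image.open(path).convert("L")  # grayscale for contrast analysis
    if image.width < MIN_RESOLUTION[0] or image.height < MIN_RESOLUTION[1]:
        warnings.append("This sample looks low resolution; text may not be legible.")
    pixels = np.asarray(image, dtype=np.float32)
    if pixels.std() < MIN_CONTRAST_STD:
        warnings.append("This sample has low contrast; try rescanning with better lighting.")
    return warnings
```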

 

Design iterations on the alert when the sample may be low quality

Animation while an uploaded sample is processing

Alert when the sample is low quality

Improve Uploading & Labeling

Making labeling easy for everyone

Labeling involves telling the system where the data is located and what it looks like on the sample, usually by drawing a rectangle around the data. The precision of labeling determines the accuracy of the extractor.
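For illustration, a single label can be thought of as a small record tying a rectangle to a field name and type. This is a simplified, hypothetical sketch, not our actual schema:

```python
# A simplified sketch of one label, assuming rectangles are stored as pixel
# coordinates on the sample image. Field names here are illustrative.
from dataclasses import dataclass


@dataclass
class FieldLabel:
    field_name: str   # e.g. "invoice_number"
    field_type: str   # e.g. "text", "date", "amount"
    sample_id: str    # which uploaded sample this rectangle was drawn on
    x: int            # left edge of the rectangle, in pixels
    y: int            # top edge of the rectangle, in pixels
    width: int
    height: int


label = FieldLabel("total_amount", "amount", "sample-001", x=420, y=880, width=160, height=40)
```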

Iteration 1: using an open-source labeler

We first embedded an open-source labeling tool into our product for users to label data. But many of its functions were irrelevant, which confused our users.

 

Iteration 1: Embedding the open-source annotation tool right on our labeling page

 

Iteration 2: adding a tutorial to the open-source labeler

I then designed tutorial steps overlaid on the embedded labeling tool to walk the user step by step through labeling one piece of data on one sample.

 

Iteration 2: Placing tutorial steps to guide users to use the annotation tool

However, that also proved too complicated for most users: the open-source labeling tool was built for more sophisticated labeling tasks, and its plethora of features and granular controls were irrelevant to our users.

 

Iteration 3: building our own labeler

I proposed building our own labeling tool that stripped away most of the complex functions of the open-source one. This was the biggest engineering commitment we would make, but I argued that a simplified design would reduce the time our account managers and engineers spent tutoring users or labeling data for them.

 

Deciding what tools to include in our in-house annotation tool

Our engineers figured out a way to build our own labeling front-end and feed the labeling data, such as the positions of the rectangles, back to the open-source labeling tool, so we did not need to build our own backend processor. This was a win-win for both design and engineering.
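A rough sketch of that translation layer, reusing the hypothetical FieldLabel record from the labeling sketch above. The output keys are placeholders; the open-source tool's real schema is not part of this case study:

```python
# Convert our front-end rectangles into a generic annotation record that a
# backend annotation processor could consume. Keys below are hypothetical.
def to_backend_annotation(label: FieldLabel, image_width: int, image_height: int) -> dict:
    """Convert an in-house rectangle into a normalized annotation record."""
    return {
        "label": label.field_name,
        "type": label.field_type,
        "bbox": [
            label.x / image_width,                      # normalized left
            label.y / image_height,                     # normalized top
            (label.x + label.width) / image_width,      # normalized right
            (label.y + label.height) / image_height,    # normalized bottom
        ],
    }
```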

 

Step 1: Select the Detection Region tool

Step 2: Draw the Detection Region around the data to extract

Step 3: Define the field name and type of the data

Building our first release

Designing lean and agile

With few dedicated engineering resources for developing this feature, it was my duty to identify our assumptions, build incremental MVPs to test them, and progressively polish the product for release.

Thanks to our PM, who worked with me to scope the product, this lean design process let us test three MVP iterations against different assumptions before we built our release version.

 

We developed three MVPs, each validating assumptions and refining key features to define our first release

 

By the end of May, we had a clear idea of what our first release would look like. While most of the features in the first release had been designed and tested, there was still much polishing to be done. Since our development team ran an agile process, I polished my designs in sprints over the two months of development.

It was challenging at first because it was hard to maintain consistency when designs were not done sequentially, and there were many design changes during the development sprints.

 

Screen designs organized by development sprint, with wireframes for features in future sprints

So I drew wireframes for all features yet to be developed and created the final design for each sprint only after its requirements were finalized, to limit major changes. Changes were still unavoidable, but at least major flows were easier to adjust on wireframes.

 

Outcome and Impact

Design to increase the bottom line

FormExtractor became one of the few products in the market that empower users to create their own custom document extractors. Custom extractors became the building block for solving more nuanced and complex extraction tasks without our engineers having to build custom features on top of our current suite.

 

3.7x

higher LTV for customers using custom extractors

45%

of data extraction is done using custom extractors

 

Thanks for reading!
See all my work
