FormExtractor

From Proof-of-Concept to MVP, and beyond

Project Brief

FormX is a data extraction tool that turns physical documents into structured data. To accommodate our customers' ever-growing use cases and differentiate our product, we developed a self-serve model-building tool that lets users create and train their own extraction models.

 
 

Project Team

Me, Designer
Fung, Product Manager
Jason, Senior ML Developer
Ben, Junior Developer

Platform

Web

Duration

Feb 2021 - Jun 2021

My role

Working from 0 to 1, I designed three iterations to test our hypotheses and go to market. I focused on the logic of how users create a custom model, train it with samples, and use it across our platform. I designed the user flow, wireframes, and UI for this feature of our product.

Solution Overview.gif

Goal: Making AI accessible for everyone

FormX works with many businesses to automate their processes. Previously, we had to collect document samples from each customer, understand what they needed to extract, label all the samples, train the model, and then add that model to the customer's account individually.

What if we could let customers train the model themselves?

 

Faking the experience

For our proof of concept, we wanted to test whether:

  1. Customers will want to create custom models themselves

  2. Customers' needs differ enough to warrant a custom model builder

  3. Customers are willing to provide samples, some of them confidential, to train the model

We didn't need customers to actually create and train a model to test these assumptions. So we faked the experience: we asked them to supply the samples and specify the information to extract, and our engineers then created the models for them.

 
Customers want to extract a wide variety of data

Many customers are willing to provide their samples, averaging 16 per custom model created

Training the model (and our customer)

The most important part of model building is training, which depends on adequate data and accurate labelling. From our proof of concept, we learned that customers must upload enough high-quality samples, or the model will be inaccurate.

 
When a customer uploads a sample, we process it to correct skew, uneven exposure, low contrast, and similar issues, then notify the customer if the sample is still low quality after processing
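
As a rough illustration of the "process first, then warn" flow (not our production pipeline), the sketch below stretches the contrast of a grayscale sample and only flags it if it is still flat afterwards. The function names, the use of standard deviation as a contrast measure, and the MIN_CONTRAST threshold are all assumptions made up for this example.

```python
from statistics import pstdev

# Hypothetical threshold: below this spread of gray values we treat the
# sample as too flat to read reliably. The real cut-off is not stated here.
MIN_CONTRAST = 40.0


def stretch_contrast(pixels):
    """Linearly rescale grayscale values (0-255) to span the full range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A completely uniform image cannot be enhanced by stretching.
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def check_sample(pixels):
    """Enhance the sample first, then decide whether to warn the customer."""
    enhanced = stretch_contrast(pixels)
    contrast = pstdev(enhanced)  # spread of gray values after enhancement
    return {
        "pixels": enhanced,
        "contrast": contrast,
        "low_quality": contrast < MIN_CONTRAST,
    }


# A faded scan is recoverable, so no warning is needed after processing.
faded = check_sample([100, 110, 120, 130] * 4)
# A blank scan stays flat after processing, so the customer is notified.
blank = check_sample([128] * 16)
print(faded["low_quality"], blank["low_quality"])
```

The design point is the ordering: the customer only sees a warning when processing cannot rescue the sample, which keeps re-upload requests to a minimum.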

 

During our proof of concept, we asked some of our customers to label their samples using an open-source annotation tool. However, most of them had never labelled data before, so we now guide them step by step through labelling their samples.

 
In-place tutorial guide for customers to navigate to the right tools

Image annotation to visualize specific steps for more clarity

To structure or not to structure

Another insight from our proof of concept was that many customers asked to extract structured entries (e.g. tables, lists). We therefore added tables alongside single-field data so that customers can extract structured entries.

 
Customers can create label names to extract single-field data

Customers can create tables and label columns in their samples to extract structured data (e.g. tables, lists)

Turning Concept Into Product

The custom model extraction feature has gone through three major iterations to gradually address customers' full range of needs.

 
 

After launching version 1.0, we continued to work through issues in our backlog and to add features for specific customers' use cases. The two most prominent are replacing the embedded open-source labelling tool with our in-house annotation tool, and image recognition. Both features are still under development.

 
In-house annotation tool, with the tools that are unnecessary for our use cases removed

Enabling customers to build models that convert images to data (e.g. logos to company names)

Outcome and Impact

The custom model builder differentiates FormExtractor from its competition and increases usage among the customers who adopt it. Custom models have become a building block for solving more nuanced and complex extraction tasks without our engineers having to build custom features on top of our current suite.

 

3.7x

More API calls than customers not using custom model extraction

3/17

competitors offer custom model building

 

What I learned

Fake it till you really need to make it

When testing a hypothesis, it is important to strip away anything the test does not need, including functionality that seems to be the core of the feature. For example, although we were building a custom model builder, the ability to actually build a custom model was not necessary to test whether customers would use it.

Design + engineering = more possibility

When tackling the challenge of low-quality samples, my first idea as a designer was to give users feedback when they uploaded low-quality samples. But some users told us that these were the only samples they had. When I raised this with our engineers, they proposed processing the samples to enhance them to an acceptable level and tweaking the model to require fewer samples for similar extraction accuracy. This spared users the friction of re-uploading samples or finding new ones. The solution required not only design but also engineering effort.

Balance between sequence and flexibility when working with an agile development team

I designed in parallel with the development sprints, so I had to adopt the team's agile workflow. At first I wanted to flesh out the product from start to end before handing it off, but the development team could not wait for me to finish. Instead, I first drew very rough wireframes of the entire feature and then worked on the features sprint by sprint. The rough wireframes gave my PM and me a comprehensive sense of the product, which helped us maintain consistency while retaining the flexibility to change the flow of features or the product during the sprints.

 