Launch HN: Encord (YC W21) – Unit testing for computer vision models

Encord Active

6 420 9.1 Python

Open source active learning toolkit to find failure modes in your computer vision models, prioritize data to label next, and drive data curation to improve model performance.

Eric and Ulrik from Encord here. We build developer tooling to help computer vision (CV) teams enhance their model-building capabilities. Today we are proud to launch our model and data unit testing toolkit, Encord Active (https://encord.com/active/).
Imagine you're building a device that needs to see and understand the world around it – like a self-driving car or a robot that sorts recycling. To do this, you need a vision model that processes the real world as a sequence of frames and makes decisions based on what it sees.
Bringing such models to production is hard. You can’t just train a model once and expect it to work; you need to constantly test and improve it to make sure it understands the world correctly. For example, you don't want a self-driving car to confuse a stop sign with a billboard, or classify a pedestrian as an unknown object (https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg).
This is where Encord Active comes in. It's a toolkit that helps developers “unit test”, understand, and debug their vision models. We put “unit test” in quotes because while it isn’t classic software unit testing, the idea is similar: to see which parts of your model are working well and which aren't. Here’s a short video that shows the tool: https://youtu.be/CD7_lw0PZNY?si=MngLE7PwH3s2_VTK
For instance, if you're working on a self-driving car, Encord Active can help you figure out why the car is confusing stop signs with billboards. It lets you dive into the data the model has seen and understand what's going wrong. Maybe the model hasn't seen enough stop signs at night, or maybe it gets confused when the sign is partially blocked by a tree.
Having extensive unit test coverage won’t guarantee that your software (or vision model) is correct, but it helps a lot and is great at catching regressions (i.e., things that work at one point and then stop working later). For example, consider retraining your model with a 25% larger dataset, including examples from a new US state characterized by distinctly different weather conditions (e.g., California vs. Vermont). Intuitively, one might think ‘the more signs, the merrier.’ However, adding new signs can confuse the model: perhaps it is suddenly biased to rely mostly on surroundings because the signs are covered in snow. This can cause the model to regress and fall below your desired performance threshold (e.g., 85% accuracy) on existing test data.
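To make the regression idea concrete, here is a minimal sketch of a "unit test" over data slices. It is not Encord Active's API: the function names, the slice names, and the toy records are all assumptions made up for illustration, and only the 85% threshold comes from the example above.

```python
from collections import defaultdict

# Example threshold from the text above; pick whatever your project requires.
ACCURACY_THRESHOLD = 0.85


def accuracy_by_slice(records):
    """Compute accuracy per data slice.

    `records` is an iterable of (slice_name, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, truth, prediction in records:
        total[slice_name] += 1
        correct[slice_name] += int(truth == prediction)
    return {name: correct[name] / total[name] for name in total}


def failing_slices(records, threshold=ACCURACY_THRESHOLD):
    """Return the sorted names of slices whose accuracy is below the threshold."""
    scores = accuracy_by_slice(records)
    return sorted(name for name, acc in scores.items() if acc < threshold)


# Hypothetical evaluation results after retraining (made-up data).
records = (
    [("california", "stop_sign", "stop_sign")] * 9
    + [("california", "stop_sign", "billboard")] * 1
    + [("vermont_snow", "stop_sign", "stop_sign")] * 6
    + [("vermont_snow", "stop_sign", "billboard")] * 4
)

if __name__ == "__main__":
    for name in failing_slices(records):
        print(f"Regression: slice '{name}' is below {ACCURACY_THRESHOLD:.0%}")
```

Running a check like this after every retraining run, for example in CI, flags a slice that used to pass and no longer does, even when the overall accuracy still looks acceptable.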
These issues are not easily solved by changing the model architecture or tuning hyperparameters (e.g., adjusting learning rates), especially as the problems you are trying to solve with the model get more complex. Rather, they are solved by training or fine-tuning the model on more of "the right" data.
Unlike purely embeddings-based data exploration and model analytics/evaluation tools, which help folks discover surface-level problems without offering suggestions for solving them, Encord Active automatically analyzes your model performance and gives concrete recommendations and actionable steps to solve the model and data errors it identifies. Specifically, the system detects the weakest and strongest aspects of the data distribution, serving as a guide for where to focus when improving subsequent iterations of your model training. The analysis encompasses various factors: the ‘qualities’ of the images (size, brightness, blurriness), the geometric characteristics of objects and model predictions (aspect ratio, outliers), as well as metadata and class distribution. It correlates these factors with your chosen model performance metrics and surfaces low-performing subsets for attention.
One of our early customers, for example, reduced their dataset size by 35% but increased their model’s accuracy (in this case, the mAP score) by 20%, which is a huge improvement in this domain (https://encord.com/customers/automotus-customer-story/). This is counterintuitive to most people, as the thinking is generally “more data = better models”.
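The idea of correlating an image quality factor with model performance can be sketched in a few lines. This is an illustrative assumption, not how Encord Active is implemented: the `brightness` field, the bucket edges, and the sample data are hypothetical, and plain accuracy stands in for a metric such as mAP.

```python
def accuracy_per_bucket(samples, metric, edges):
    """Group samples into buckets of a quality metric and compute accuracy per bucket.

    `samples` is a list of dicts holding the metric value and a boolean 'correct'.
    `edges` is a sorted list of bucket boundaries; bucket i holds values that are
    greater than or equal to exactly i of the edges.
    """
    buckets = {}
    for sample in samples:
        index = sum(sample[metric] >= edge for edge in edges)
        buckets.setdefault(index, []).append(sample["correct"])
    return {index: sum(hits) / len(hits) for index, hits in buckets.items()}


def weakest_bucket(samples, metric, edges):
    """Return the index of the bucket with the lowest accuracy."""
    scores = accuracy_per_bucket(samples, metric, edges)
    return min(scores, key=scores.get)


# Hypothetical per-image results: brightness in [0, 1] and whether the
# prediction for that image was correct (made-up data).
samples = [
    {"brightness": 0.10, "correct": False},
    {"brightness": 0.15, "correct": False},
    {"brightness": 0.20, "correct": True},
    {"brightness": 0.50, "correct": True},
    {"brightness": 0.60, "correct": True},
    {"brightness": 0.90, "correct": True},
    {"brightness": 0.95, "correct": False},
]

if __name__ == "__main__":
    edges = [0.33, 0.66]  # dark / medium / bright
    print(accuracy_per_bucket(samples, "brightness", edges))
    print("Weakest bucket:", weakest_bucket(samples, "brightness", edges))
```

In this toy data the darkest bucket performs worst, which would suggest collecting or labeling more low-light images next rather than simply adding more data overall.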
If any of these experiences resonate with you, we are eager for you to try out the product and hear your opinions and feedback. We are available to answer any questions you may have!
