Show HN: Programmatic – a REPL for creating labeled data

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • skweak

    skweak: A software toolkit for weak supervision applied to NLP tasks

  • Hi, Raza here, one of the other co-founders.

    I know that HN likes to nerd out over technical details, so I thought I’d share a bit more on how we aggregate the noisy labels to clean them up.

    At the moment we use the great open-source library Skweak [1] to do this. Skweak uses an HMM to infer the most likely unobserved label given the evidence of the votes from each of the labelling functions.

    This whole strategy of first training a label model and then training a neural net was pioneered by Snorkel. We’ve used this approach for now but we actually think there are big opportunities for improvement.

    We’re working on an end-to-end approach that de-noises the labelling functions and trains the model at the same time. So far we’ve seen improvements on the standard benchmarks [2] and are planning to submit to NeurIPS.

    R

    [1]: Skweak package: https://github.com/NorskRegnesentral/skweak
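
To make the HMM aggregation step in the comment above more concrete, here is a minimal sketch in plain NumPy. It is not skweak's API or implementation: the function name `viterbi_aggregate`, the `ABSTAIN` convention, and the fixed per-function accuracies, transition matrix, and prior are all assumptions for illustration. A real label model would estimate those parameters from the votes rather than take them as given. The sketch treats the true label of each token as the hidden state, treats each labelling function's vote as an independent noisy observation, and runs Viterbi decoding to recover the most likely label sequence.

```python
import numpy as np

ABSTAIN = -1  # hypothetical convention: the labelling function casts no vote


def viterbi_aggregate(votes, accuracies, transition, prior):
    """Most likely hidden label sequence given noisy labelling-function votes.

    votes:      (T, n_lfs) ints in {ABSTAIN, 0, ..., K-1}, one row per token
    accuracies: (n_lfs,) assumed probability that a vote equals the true label
    transition: (K, K) assumed row-stochastic label transition matrix
    prior:      (K,) assumed distribution over the first label (K >= 2)
    """
    votes = np.asarray(votes)
    T, n_lfs = votes.shape
    K = len(prior)

    # Log-likelihood of the observed votes at each token under each label.
    # Abstentions contribute nothing; a wrong vote is spread evenly over
    # the K - 1 incorrect labels.
    log_emit = np.zeros((T, K))
    for j in range(n_lfs):
        acc = accuracies[j]
        wrong = (1.0 - acc) / (K - 1)
        for t in range(T):
            v = votes[t, j]
            if v == ABSTAIN:
                continue
            probs = np.full(K, wrong)
            probs[v] = acc
            log_emit[t] += np.log(probs)

    # Standard Viterbi recursion in log space.
    log_trans = np.log(np.asarray(transition, dtype=float))
    score = np.log(np.asarray(prior, dtype=float)) + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, k]: previous i, current k
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]

    # Follow the back-pointers from the best final label.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy setup: two labels, one strong labelling function and two weak ones.
accuracies = np.array([0.9, 0.6, 0.6])
transition = np.array([[0.8, 0.2], [0.2, 0.8]])
prior = np.array([0.5, 0.5])

print(viterbi_aggregate([[0, 0, 0], [1, 1, 1], [1, 1, 1]],
                        accuracies, transition, prior))
```

Under these assumed parameters, a single token with votes `[1, 0, 0]` decodes to label 1, because the one accurate function outweighs the two weak ones. That is where a weighted label model differs from a plain majority vote.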
