Show HN: An annotation tool for ML and NLP

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • pawls

    Software that makes labeling PDFs easy.

  • markup

    A web-based document annotation tool, powered by GPT-4 :rocket: (by samueldobbie)

  • Just to preface this summary, it's all a bit hacked together at the moment, and I'm in the process of rewriting the tool from scratch, so this description is subject to change.

    To generate the suggestions, there's an active learner with an underlying random forest classifier that has been fed ~60 seed sentences [1] to classify sentences as positive (e.g. contains a prescription) or negative (e.g. doesn't contain a prescription).

    All positive sentences are fed into a sequence-to-sequence RNN model that has been trained on ~50k synthetic rows of data [2], which map unstructured sentences (e.g. patient is on pheneturide 250mg twice a day) to a structured output with the desired features (e.g. name: pheneturide; dose: 250; unit: mg; frequency: 2). These synthetic sentences were generated with the in-built data generator [3].

    The outputs of the RNN are validated to ensure they meet the expected structure and are valid for the sentence (e.g. the predicted drug name must exist somewhere within the sentence).

    All non-junk predictions are shown to the user, who can accept, edit, or reject each. Based on the user's responses, the active learner is refined (currently nothing is fed back into the RNN).

    [1] https://github.com/samueldobbie/markup/blob/master/data/text...

    [2] https://raw.githubusercontent.com/samueldobbie/markup/master...

    [3] https://www.getmarkup.com/tools/data-generator/
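
As an illustration of the validation step described in the comment above, here is a minimal Python sketch. It is not taken from the markup codebase: the function names, the list of required fields, and the "key: value; key: value" output format are assumptions based only on the example given in the comment (name, dose, unit, frequency).

```python
# Hypothetical sketch of validating a structured prediction against its sentence.
# REQUIRED_FIELDS is assumed from the example in the comment, not from the tool.
REQUIRED_FIELDS = ("name", "dose", "unit", "frequency")


def parse_prediction(raw):
    """Parse 'name: x; dose: 250; ...' into a dict, or return None if malformed."""
    fields = {}
    for part in raw.split(";"):
        if not part.strip():
            continue  # tolerate empty segments such as a trailing ';'
        key, sep, value = part.partition(":")
        if not sep:
            return None  # segment is not of the form 'key: value'
        fields[key.strip().lower()] = value.strip()
    return fields


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def is_valid_prediction(sentence, raw):
    """Check the expected structure, then check the drug name occurs in the sentence."""
    fields = parse_prediction(raw)
    if fields is None:
        return False
    # Every required field must be present and non-empty.
    if any(not fields.get(name) for name in REQUIRED_FIELDS):
        return False
    # Dose and frequency are assumed to be numeric.
    if not (is_number(fields["dose"]) and is_number(fields["frequency"])):
        return False
    # The predicted drug name must exist somewhere within the sentence.
    return fields["name"].lower() in sentence.lower()
```

With this sketch, the comment's own example sentence paired with a prediction naming pheneturide would pass, while a prediction naming a drug that is absent from the sentence, or one missing a field, would be treated as junk and hidden from the user.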

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Show HN: An annotation tool for ML and NLP

    2 projects | news.ycombinator.com | 15 May 2023
  • How do I connect application running in a notebook server to my local machine.

    1 project | /r/Kubeflow | 11 Dec 2022
  • Assigning a Port Mapping to a Running Docker Container for MacOS

    1 project | /r/docker | 10 Dec 2022
  • Text Corpus Tagging System

    1 project | /r/django | 18 Oct 2022
  • Ask HN: Any open source text editors with word tagging?

    1 project | news.ycombinator.com | 4 Aug 2022