data-labeling

Top 17 data-labeling Open-Source Projects

  • label-studio

    Label Studio is a multi-type data labeling and annotation tool with standardized output format

  • Project mention: Annotation is dead | dev.to | 2024-04-26

    If instead you have a cohort on hand (i.e., you do not want to send your data to a third party for any reason, or perhaps you have energetic undergrads), then you could alternatively consider local, open-source annotation tools such as CVAT and Label Studio. Finally, nowadays, you might instead work with Large Multimodal Models to have them annotate your data; more on this awkward angle later.

  • doccano

    Open source annotation tool for machine learning practitioners.

  • Project mention: You Can't Have a Free Software AI Stack | news.ycombinator.com | 2023-07-13

    Huh?

    I wrote my own system for classifying a stream of texts in Python, I might Open Source it one of these days but I have to get it to the point where it is modular enough that I can customize it to do the particular things I want without subjecting people to my whims... I use it every day and I'm not afraid to demo it because it is rock solid.

    My understanding is that my system would not be hard to adapt to work on images for certain kinds of tasks.

    PyTorch is open source, Hugging Face is open source. CUDA isn't. This is

    https://labelstud.io/

    and for annotating text spans there are so many open source tools

    https://github.com/doccano/doccano

    I worked for a company a few years back that built annotation tools for projects we sold to customers but never quite got to a polished general purpose annotator. Today there are an overwhelming number of companies in this space and products I never heard of, many of which are cloud based or paid. Looks like a gold rush to me.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • awesome-data-labeling

    A curated list of awesome data labeling tools

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

  • compose

    A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)

  • bbox-visualizer

    Make drawing and labeling bounding boxes easy as cake

  • hover

    Label data at scale. Fun and precision included. (by phurwicz)

  • markup

    A web-based document annotation tool, powered by GPT-4 (by samueldobbie)

  • Project mention: Show HN: An annotation tool for ML and NLP | news.ycombinator.com | 2023-05-15

    Hey HN! I'm super excited to share Markup with you, which is a totally free & open-source annotation tool that helps you transform unstructured text (e.g. news articles) into structured data that you can use for building, training, or fine-tuning ML models!

    Check it out: https://github.com/samueldobbie/markup

  • awesome-annotation-tools

    A curated list of awesome data annotation tools

  • mutate

    A library to synthesize text datasets using Large Language Models (LLM)

  • superpipe

    Superpipe - optimized LLM pipelines for structured data

  • Project mention: Show HN: Superpipe – optimized LLM pipelines for structured outputs | news.ycombinator.com | 2024-03-26
  • edsl

    Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)

  • Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18
  • modzy-labelstudio-sample

    Create training data labels from a production model with Modzy, Dropbox, and Label Studio

  • bunny-party

    A demonstration of how DVC and MLFlow can be used in the task of data relabeling

  • Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29
  • detecting-beer

    Counting the number of product items on a display shelf from photographs. (label-studio, yolov5, torch, rabbitmq, pika, docker-compose)

  • datalabel

    datalabel is a UI-based data editing tool that makes it easy to create labeled text data in a dataframe. With datalabel, you can quickly and effortlessly edit your data without having to write any code. Its intuitive interface makes it ideal for both experienced data professionals and those new to data editing.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

data-labeling related posts

  • You Can't Have a Free Software AI Stack

    2 projects | news.ycombinator.com | 13 Jul 2023
  • How we used AI to automate stock sentiment classification

    1 project | dev.to | 21 Feb 2023
  • German's NLP startup Kern AI has raised €2.7M in seed funding to accelerate its recent growth

    1 project | /r/u_technicalbeep | 16 Feb 2023
  • Why and how we started Kern AI (our seed funding announcement)

    1 project | dev.to | 16 Feb 2023
  • GPT and BERT: A Comparison of Transformer Architectures

    2 projects | dev.to | 9 Feb 2023
  • Open-source tool to label, assess and maintain natural language data. Treat training data like a software artifact!

    1 project | /r/LanguageTechnology | 25 Jan 2023
  • Drastically decrease the size of your Docker application

    2 projects | dev.to | 3 Jan 2023

Index

What are some of the best open-source data-labeling projects? This list will help you:

Rank  Project                     Stars
1     label-studio                16,546
2     doccano                     8,996
3     cleanlab                    8,673
4     awesome-data-labeling       3,480
5     refinery                    1,365
6     compose                     472
7     bbox-visualizer             374
8     hover                       314
9     markup                      231
10    awesome-annotation-tools    180
11    mutate                      149
12    superpipe                   98
13    edsl                        53
14    modzy-labelstudio-sample    17
15    bunny-party                 10
16    detecting-beer              10
17    datalabel                   2
