Top 17 data-labeling Open-Source Projects
-
label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
-
refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
-
compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)
-
edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)
-
modzy-labelstudio-sample
Create training data labels from a production model with Modzy, Dropbox, and Label Studio
-
detecting-beer
Counting the number of product items on a display shelf from photographs. (label-studio, yolov5, torch, rabbitmq, pika, docker-compose)
-
datalabel
datalabel is a UI-based data editing tool that makes it easy to create labeled text data in a dataframe. With datalabel, you can quickly and effortlessly edit your data without having to write any code. Its intuitive interface makes it ideal for both experienced data professionals and those new to data editing.
If instead you have a cohort on hand (i.e., you do not want to send your data to a third party for any reason, or perhaps you have energetic undergrads), then you could alternatively consider local, open-source annotation tools such as CVAT and Label Studio. Finally, nowadays, you might instead work with Large Multimodal Models to have them annotate your data; more on this awkward angle later.
Huh?
I wrote my own system for classifying a stream of texts in Python. I might open-source it one of these days, but I have to get it to the point where it is modular enough that I can customize it to do the particular things I want without subjecting people to my whims... I use it every day and I'm not afraid to demo it because it is rock solid.
My understanding is that my system would not be hard to adapt to work on images for certain kinds of tasks.
PyTorch is open source, Hugging Face is open source. CUDA isn't. This is
https://labelstud.io/
and for annotating text spans there are so many open source tools
https://github.com/doccano/doccano
I worked for a company a few years back that built annotation tools for projects we sold to customers but never quite got to a polished general purpose annotator. Today there are an overwhelming number of companies in this space and products I never heard of, many of which are cloud based or paid. Looks like a gold rush to me.
Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05
We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.
Hey HN! I'm super excited to share Markup with you, which is a totally free & open-source annotation tool that helps you transform unstructured text (e.g. news articles) into structured data that you can use for building, training, or fine-tuning ML models!
Check it out: https://github.com/samueldobbie/markup
Project mention: Show HN: Superpipe – optimized LLM pipelines for structured outputs | news.ycombinator.com | 2024-03-26
Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18
Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29
data-labeling related posts
-
You Can't Have a Free Software AI Stack
-
How we used AI to automate stock sentiment classification
-
German NLP startup Kern AI has raised €2.7M in seed funding to accelerate its recent growth
-
Why and how we started Kern AI (our seed funding announcement)
-
GPT and BERT: A Comparison of Transformer Architectures
-
Open-source tool to label, assess and maintain natural language data. Treat training data like a software artifact!
-
Drastically decrease the size of your Docker application
-
Index
What are some of the best open-source data-labeling projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | label-studio | 16,546 |
2 | doccano | 8,996 |
3 | cleanlab | 8,673 |
4 | awesome-data-labeling | 3,480 |
5 | refinery | 1,365 |
6 | compose | 472 |
7 | bbox-visualizer | 374 |
8 | hover | 314 |
9 | markup | 231 |
10 | awesome-annotation-tools | 180 |
11 | mutate | 149 |
12 | superpipe | 98 |
13 | edsl | 53 |
14 | modzy-labelstudio-sample | 17 |
15 | bunny-party | 10 |
16 | detecting-beer | 10 |
17 | datalabel | 2 |