Top 11 Python data-labeling Projects
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
-
refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
-
compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)
-
edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)
-
modzy-labelstudio-sample
Create training data labels from a production model with Modzy, Dropbox, and Label Studio
Huh?
I wrote my own system for classifying a stream of texts in Python. I might open-source it one of these days, but I have to get it to the point where it is modular enough that I can customize it to do the particular things I want without subjecting people to my whims... I use it every day and I'm not afraid to demo it because it is rock solid.
My understanding is that my system would not be hard to adapt to work on images for certain kinds of tasks.
PyTorch is open source, Hugging Face is open source. CUDA isn't. This is
https://labelstud.io/
and for annotating text spans there are so many open source tools
https://github.com/doccano/doccano
I worked for a company a few years back that built annotation tools for projects we sold to customers but never quite got to a polished general purpose annotator. Today there are an overwhelming number of companies in this space and products I never heard of, many of which are cloud based or paid. Looks like a gold rush to me.
Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05
We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.
Project mention: Show HN: Superpipe – optimized LLM pipelines for structured outputs | news.ycombinator.com | 2024-03-26
Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18
Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29
Python data-labeling related posts
- You Can't Have a Free Software AI Stack
- How we used AI to automate stock sentiment classification
- German NLP startup Kern AI has raised €2.7M in seed funding to accelerate its recent growth
- Why and how we started Kern AI (our seed funding announcement)
- GPT and BERT: A Comparison of Transformer Architectures
- Open-source tool to label, assess and maintain natural language data. Treat training data like a software artifact!
- Introducing bricks, an open-source content-library for NLP
Index
What are some of the best open-source data-labeling projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | doccano | 8,966 |
2 | cleanlab | 8,592 |
3 | refinery | 1,360 |
4 | compose | 472 |
5 | bbox-visualizer | 374 |
6 | hover | 313 |
7 | mutate | 149 |
8 | superpipe | 94 |
9 | edsl | 23 |
10 | modzy-labelstudio-sample | 17 |
11 | bunny-party | 10 |