Python data-labeling

Open-source Python projects categorized as data-labeling

Top 9 Python data-labeling Projects

  • doccano

    Open source annotation tool for machine learning practitioners.

    Project mention: You Can't Have a Free Software AI Stack | news.ycombinator.com | 2023-07-13

    Huh?

    I wrote my own system for classifying a stream of texts in Python, I might Open Source it one of these days but I have to get it to the point where it is modular enough that I can customize it to do the particular things I want without subjecting people to my whims... I use it every day and I'm not afraid to demo it because it is rock solid.

    My understanding is that my system would not be hard to adapt to work on images for certain kinds of tasks.

    Pytorch is open source, Huggingface is open source. CUDA isn't. This is

    https://labelstud.io/

    and for annotating text spans there are so many open source tools

    https://github.com/doccano/doccano

    I worked for a company a few years back that built annotation tools for projects we sold to customers but never quite got to a polished general purpose annotator. Today there are an overwhelming number of companies in this space and products I never heard of, many of which are cloud based or paid. Looks like a gold rush to me.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

    Project mention: [P] We are building a curated list of open source tooling for data-centric AI workflows, looking for contributions. | /r/MachineLearning | 2023-03-03

    You definitely forgot https://www.kern.ai/ :)

  • compose

    A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)

  • bbox-visualizer

    Make drawing and labeling bounding boxes easy as cake

  • hover

    Label data at scale. Fun and precision included. (by phurwicz)

  • mutate

    A library to synthesize text datasets using Large Language Models (LLM)

  • modzy-labelstudio-sample

    Create training data labels from a production model with Modzy, Dropbox, and Label Studio

  • bunny-party

    A demonstration of how DVC and MLFlow can be used in the task of data relabeling

    Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-29.
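
The cleanlab entry above is about finding mislabeled examples in messy data. As a rough illustration of the idea, here is a minimal pure-Python sketch of a simplified confident-learning heuristic: each class gets a threshold equal to the model's average confidence on examples carrying that label, and an example is flagged when the most probable class that clears its own threshold disagrees with the given label. This is not cleanlab's API or its full algorithm; the function names are hypothetical, and the library's own entry point for this task is, to our understanding, `cleanlab.filter.find_label_issues(labels, pred_probs)`.

```python
from typing import List, Sequence


def per_class_thresholds(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]], n_classes: int
) -> List[float]:
    """Hypothetical helper: mean predicted P(class j) over examples labeled j."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, probs in zip(labels, pred_probs):
        sums[y] += probs[y]
        counts[y] += 1
    # A class with no labeled examples gets an unreachable threshold,
    # so it never claims an example.
    return [s / c if c else float("inf") for s, c in zip(sums, counts)]


def find_suspect_labels(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[int]:
    """Hypothetical helper: indices whose given label looks wrong."""
    n_classes = len(pred_probs[0])
    thresholds = per_class_thresholds(labels, pred_probs, n_classes)
    suspects = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(n_classes) if probs[k] >= thresholds[k]]
        if not confident:
            continue  # no class clears its threshold: no verdict
        best = max(confident, key=lambda k: probs[k])
        if best != y:
            suspects.append(i)
    return suspects


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    pred_probs = [
        [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
        [0.1, 0.9], [0.3, 0.7], [0.05, 0.95],
    ]
    print(find_suspect_labels(labels, pred_probs))  # [2]
```

Using per-class thresholds instead of a single global cutoff is the key design choice: a model that is systematically less confident on one class does not cause every example of that class to be flagged.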
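
The compose entry above describes "prediction engineering": turning raw event logs into labeled training examples for supervised learning. A minimal sketch of that idea, assuming a hypothetical `(entity_id, timestamp, amount)` event schema, walks each entity's history in fixed windows and applies a labeling function to the events inside each window. The window start acts as the cutoff time. This is an illustration of the concept only; `make_labels` is a hypothetical name and not Compose's API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical event schema: (entity_id, timestamp, amount).
Event = Tuple[str, datetime, float]
# Output row: (entity_id, cutoff_time, label).
LabelRow = Tuple[str, datetime, float]


def make_labels(
    events: Iterable[Event],
    window: timedelta,
    labeling_fn: Callable[[Sequence[float]], float],
) -> List[LabelRow]:
    # Group events by entity.
    by_entity = defaultdict(list)
    for entity, ts, amount in events:
        by_entity[entity].append((ts, amount))

    rows: List[LabelRow] = []
    for entity in sorted(by_entity):
        history = sorted(by_entity[entity])
        cutoff, last_seen = history[0][0], history[-1][0]
        # Slide non-overlapping windows from the first to the last event.
        while cutoff <= last_seen:
            end = cutoff + window
            in_window = [amt for ts, amt in history if cutoff <= ts < end]
            rows.append((entity, cutoff, labeling_fn(in_window)))
            cutoff = end
    return rows


if __name__ == "__main__":
    events = [
        ("a", datetime(2024, 1, 1), 10.0),
        ("a", datetime(2024, 1, 3), 5.0),
        ("a", datetime(2024, 1, 9), 7.0),
        ("b", datetime(2024, 1, 2), 3.0),
    ]
    # Label = total spend in the 7 days after each cutoff.
    for row in make_labels(events, timedelta(days=7), sum):
        print(row)
```

Features for each label row should be built only from events before its cutoff time; otherwise the label leaks into the features. Windows that contain no events still produce a row (with `sum`, a label of 0), which is often what you want for churn-style targets.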
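
The modzy-labelstudio-sample entry above describes creating training labels from a production model, which amounts to model-assisted pre-labeling: the model's outputs are loaded into Label Studio as predictions that annotators then accept or correct. A minimal sketch of the conversion step, assuming a text-classification project, could look like this. The task layout (`data`, `predictions`, `result`, `from_name`, `to_name`, `type`, `value`) reflects our reading of Label Studio's import format and should be checked against its documentation. The names `sentiment` and `text` are hypothetical and must match your project's labeling config. The Modzy and Dropbox steps from the sample are not shown, and `to_label_studio_tasks` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple


def to_label_studio_tasks(
    texts: Sequence[str],
    predictions: Sequence[Tuple[str, float]],  # (predicted label, confidence)
    model_version: str,
    from_name: str = "sentiment",  # hypothetical: name of the Choices control
    to_name: str = "text",  # hypothetical: name of the Text object
) -> List[Dict[str, Any]]:
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    tasks = []
    for text, (label, score) in zip(texts, predictions):
        tasks.append({
            "data": {"text": text},
            # One pre-annotation per task, taken from the model's output.
            "predictions": [{
                "model_version": model_version,
                "score": score,
                "result": [{
                    "from_name": from_name,
                    "to_name": to_name,
                    "type": "choices",
                    "value": {"choices": [label]},
                }],
            }],
        })
    return tasks


if __name__ == "__main__":
    tasks = to_label_studio_tasks(
        ["Great product", "Broke after a day"],
        [("Positive", 0.93), ("Negative", 0.88)],
        model_version="demo-v1",
    )
    print(json.dumps(tasks, indent=2))
```

Keeping the model's `score` on each prediction lets reviewers sort or filter tasks so that the least confident pre-labels are checked first.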

Index

What are some of the best open-source data-labeling projects in Python? This list will help you:

Rank  Project                   Stars
1     doccano                   8,744
2     cleanlab                  7,947
3     refinery                  1,321
4     compose                   464
5     bbox-visualizer           369
6     hover                     312
7     mutate                    148
8     modzy-labelstudio-sample  16
9     bunny-party               10