Python data-labeling

Open-source Python projects categorized as data-labeling

Top 11 Python data-labeling Projects

  • doccano

    Open source annotation tool for machine learning practitioners.

  • Project mention: You Can't Have a Free Software AI Stack | news.ycombinator.com | 2023-07-13

    Huh?

    I wrote my own system for classifying a stream of texts in Python, I might Open Source it one of these days but I have to get it to the point where it is modular enough that I can customize it to do the particular things I want without subjecting people to my whims... I use it every day and I'm not afraid to demo it because it is rock solid.

    My understanding is that my system would not be hard to adapt to work on images for certain kinds of tasks.

    Pytorch is open source, Huggingface is open source. CUDA isn't. This is

    https://labelstud.io/

    and for annotating text spans there are so many open source tools

    https://github.com/doccano/doccano

    I worked for a company a few years back that built annotation tools for projects we sold to customers but never quite got to a polished general purpose annotator. Today there are an overwhelming number of companies in this space and products I never heard of, many of which are cloud based or paid. Looks like a gold rush to me.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

  • compose

    A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)

  • bbox-visualizer

    Make drawing and labeling bounding boxes easy as cake

  • hover

    Label data at scale. Fun and precision included. (by phurwicz)

  • mutate

    A library to synthesize text datasets using Large Language Models (LLM)

  • superpipe

    Superpipe - optimized LLM pipelines for structured data

  • Project mention: Show HN: Superpipe – optimized LLM pipelines for structured outputs | news.ycombinator.com | 2024-03-26

  • edsl

    Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)

  • Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18

  • modzy-labelstudio-sample

    Create training data labels from a production model with Modzy, Dropbox, and Label Studio

  • bunny-party

    A demonstration of how DVC and MLFlow can be used in the task of data relabeling

  • Project mention: Show HN: Demo of using DVC and MLFlow for ML experiments | news.ycombinator.com | 2024-01-29

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
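
cleanlab's approach to finding label errors builds on confident learning: compare each example's given label against out-of-sample predicted probabilities, and flag examples where the model confidently predicts a different class. The following is a simplified, dependency-free sketch of that idea, not cleanlab's API or its full algorithm. The names `per_class_thresholds` and `find_suspect_labels` are hypothetical, and the sketch assumes the predicted probabilities came from cross-validation so the model never scored an example it was trained on. In practice, cleanlab 2.x exposes this through `cleanlab.filter.find_label_issues(labels, pred_probs)`.

```python
from typing import List, Sequence


def per_class_thresholds(labels: Sequence[int],
                         pred_probs: Sequence[Sequence[float]],
                         num_classes: int) -> List[float]:
    """Average predicted probability of class j over examples labeled j."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled examples gets threshold 1.0, so only a
    # probability of exactly 1.0 can clear it.
    return [sums[j] / counts[j] if counts[j] else 1.0 for j in range(num_classes)]


def find_suspect_labels(labels: Sequence[int],
                        pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Indices whose most confident class (above its threshold) differs from the given label."""
    num_classes = len(pred_probs[0])
    thresholds = per_class_thresholds(labels, pred_probs, num_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # not confident about any class; leave this label alone
        guess = max(confident, key=lambda k: p[k])
        if guess != y:
            suspects.append(i)
    return suspects


if __name__ == "__main__":
    # Hypothetical toy data: example 2 is labeled 0 but looks like class 1.
    labels = [0, 0, 0, 1, 1, 1]
    pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                  [0.15, 0.85], [0.1, 0.9], [0.3, 0.7]]
    print(find_suspect_labels(labels, pred_probs))  # → [2]
```

Per-class thresholds, rather than a single global cutoff, keep a class the model is generally less sure about from being over- or under-flagged.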
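
Compose structures a prediction problem roughly as a labeling function applied to consecutive windows of each entity's history, with a cutoff time attached to every label. The plain-Python sketch below illustrates that idea without using Compose. The transaction schema, the `make_labels` function, and the "spend above a threshold" labeling rule are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy schema: (customer_id, day, amount).
Transaction = Tuple[str, int, float]


def make_labels(transactions: List[Transaction],
                window_size: int,
                threshold: float) -> List[Tuple[str, int, bool]]:
    """For each customer, step through consecutive windows and label each cutoff.

    The label at a cutoff answers: "does this customer spend more than
    `threshold` in the `window_size` days starting at the cutoff?" Features
    for that row should only use data from before the cutoff, to avoid leakage.
    """
    by_customer: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for customer, day, amount in transactions:
        by_customer[customer].append((day, amount))

    labels = []
    for customer in sorted(by_customer):
        rows = sorted(by_customer[customer])
        cutoff, last_day = rows[0][0], rows[-1][0]
        while cutoff <= last_day:
            # Total spend inside the half-open window [cutoff, cutoff + window_size).
            spend = sum(a for d, a in rows if cutoff <= d < cutoff + window_size)
            labels.append((customer, cutoff, spend > threshold))
            cutoff += window_size
    return labels


if __name__ == "__main__":
    log = [("a", 0, 10.0), ("a", 3, 50.0), ("a", 8, 5.0), ("b", 1, 100.0)]
    print(make_labels(log, window_size=5, threshold=20.0))
    # → [('a', 0, True), ('a', 5, False), ('b', 1, True)]
```

Each output row is a (entity, cutoff, label) triple, which is the shape a supervised training set needs once features computed before each cutoff are joined on.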

What are some of the best open-source data-labeling projects in Python? This list will help you:

Rank  Project                     Stars
1     doccano                     8,966
2     cleanlab                    8,592
3     refinery                    1,360
4     compose                       472
5     bbox-visualizer               374
6     hover                         313
7     mutate                        149
8     superpipe                      94
9     edsl                           23
10    modzy-labelstudio-sample       17
11    bunny-party                    10
