Python data-labeling

Open-source Python projects categorized as data-labeling

Top 11 Python data-labeling Projects

  1. cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Project mention: Ask HN: Not a webdev, why are these sites so good? | news.ycombinator.com | 2024-06-18

    https://cleanlab.ai/

  2. doccano

    Open source annotation tool for machine learning practitioners.

  3. refinery

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

    Project mention: Ultimate guide to prompt engineering | dev.to | 2024-12-07

    Tools: Platforms like LangChain, Kern AI Refinery, and Langtail simplify testing, debugging, and optimizing prompts.

  4. compose

    A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning. (by alteryx)

  5. bbox-visualizer

    Make drawing and labeling bounding boxes easy as cake

  6. hover

    :speedboat: Label data at scale. Fun and precision included. (by phurwicz)

  7. edsl

    Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

    Project mention: Python Library for Structured Data Extraction via LLM | news.ycombinator.com | 2024-08-14

    Hey thanks for noticing - here's the MIT licensed library it's based on: https://github.com/expectedparrot/edsl

  8. mutate

    A library to synthesize text datasets using Large Language Models (LLM)

  9. superpipe

    Superpipe - optimized LLM pipelines for structured data

  10. modzy-labelstudio-sample

    Create training data labels from a production model with Modzy, Dropbox, and Label Studio

  11. bunny-party

    A demonstration of how DVC and MLFlow can be used in the task of data relabeling

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-labeling related posts

  • Ultimate guide to prompt engineering

    1 project | dev.to | 7 Dec 2024
  • Python Library for Structured Data Extraction via LLM

    1 project | news.ycombinator.com | 14 Aug 2024
  • You Can't Have a Free Software AI Stack

    2 projects | news.ycombinator.com | 13 Jul 2023
  • How we used AI to automate stock sentiment classification

    1 project | dev.to | 21 Feb 2023
  • German's NLP startup Kern AI has raised €2.7M in seed funding to accelerate its recent growth

    1 project | /r/u_technicalbeep | 16 Feb 2023
  • Why and how we started Kern AI (our seed funding announcement)

    1 project | dev.to | 16 Feb 2023
  • GPT and BERT: A Comparison of Transformer Architectures

    2 projects | dev.to | 9 Feb 2023

Index

What are some of the best open-source data-labeling projects in Python? This list will help you:

#   Project                     Stars
1   cleanlab                    10,526
2   doccano                      9,986
3   refinery                     1,438
4   compose                        505
5   bbox-visualizer                396
6   hover                          326
7   edsl                           240
8   mutate                         151
9   superpipe                      110
10  modzy-labelstudio-sample        18
11  bunny-party                     10
