Python data-curation

Open-source Python projects categorized as data-curation

Top 4 Python data-curation Projects

  1. cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

    Project mention: Ask HN: Not a webdev, why are these sites so good? | news.ycombinator.com | 2024-06-18

    https://cleanlab.ai/

  2. fiftyone

    Refine high-quality datasets and visual AI models

    Project mention: Launch HN: Enhanced Radar (YC W25) – A safety net for air traffic control | news.ycombinator.com | 2025-03-04

    Are there already bird/not-a-bird datasets?

    Procedures for creating "bird on multispectral plane radar and video" dataset(s):

    Tag birds in the dashcam video, alongside timecoded sensor data, using a segmentation and annotation tool.

    Annotation tool features: pinch to zoom, automatic edge detection, classification probability, sensor status.

    voxel51/fiftyone does segmentation and annotation with video and possibly multispectral data: https://github.com/voxel51/fiftyone

  3. fastdup

    fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

    Project mention: AIM Weekly 17 June 2024 | dev.to | 2024-06-17

  4. sliceguard

    A library for detecting problematic data segments in structured and unstructured data with only a few lines of code.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Index

What are some of the best open-source data-curation projects in Python? This list will help you:

#  Project     Stars
1  cleanlab    10,241
2  fiftyone     9,298
3  fastdup      1,665
4  sliceguard      63
