Top 4 Python data-curation Projects
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Project mention: Ask HN: Not a webdev, why are these sites so good? | news.ycombinator.com | 2024-06-18
https://cleanlab.ai/
-
fiftyone
Project mention: Launch HN: Enhanced Radar (YC W25) – A safety net for air traffic control | news.ycombinator.com | 2025-03-04
Are there already "bird / not a bird" datasets?
Procedures for creating "bird on multispectral plane radar and video" dataset(s):
Tag birds on the dashcam video with timecoded sensor data, using a segmentation and annotation tool.
Pinch to zoom, automatic edge detection, classification probability, sensor status
voxel51/fiftyone does segmentation and annotation with video and possibly multispectral data: https://github.com/voxel51/fiftyone
-
fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.
-
sliceguard
A library for detecting problematic data segments in structured and unstructured data with few lines of code.
Python data-curation related posts
-
Visualize your dataset using DINOv2 embedding
-
[R][P] How to extract feature vectors of large datasets using DINOv2 on CPU
-
Find image duplicates and outliers – A free, scalable, efficient tool
-
How can we match images in our database?
-
[R] We found nearly half a billion duplicated images on LAION-2B-en.
Index
What are some of the best open-source data-curation projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | cleanlab | 10,241 |
| 2 | fiftyone | 9,298 |
| 3 | fastdup | 1,665 |
| 4 | sliceguard | 63 |