Top 12 Python data-profiling Projects
-
ydata-profiling
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
-
soda-core
Data quality testing for the modern data stack (SQL, Spark, and Pandas). https://www.soda.io
-
Optimus
Agile data preparation workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark (by ironmussa).
-
swiple
Swiple enables you to easily observe, understand, validate, and improve the quality of your data.
-
data-profiling
A set of scripts to pull metadata and data-profiling metrics from relational database systems.
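To make "metadata and data-profiling metrics" concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not taken from the data-profiling project's scripts; the `profile_table` helper and the `orders` table are hypothetical, and the metrics chosen (null count, distinct count, min, max) are just common examples.

```python
import sqlite3

def profile_table(conn, table):
    """Return per-column metadata and basic profiling metrics for one table.

    Hypothetical helper for illustration; identifiers are interpolated
    directly, so only pass trusted table names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    profile = {}
    for _cid, name, declared_type, *_rest in columns:
        # COUNT(col) skips NULLs, so total - COUNT(col) is the null count.
        non_null, distinct, minimum, maximum = conn.execute(
            f'SELECT COUNT("{name}"), COUNT(DISTINCT "{name}"), '
            f'MIN("{name}"), MAX("{name}") FROM "{table}"'
        ).fetchone()
        profile[name] = {
            "type": declared_type,
            "rows": total,
            "nulls": total - non_null,
            "distinct": distinct,
            "min": minimum,
            "max": maximum,
        }
    return profile

# Small in-memory table with a few missing values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 10.0), (2, "FR", None), (3, "DE", 25.5), (4, None, 7.25)],
)
report = profile_table(conn, "orders")
```

Other database systems expose column metadata differently (for example through `information_schema`), so the `PRAGMA` call is specific to SQLite.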
-
metacrafter
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers, with fully customizable and flexible rules.
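Rule-based identification of this kind can be sketched with plain regular expressions. The example below is an assumption-laden illustration, not metacrafter's API or rule format: the `RULES` table, the `detect_column_type` function, and the 0.8 match threshold are all hypothetical.

```python
import re

# Hypothetical rules for illustration; not metacrafter's rule set or format.
# The ipv4 pattern checks shape only and does not validate the 0-255 range.
RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

def detect_column_type(values, threshold=0.8):
    """Return the first rule matching at least `threshold` of non-empty values."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return None
    for name, pattern in RULES.items():
        hits = sum(1 for v in present if pattern.fullmatch(v))
        if hits / len(present) >= threshold:
            return name
    return None

columns = {
    "contact": ["ann@example.com", "bob@example.org", "", "eve@example.net"],
    "signup": ["2023-04-24", "2023-11-05", "2024-03-13"],
    "note": ["call back", "n/a", "vip"],
}
detected = {name: detect_column_type(vals) for name, vals in columns.items()}
```

Matching on a share of values rather than on every value lets a column with a few blanks or typos still be labeled, at the cost of occasional false positives.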
Project mention: Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR | dev.to | 2023-04-24
Great Expectations (GE) is an open-source data validation tool that helps ensure data quality.
Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05
We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.
Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
Project mention: Metacrafter – semantic data types detection Python lib | news.ycombinator.com | 2024-03-13
Index
What are some of the best open-source data-profiling projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | ydata-profiling | 12,022 |
2 | great_expectations | 9,440 |
3 | cleanlab | 8,592 |
4 | sweetviz | 2,833 |
5 | soda-core | 1,745 |
6 | Optimus | 1,441 |
7 | cleanvision | 919 |
8 | popmon | 485 |
9 | piperider | 466 |
10 | swiple | 77 |
11 | data-profiling | 67 |
12 | metacrafter | 38 |