sweetviz vs pandera
Metric | sweetviz | pandera
---|---|---
Mentions | 1 | 7
Stars | 2,837 | 3,007
Growth | - | 5.2%
Activity | 6.7 | 9.1
Latest commit | 5 months ago | 2 days ago
Language | Python | Python
License | MIT License | MIT License
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sweetviz
-
Automated Data Profiling and Attribute Clustering using unsupervised ML techniques
Take a look at this package, which computes associations between variables, produces other visualizations, and can infer some feature types: https://github.com/fbdesignpro/sweetviz
pandera
-
Unit testing functions that input/output dataframes?
I use Pandera, so I just need to define the expected input/output schemas (i.e. column names, types, and constraints on them), and Pandera automatically generates fake data for the unit tests, and validates the result: https://github.com/unionai-oss/pandera
-
Great Expectations is annoyingly cumbersome
Please DM me! Or we can discuss it in this issue, which I just created: https://github.com/unionai-oss/pandera/issues/1042
-
Data validation for dashboards
In my opinion for simple data validation tasks the best solution is always Pandera.
-
Show HN: Pandera 0.8.0 – validate pandas, dask, modin, and koalas dataframes
* adds support for mypy static type-linting if you need that extra type safety
Repo: https://github.com/pandera-dev/pandera
-
Pandera 0.8.0: Schema Validation for Pandas, Dask, Modin, and Koalas DataFrames. Oh, and also out-of-the-box Pydantic and Mypy support :)
Repo: https://github.com/pandera-dev/pandera
-
How heavily do you use Great Expectations?
pandera
What are some alternatives?
dataprep - Open-source, low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
ydata-profiling - 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Schematics - Python Data Structures for Humans™.
Optimus - Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
jsonschema - An implementation of the JSON Schema specification for Python
popmon - Monitor the stability of a Pandas or Spark dataframe ⚙︎
pointblank - Data quality assessment and metadata reporting for data frames and database tables
dtale-desktop - Build a data visualization dashboard with simple snippets of Python code
swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
mlgauge - A simple library to benchmark the performance of machine learning methods across different datasets.
dbt-expectations - Port(ish) of Great Expectations to dbt test macros