Metric | handy_sql_queries | datacompy
---|---|---
Mentions | 2 | 4
Stars | 1 | 396
Growth | - | 11.1%
Activity | 4.5 | 7.5
Last commit | 10 months ago | 10 days ago
Language | Python |
License | The Unlicense | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
handy_sql_queries
- Data diffs: Algorithms for explaining what changed in a dataset (2022)
  If you want an easy way to check in SQL whether two tables match in every single row and every single column, you can use the following technique:
  https://github.com/gregw2hn/handy_sql_queries/blob/main/sql_...
- How to Check 2 SQL Tables Are the Same
  This is part of why I don't use MINUS for table value comparisons... All you need is GROUP BY/UNION ALL/HAVING, using the following technique:
  https://github.com/gregw2hn/handy_sql_queries/blob/main/sql_...
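
The GROUP BY/UNION ALL/HAVING approach mentioned in the posts above can be sketched as follows. This is a generic illustration, not necessarily the query in the linked repository: the helper `table_diff`, the tables `orders_old` and `orders_new`, and their columns are all hypothetical, and SQLite is used only so the sketch runs without a database server. Each table's rows are tagged with a marker, stacked with UNION ALL, and grouped on every column; any group whose counts differ between the two sides is a mismatch.

```python
import sqlite3


def table_diff(conn, table_a, table_b, columns):
    """Return rows whose occurrence counts differ between two tables.

    Hypothetical helper for illustration. Table and column names are
    interpolated directly, so pass only trusted identifiers.
    """
    cols = ", ".join(columns)
    query = f"""
        SELECT {cols}, SUM(in_a) AS count_a, SUM(in_b) AS count_b
        FROM (
            SELECT {cols}, 1 AS in_a, 0 AS in_b FROM {table_a}
            UNION ALL
            SELECT {cols}, 0 AS in_a, 1 AS in_b FROM {table_b}
        ) AS combined
        GROUP BY {cols}
        HAVING SUM(in_a) <> SUM(in_b)
        ORDER BY {cols}
    """
    return conn.execute(query).fetchall()


# Made-up sample data: the two tables differ only in the qty of row id=2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_old (id INTEGER, item TEXT, qty INTEGER);
    CREATE TABLE orders_new (id INTEGER, item TEXT, qty INTEGER);
    INSERT INTO orders_old VALUES (1, 'pen', 2), (2, 'ink', 1), (3, 'pad', 5);
    INSERT INTO orders_new VALUES (1, 'pen', 2), (2, 'ink', 4), (3, 'pad', 5);
""")

diff = table_diff(conn, "orders_old", "orders_new", ["id", "item", "qty"])
for row in diff:
    # Each row is (id, item, qty, count_in_old, count_in_new).
    print(row)
```

An empty result means the two tables hold the same rows with the same multiplicities. Because the comparison goes through GROUP BY, duplicate rows are counted rather than collapsed, and NULL values land in the same group on both sides.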
datacompy
- How to Check 2 SQL Tables Are the Same
- Comparing 2 CSV files
  datacompy is a package for comparing two pandas DataFrames.
- Performing Data Tests on External Data/Complex Data Quality Checks
- Best Practice When Comparing Data Across Two SQL Servers in Python
  https://github.com/capitalone/datacompy lets you compare two tables or DataFrames against one another and see a detailed report of the differences.
What are some alternatives?
deep-diff2 - Deep diff Clojure data structures and pretty print the result
koalas - Koalas: pandas API on Apache Spark
spark-extension - A library that provides useful extensions to Apache Spark and PySpark.
data-science-ipython-notebooks - Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
recidiffist - Diffs for structured data
data-diff - Compare tables within or across databases
macrobase-diff - Minimal implementation of Macrobase Diff
dbt-audit-helper - Useful macros when performing data audits
visualiza - A general-purpose dynamic data visualizer.
diffable-sql
popmon - Monitor the stability of a Pandas or Spark dataframe ⚙︎