handy_sql_queries vs deep-diff2
| handy_sql_queries | deep-diff2
---|---|---
Mentions | 2 | 1
Stars | 1 | 290
Growth | - | 0.3%
Activity | 4.5 | 6.0
Latest commit | 10 months ago | 3 months ago
Language |  | Clojure
License | The Unlicense | Eclipse Public License 1.0
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
handy_sql_queries
-
Data diffs: Algorithms for explaining what changed in a dataset (2022)
If you are looking for an easy way to check whether two SQL tables are identical, with every row and every column matching, you can use the following technique:
https://github.com/gregw2hn/handy_sql_queries/blob/main/sql_...
-
How to Check 2 SQL Tables Are the Same
This is part of why I don't use MINUS for comparing table contents... All you need is GROUP BY, UNION ALL, and HAVING, using the following technique:
https://github.com/gregw2hn/handy_sql_queries/blob/main/sql_...
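The linked query is not reproduced here because the URL is truncated. As a hedged sketch of the general GROUP BY / UNION ALL / HAVING approach, the snippet below runs it against an in-memory SQLite database; the helper `table_diff`, the tables `a` and `b`, and their columns are hypothetical. Every row is tagged with the table it came from, both tables are stacked with UNION ALL, and HAVING keeps only row values whose per-table counts differ, so an empty result means the tables hold the same rows with the same multiplicities.

```python
import sqlite3


def table_diff(conn, left, right, columns):
    """Return rows whose occurrence count differs between two tables.

    Each output row is (*columns, left_count, right_count). An empty list
    means both tables contain exactly the same rows, duplicates included.
    Table and column names are interpolated into the SQL, so they must be
    trusted identifiers, not user input.
    """
    cols = ", ".join(columns)
    sql = f"""
        SELECT {cols},
               SUM(in_left)  AS left_count,
               SUM(in_right) AS right_count
        FROM (
            -- Tag every row with the table it came from, then stack both.
            SELECT {cols}, 1 AS in_left, 0 AS in_right FROM {left}
            UNION ALL
            SELECT {cols}, 0 AS in_left, 1 AS in_right FROM {right}
        ) AS combined
        GROUP BY {cols}
        -- Keep only row values that appear a different number of times.
        HAVING SUM(in_left) <> SUM(in_right)
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


# Hypothetical example tables: a has a duplicated (2, 'y'), b has an extra (4, 'z').
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (2, 'y'), (3, NULL);
    INSERT INTO b VALUES (1, 'x'), (2, 'y'), (3, NULL), (4, 'z');
""")

diff = table_diff(conn, "a", "b", ["id", "name"])
print(diff)  # [(2, 'y', 2, 1), (4, 'z', 0, 1)]

# For contrast: EXCEPT compares distinct rows, so the extra (2, 'y') in a is missed.
except_rows = conn.execute("SELECT id, name FROM a EXCEPT SELECT id, name FROM b").fetchall()
print(except_rows)  # []
```

Because GROUP BY places NULLs in the same group, the `(3, NULL)` rows compare as equal without extra handling. The last query uses EXCEPT, the standard-SQL counterpart of Oracle's MINUS; since it operates on distinct rows, it reports nothing for the duplicated `(2, 'y')`. That is one plausible reason to prefer the grouping approach, though the post above doesn't spell out its reasons.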
deep-diff2
-
Data diffs: Algorithms for explaining what changed in a dataset (2022)
At Latacora, we use a giant pile of Clojure (almost everything is, with specific, measured exceptions). As a side effect, we have a lot of data. Not necessarily a lot in the sense of "big S3 bill", but definitely a lot in the sense of "you might not expect this to be in a machine-readable format".
Things like: which Lambdas existed in a customer's AWS account in us-east-2 six months ago and had access to a specific SQS queue (because we learned later that one of that queue's consumers would actually consume Python pickles if you asked nicely, and hence get you RCE).
As a side effect, we do a lot of data diffing, mostly on plain Clojure data structures rather than on datasets in the Datasette/CSV/... sense.
For example: https://github.com/latacora/recidiffist (which we also have wired up to Terraform + S3, so if you write some files to S3, you can get the structured diffs right next to it for free). It's one of those things that's incredibly simple and works ridiculously well. Well, if you do it consistently anyway.
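recidiffist's actual algorithm and output format aren't described here. As a rough illustration of structural diffing on plain nested data, here is a minimal Python sketch; the function name `diff`, the `(path, kind, old, new)` change tuples, and the Lambda/SQS snapshot data are all hypothetical, not recidiffist's API. It walks two nested dict/list values and reports each change together with the path that leads to it.

```python
from typing import Any


def diff(old: Any, new: Any, path: tuple = ()) -> list:
    """Recursively compare two nested dict/list structures.

    Returns a list of (path, kind, old_value, new_value) tuples, where kind
    is "added", "removed", or "changed" and path is a tuple of keys/indices.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        # Visit the union of keys in a stable order (repr handles mixed key types).
        for key in sorted(old.keys() | new.keys(), key=repr):
            if key not in new:
                changes.append((path + (key,), "removed", old[key], None))
            elif key not in old:
                changes.append((path + (key,), "added", None, new[key]))
            else:
                changes.extend(diff(old[key], new[key], path + (key,)))
        return changes
    if isinstance(old, list) and isinstance(new, list):
        changes = []
        # Positional comparison: index i in old is matched with index i in new.
        for i in range(max(len(old), len(new))):
            if i >= len(new):
                changes.append((path + (i,), "removed", old[i], None))
            elif i >= len(old):
                changes.append((path + (i,), "added", None, new[i]))
            else:
                changes.extend(diff(old[i], new[i], path + (i,)))
        return changes
    # Leaf values (or mismatched container types) are compared directly.
    return [] if old == new else [(path, "changed", old, new)]


# Hypothetical inventory snapshots taken at two points in time.
before = {"lambdas": {"ingest": {"runtime": "python3.9", "queues": ["orders"]},
                      "report": {"runtime": "python3.9", "queues": []}}}
after = {"lambdas": {"ingest": {"runtime": "python3.12", "queues": ["orders", "refunds"]}}}

changes = diff(before, after)
for change in changes:
    print(change)
```

Matching list elements by position is the simplest choice, but it reports an insertion in the middle of a list as a cascade of changes; a real tool might align list items by an identifier or by a longest-common-subsequence pass instead.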
Also https://github.com/lambdaisland/deep-diff2 for when we're more interested in presenting it to humans.
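deep-diff2 is a Clojure library, and its API is not reproduced here. For the human-facing side, one common presentation is to keep the original nesting and mark each change inline instead of emitting a flat change list. The Python sketch below illustrates that idea under stated assumptions: the `Mismatch`, `Added`, and `Removed` markers, the `annotate`/`render` helpers, and the example config are hypothetical, and lists and other non-dict values are compared as whole values.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical markers placed in the tree wherever the two inputs differ.
@dataclass
class Mismatch:
    old: Any
    new: Any


@dataclass
class Removed:
    value: Any


@dataclass
class Added:
    value: Any


def annotate(old: Any, new: Any) -> Any:
    """Merge two values into one tree with change markers at the differences.

    Dicts are compared key by key; anything else (lists included) is
    compared as a whole value.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        out = {}
        for key in old:
            out[key] = annotate(old[key], new[key]) if key in new else Removed(old[key])
        for key in new:
            if key not in old:
                out[key] = Added(new[key])
        return out
    return old if old == new else Mismatch(old, new)


def render(node: Any, indent: int = 0) -> str:
    """Pretty-print an annotated tree, prefixing removals with - and additions with +."""
    pad = "  " * indent
    if isinstance(node, Mismatch):
        return f"-{node.old!r} +{node.new!r}"
    if not isinstance(node, dict):
        return repr(node)
    lines = ["{"]
    for key, value in node.items():
        if isinstance(value, Removed):
            lines.append(f"{pad}  -{key!r}: {value.value!r}")
        elif isinstance(value, Added):
            lines.append(f"{pad}  +{key!r}: {value.value!r}")
        else:
            lines.append(f"{pad}  {key!r}: {render(value, indent + 1)}")
    lines.append(f"{pad}}}")
    return "\n".join(lines)


old = {"name": "ingest", "runtime": "python3.9", "timeout": 30, "env": {"STAGE": "dev"}}
new = {"name": "ingest", "runtime": "python3.12", "env": {"STAGE": "prod"}, "memory": 512}
text = render(annotate(old, new))
print(text)
```

Unchanged keys stay in place for context, so a reader sees each difference where it lives in the structure rather than as a detached path.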
What are some alternatives?
spark-extension - A library that provides useful extensions to Apache Spark and PySpark.
recidiffist - Diffs for structured data
macrobase-diff - Minimal implementation of Macrobase Diff
datacompy - Pandas and Spark DataFrame comparison for humans and more!
ExplainDaV
lakeFS - Data version control for your data lake | Git for data
dbt-audit-helper - Useful macros when performing data audits
diffable-sql
Apache Calcite