Our great sponsors
-
modern-office-git-diff
An experiment in tracking and diffing versions of modern Microsoft Office files in Git.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
tsv-utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
This approach works by storing the actual SQLite binary files in Git and then using a custom "diff" configuration to dump each file as SQL and compare the result.
It's a neat trick, but storing binary files like that in Git isn't as space efficient as using a plain text format.
I built my own tooling to solve this problem: https://datasette.io/tools/sqlite-diffable - which outputs a “diffable” copy of the data in a SQLite database, precisely so you can store it in Git and look at the differences later.
I’ve been running that for a couple of years in this repo: https://github.com/simonw/simonwillisonblog-backup - which provides a backup of my blog’s PostgreSQL Django database (first converted to SQLite and then dumped out using sqlite-diffable).
Here’s an example diff: https://github.com/simonw/simonwillisonblog-backup/commit/72...
I was also exploring something like this a few years back but for Office files. This exact approach seemed like an absolute win to me, but I ended up not using it, because this won't work in the GitHub web UI. This won't be a deal-breaker to many, but people should be aware of it still. In the end I ended up doing this: https://github.com/TomasHubelbauer/modern-office-git-diff/
We do the exact same thing to keep track of some credentials we use sops[1] and AWS KMS to separate credentials by sensitivity, then use the git differ to view the diffs between the encrypted secrets
Definitely not best practice security-wise, but it works well
[1] https://github.com/getsops/sops
You can also convert the data to JSON using sqlite_to_json
https://github.com/emadda/transform-x#clis
You might want to look at tsv-utils, or a similar project: https://github.com/eBay/tsv-utils
For the SQL part, but maybe a lot heavier, you can use one of the projects listed on this page: https://github.com/multiprocessio/dsq (No longer maintained, but has links to lots of other projects)
You might want to look at tsv-utils, or a similar project: https://github.com/eBay/tsv-utils
For the SQL part, but maybe a lot heavier, you can use one of the projects listed on this page: https://github.com/multiprocessio/dsq (No longer maintained, but has links to lots of other projects)