Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
It looks like Kiwix uses the ZIM file format, which appears to have diffing support [0] (see zimdiff and zimpatch). That said, it doesn't look like Kiwix actually publishes those diffs.
[0] https://github.com/openzim/zim-tools/tree/master/src
Not related to the OP topic or zim but I was looking into archiving my bookmarks and other content like documentation sites and wikis. I'll list some of the things I ended up using.
ArchiveBox[1]: Pretty much a self-hosted wayback machine. It can save websites as plain html, screenshot, text, and some other formats. I have my bookmarks archived in it and have a bookmarklet to easily add new websites to it. If you use the docker-compose you can enable a full-text search backend for an easy search setup.
WebRecorder[2]: A browser extension that creates WACZ archives directly in the browser capturing exactly what content you load. I use it on sites with annoying dynamic content that sites like wayback and ArchiveBox wouldn't be able to copy.
ReplayWeb[3]: An interface to browse archive types like WARC, WACZ, and HAR. The interface is just like browsing through your browser. It can be self-hosted as well for the full offline experience.
browsertrix-crawler[4]: A CLI tool to scrape websites and output to WACZ. Its super easy to run with Docker and I use it to scrape entire blogs and docs for offline use. It uses Chrome to load webpages and has some extra features like custom browser profiles, interactive login, and autoscroll/autoplay. I use the `--generateWACZ` parameter so I can use ReplayWeb to easily browse through the final output.
For bookmark and misc webpage archiving then ArchiveBox should be more than enough. Check out this repo for an amazing list of tools and resources https://github.com/iipc/awesome-web-archiving
[1] https://github.com/ArchiveBox/ArchiveBox
I think there are better ways to open ZIM files. I've had massive trouble with Kiwix. The old version seems broke beyond repair and the new version is too heavy.
ZIMply on branch `version2` has worked pretty well for me [1]. The search works a lot better and it's really nicely formatted.
[1] https://github.com/kimbauters/ZIMply/tree/version2