How to Download All of Wikipedia onto a USB Flash Drive

zim-tools

4 110 8.0 C++

Various ZIM command line tools

It looks like Kiwix uses the ZIM file format, which appears to have diffing support [0] (see zimdiff and zimpatch). That said, it doesn't look like Kiwix actually publishes those diffs.
[0] https://github.com/openzim/zim-tools/tree/master/src

wiktextract

7 702 9.8 Python

Wiktionary dump file parser and multilingual data extractor
CDPedia

2 33 10.0 Python

CDPedia is a project to make Wikipedia accessible offline
awesome-web-archiving

13 1,811 5.2

An Awesome List for getting started with web archiving

Not related to the OP topic or ZIM, but I was looking into archiving my bookmarks and other content such as documentation sites and wikis. I'll list some of the things I ended up using.
ArchiveBox[1]: Pretty much a self-hosted Wayback Machine. It can save websites as plain HTML, screenshots, text, and some other formats. I have my bookmarks archived in it and use a bookmarklet to easily add new websites. If you use the docker-compose setup, you can enable a full-text search backend for easy searching.
WebRecorder[2]: A browser extension that creates WACZ archives directly in the browser, capturing exactly the content you load. I use it on sites with annoying dynamic content that services like the Wayback Machine and ArchiveBox wouldn't be able to copy.
ReplayWeb[3]: An interface for browsing archive formats such as WARC, WACZ, and HAR. It feels just like browsing in your regular browser, and it can be self-hosted for a fully offline experience.
browsertrix-crawler[4]: A CLI tool to scrape websites and output WACZ. It's super easy to run with Docker, and I use it to scrape entire blogs and docs for offline use. It uses Chrome to load webpages and has some extra features like custom browser profiles, interactive login, and autoscroll/autoplay. I use the `--generateWACZ` parameter so I can use ReplayWeb to easily browse the final output.
For bookmark and miscellaneous webpage archiving, ArchiveBox should be more than enough. Check out this repo for an amazing list of tools and resources: https://github.com/iipc/awesome-web-archiving
[1] https://github.com/ArchiveBox/ArchiveBox
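The browsertrix-crawler workflow described in the comment above (run it with Docker, pass `--generateWACZ`, open the result in ReplayWeb) could look roughly like the sketch below. It only builds the `docker run` command line. The image name `webrecorder/browsertrix-crawler`, the `/crawls/` mount point, and the `--url` and `--collection` options are taken from the project's README as I remember it, so treat them as assumptions and check them against the version you install. The helper name `build_crawl_command` and the example URL are hypothetical.

```python
import shlex
from pathlib import Path


def build_crawl_command(url: str, collection: str, out_dir: Path) -> list[str]:
    """Build (but do not run) a docker command for one browsertrix-crawler crawl.

    Assumption: the container writes its output under /crawls/, so a host
    directory is mounted there to keep the resulting WACZ file.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{out_dir.resolve()}:/crawls/",
        "webrecorder/browsertrix-crawler", "crawl",
        "--url", url,                # seed page to start crawling from
        "--collection", collection,  # name used for the output collection
        "--generateWACZ",            # the flag mentioned in the comment above
    ]


if __name__ == "__main__":
    cmd = build_crawl_command("https://example.com/", "example-docs", Path("crawls"))
    # Print the command so it can be reviewed before running it by hand,
    # or pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the exact command first. The generated WACZ file should end up somewhere under the mounted `crawls` directory and can then be loaded into ReplayWeb.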

replayweb.page

24 620 8.7 TypeScript

Serverless replay of web archives directly in the browser
browsertrix-crawler

13 540 9.1 TypeScript

Run a high-fidelity browser-based crawler in a single Docker container
ZIMply

1 75 0.0 Python

An easy to use offline reader for ZIM files right in your browser!

I think there are better ways to open ZIM files. I've had massive trouble with Kiwix. The old version seems broken beyond repair and the new version is too heavy.
ZIMply on branch `version2` has worked pretty well for me [1]. The search works a lot better and it's really nicely formatted.
[1] https://github.com/kimbauters/ZIMply/tree/version2
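When a reader refuses to open a ZIM file, it can help to first rule out a truncated or mislabeled download. The sketch below is not part of Kiwix or ZIMply. It assumes the header layout from the openZIM format description: a little-endian 32-bit magic number 72173914 at offset 0, followed by 16-bit major and minor version fields. The function name `read_zim_version` is hypothetical.

```python
import struct

# Assumed ZIM magic number (0x044D495A); on disk the bytes are 5A 49 4D 04.
ZIM_MAGIC = 72173914


def read_zim_version(path):
    """Return (major, minor) if the file starts with a ZIM header, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
    if len(head) < 8:
        return None  # too short to contain the magic number and version
    magic, major, minor = struct.unpack("<IHH", head)
    if magic != ZIM_MAGIC:
        return None  # not a ZIM file, or the download is corrupted at the start
    return major, minor
```

This only inspects the first 8 bytes, so it will not catch a file that is cut off later on. It is a quick first check, not a full integrity test.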

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.