python-libzim
Libzim binding for Python: read/write ZIM files in Python (by openzim)
PlainTextWikipedia
Convert Wikipedia database dumps into plaintext files (by daveshap)
| | python-libzim | PlainTextWikipedia |
|---|---|---|
| Mentions | 2 | 6 |
| Stars | 54 | 261 |
| Growth | - | - |
| Activity | 7.2 | 1.2 |
| Last commit | 28 days ago | almost 3 years ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 only | MIT License |
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
python-libzim
Posts with mentions or reviews of python-libzim.
We have used some of these posts to build our list of alternatives
and similar projects. The most recent post was on 2022-11-11.
-
ZIM Reader for Anki
Looking forward to seeing Windows support in python-libzim, by the way; I'm currently relying on zimply-core for Windows.
-
Updated: I've saved all of Wikipedia into a SQLITE database!
https://github.com/openzim/python-libzim is the official one
PlainTextWikipedia
Posts with mentions or reviews of PlainTextWikipedia.
We have used some of these posts to build our list of alternatives
and similar projects. The most recent post was on 2023-04-11.
-
How to download all wikipedia articles in plaintext ( no links, images, talk, revision , SQL, XML etc. ).
You'd have to convert the dump yourself. I found this project, but it was last updated two years ago, so who knows if it still works. They uploaded a dump from 2020 if that is still useful for you. (note, while plaintext, the output is still encapsulated in JSON) Here's another project that converted the dump to plaintext, but the last one was from 2014. You can probably find more by Googling "Wikipedia plaintext dump".
-
What the fuck
Funnily enough, the "Simplified English Wikipedia" dump file is about 1GB, as stated here: https://github.com/daveshap/PlainTextWikipedia
-
Update: Indexing Wikipedia offline with SOLR
Great news everyone! You can now index Wikipedia offline with a powerful indexing engine! This is not meant to be a replacement for KIWIX or anything like that; it is more for programmatic use. Say, for instance, you wanted to write your own search engine. Here's the repo: https://github.com/daveshap/PlainTextWikipedia
-
Help with indexing offline wikipedia with SOLR
Here's my base project: https://github.com/daveshap/PlainTextWikipedia
- PlainTextWikipedia: Convert Wikipedia database dumps into plain text JSON files
- Updated: I've saved all of Wikipedia into a SQLITE database!
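The posts above describe turning plaintext JSON article records into a SQLite database. A minimal sketch of that idea, using only the Python standard library, might look like the following. The one-object-per-line layout, the field names `id`, `title`, and `text`, and the `load_articles` helper are all assumptions made for illustration; they are not taken from PlainTextWikipedia's actual output format.

```python
import json
import sqlite3


def load_articles(json_lines, db_path=":memory:"):
    """Insert one article per JSON line into a SQLite table; return the connection.

    The record fields (id, title, text) are hypothetical, not the project's schema.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(id TEXT PRIMARY KEY, title TEXT, text TEXT)"
    )
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        rows.append((str(record["id"]), record["title"], record["text"]))
    # INSERT OR REPLACE keeps the load idempotent if a record appears twice
    conn.executemany("INSERT OR REPLACE INTO articles VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Two made-up records standing in for converted dump output
sample = [
    json.dumps({"id": "1", "title": "Anarchism", "text": "Anarchism is a political philosophy."}),
    json.dumps({"id": "2", "title": "Autism", "text": "Autism is a developmental condition."}),
]
conn = load_articles(sample)
titles = [row[0] for row in conn.execute("SELECT title FROM articles ORDER BY id")]
```

For a real dump, an open file handle could be passed in place of `sample`, and `db_path` pointed at a file on disk so that the whole dump does not have to fit in memory.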
What are some alternatives?
When comparing python-libzim and PlainTextWikipedia you can also consider the following projects:
zimply-core - An easy to use offline reader for ZIM files right in your browser!
wikitextparser - A Python library to parse MediaWiki WikiText
ndjson.github.io - Info Website for NDJSON