-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
-
webpages-to-ebook
Create an EPUB from a list of URLs. Standing on the shoulders of Wget, Readability and Pandoc.
Yeah, in theory it shouldn't be too difficult, especially if you already know some Python. I think you'd want to look into Scrapy as a starting point. Here's a decent tutorial
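Scrapy is the suggested tool. To keep things self-contained, here is a minimal standard-library sketch of the same core idea instead of Scrapy itself: a breadth-first crawl that collects every page URL on the same site. The names `crawl`, `LinkExtractor`, and `fetch_html` are hypothetical. The sketch assumes pages are served as UTF-8 and that "same host as the start URL" is the right scope for a wiki. The `fetch` parameter lets you swap in a stub for offline testing.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url: str) -> str:
    """Download one page (assumption: the site serves UTF-8)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str] = fetch_html,
          max_pages: int = 200) -> list[str]:
    """Breadth-first crawl restricted to start_url's host.

    Returns page URLs in the order they were first discovered,
    stopping once max_pages URLs have been collected.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    found = [start_url]
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(page))
        for href in parser.links:
            # Resolve relative links and drop "#section" fragments.
            url, _ = urldefrag(urljoin(page, href))
            # Skip other hosts (and mailto:/javascript: links, which have no host).
            if urlparse(url).netloc != host or url in seen:
                continue
            if len(found) >= max_pages:
                return found
            seen.add(url)
            found.append(url)
            queue.append(url)
    return found
```

A call like `crawl("https://wiki.example/Main_Page")` would return the kind of URL list that a tool such as webpages-to-ebook takes as input. A real crawl should also respect robots.txt and add a delay between requests; Scrapy provides settings for both (`ROBOTSTXT_OBEY`, `DOWNLOAD_DELAY`).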
There's also ArchiveBox, which is essentially a self-hosted version of archive.org: you feed it URLs, and I think it has some support for crawling websites. Apparently it can also make PDFs, which I didn't know.
This could help: https://github.com/georgjaehnig/webpages-to-ebook. You just need the URLs of all the (wiki) pages you want to include.
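webpages-to-ebook describes itself as building on Wget, Readability and Pandoc. The sketch below is not that project's code. It is a hypothetical, simplified version of such a pipeline that only builds the command lines: one `wget -O` per URL to save the HTML, then a single `pandoc` call that joins the saved files, in order, into an EPUB. The Readability clean-up step is left out, and `build_commands`, `run_commands`, and the `pages/NNN.html` naming are assumptions for illustration.

```python
import os
import shlex
import subprocess


def build_commands(urls, out_epub="book.epub", title="Wiki", workdir="pages"):
    """Hypothetical helper: argv lists for a simplified Wget -> Pandoc pipeline.

    The Readability clean-up step used by webpages-to-ebook is omitted.
    """
    # Zero-padded names keep the chapters in the same order as `urls`.
    html_files = [f"{workdir}/{i:03d}.html" for i in range(len(urls))]
    # One wget call per page; -O writes the download to the given file.
    cmds = [["wget", "--quiet", "-O", path, url] for path, url in zip(html_files, urls)]
    # Pandoc reads the HTML files in order and writes a single EPUB.
    cmds.append([
        "pandoc", "-f", "html", "-t", "epub",
        "--metadata", f"title={title}",
        "-o", out_epub, *html_files,
    ])
    return cmds


def run_commands(cmds, workdir="pages"):
    """Execute the commands; requires wget and pandoc on PATH."""
    os.makedirs(workdir, exist_ok=True)
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    urls = ["https://wiki.example/Page_1", "https://wiki.example/Page_2"]
    for cmd in build_commands(urls):
        print(shlex.join(cmd))  # dry run: print instead of executing
```

Printing the commands first is a cheap dry run; calling `run_commands(build_commands(urls))` would actually fetch the pages and build `book.epub`. Without the Readability step, navigation menus and sidebars from each page end up in the book, which is the main thing the real tool's clean-up pass is for.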