Are there any efficient methods available to recursively download (nearly) all pages of a game's wiki to a single PDF file?

Scrapy

Mentions: 180 | Stars: 50,954 | Activity: 9.6 | Language: Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

Yeah, in theory it shouldn't be too difficult, especially if you already know some Python. I think you'd want to look into Scrapy as a starting point. Here's a decent tutorial
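To give a rough idea of what a crawler such as Scrapy automates, here is a minimal standard-library sketch of the core loop: extract links from a page, keep only those on the same host, and visit them breadth-first. This is not Scrapy code, and the names `LinkParser`, `crawl`, `fetch`, and `max_pages` are made up for illustration. `fetch` is assumed to be any callable that returns a page's HTML for a URL, or `None` if the page cannot be retrieved.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl restricted to the host of start_url.

    `fetch` (hypothetical) maps a URL to its HTML text, or None on failure.
    Returns the fetched URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#section" fragments
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

For real use, `fetch` could wrap `urllib.request.urlopen`, ideally with a delay between requests so the wiki is not hammered. A dedicated framework handles retries, throttling, and robots.txt for you, which is the main reason to reach for one instead of a hand-rolled loop like this.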

ArchiveBox

Mentions: 248 | Stars: 19,790 | Activity: 9.8 | Language: Python

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

ArchiveBox is essentially a self-hosted version of archive.org that you can feed URLs, with some support for crawling websites, I think. Apparently it can also make PDFs, which I didn't know.

webpages-to-ebook

Mentions: 4 | Stars: 187 | Activity: 0.0 | Language: JavaScript

Create an EPUB from a list of URLs. Standing on the shoulders of Wget, Readability and Pandoc.

This could help: https://github.com/georgjaehnig/webpages-to-ebook. You just need the URLs of all the (wiki) pages you want to include.
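Since that approach hinges on having a clean list of page URLs, here is a small sketch of how a raw list of crawled links might be tidied first. It assumes MediaWiki-style URLs, where non-article pages carry prefixes such as `Special:` and edit views carry a query string; a given game wiki may differ. The function names and the one-URL-per-line output file are illustrative assumptions, not the documented input format of webpages-to-ebook.

```python
from urllib.parse import urldefrag, urlparse

# Assumed MediaWiki-style namespaces to skip; adjust for the actual wiki.
SKIPPED_PREFIXES = ("Special:", "Talk:", "User:", "File:", "Template:", "Category:")


def clean_url_list(urls):
    """Return de-duplicated article URLs, preserving first-seen order."""
    cleaned = []
    seen = set()
    for url in urls:
        # Trim whitespace and drop "#section" fragments
        url, _fragment = urldefrag(url.strip())
        if not url:
            continue
        parsed = urlparse(url)
        if parsed.query:
            # Skip views such as ?action=edit or ?oldid=123
            continue
        title = parsed.path.rsplit("/", 1)[-1]
        if title.startswith(SKIPPED_PREFIXES):
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def write_url_list(urls, path):
    """Write one URL per line to the text file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")
```

Dropping every URL with a query string is deliberately blunt: it also discards wikis that serve articles as `index.php?title=...`, so that rule would need loosening for such sites.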

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

How is ArchiveBox?
4 projects | /r/selfhosted | 27 Dec 2021

Vice website is shutting down
1 project | news.ycombinator.com | 23 Feb 2024

How to scrape a website with Python (Beginner tutorial)
1 project | dev.to | 22 Feb 2024

Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
1 project | news.ycombinator.com | 16 Feb 2024

Seven Python Projects to Elevate Your Coding Skills
3 projects | dev.to | 15 Feb 2024

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder
Python wget Web Crawling Archiving and Digital Preservation (DP) epub-generation
Post date: 10 Feb 2022
