Are there any efficient methods available to recursively download (nearly) all pages of a game's wiki to a single PDF file?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • Yeah, in theory it shouldn't be too difficult, especially if you already know some Python. I think you'd want to look into Scrapy as a starting point. Here's a decent tutorial.

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • ArchiveBox, which is essentially a self-hosted version of archive.org that you can feed URLs, with some support for crawling websites, I think. Also, apparently it can make PDFs, which I didn't know.

  • webpages-to-ebook

    Create an EPUB from a list of URLs. Standing on the shoulders of Wget, Readability and Pandoc.

  • This could help: https://github.com/georgjaehnig/webpages-to-ebook. You just need the URLs of all (wiki) pages to be included.
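
The last suggestion depends on already having the URLs of every wiki page. As a rough sketch of how that list might be gathered, the Python below pulls same-site article links out of one page's HTML using only the standard library. The `/wiki/` path prefix, the `wiki.example.org` host, and the names `LinkCollector` and `extract_wiki_links` are assumptions made for illustration; they come from neither the thread nor any of the listed projects.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_wiki_links(html, page_url, path_prefix="/wiki/"):
    """Return sorted absolute URLs on page_url's host under path_prefix.

    path_prefix is an assumption: many wikis serve articles under
    /wiki/, but the target site may use a different layout.
    """
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in parser.hrefs:
        # Resolve relative links, then drop any #fragment so that
        # /wiki/Sword and /wiki/Sword#Stats count as one page.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.netloc == host and parts.path.startswith(path_prefix):
            found.add(absolute)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical page content; a real run would fetch the HTML first.
    sample = '<a href="/wiki/Sword">Sword</a> <a href="/about">About</a>'
    for url in extract_wiki_links(sample, "https://wiki.example.org/wiki/Main_Page"):
        print(url)
```

Running this over each newly discovered page until no unseen URLs remain would give a simple crawl; the resulting list could then be handed to a tool such as webpages-to-ebook or ArchiveBox. For a large wiki, a framework like Scrapy handles rate limiting and retries, which this sketch leaves out.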
