is there a way to take "snapshots" of every page of a website?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • fetchurls

    A bash script to spider a site, follow links, and fetch URLs (with built-in filtering) into a generated text file. (A generic command-line crawl sketch follows after the project list.)

  • Yacy

    Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance

  • I happen to already run a YaCy node, and it's decent at crawling things. The resulting list of URLs could then be fed into ArchiveBox.

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • I've been playing around with ArchiveBox. It offers several different options for how each snapshot is stored (see the ingestion sketch after this list).
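
For anyone who wants to reproduce the crawl step from the command line, below is a minimal sketch of spidering a site and collecting its URLs into a text file. It uses plain wget --spider as a generic stand-in for what fetchurls automates; the example domain, file names, and URL-matching pattern are illustrative assumptions, not details from the original post.

    # Sketch: spider a site and collect every discovered URL into urls.txt.
    # Assumes GNU wget and grep; https://example.com/ is a placeholder domain.

    # --spider follows links recursively without keeping the downloaded pages.
    wget --spider --recursive --no-verbose --output-file=crawl.log https://example.com/

    # Pull anything that looks like an http(s) URL out of the crawl log and de-duplicate it.
    grep -oE 'https?://[^ ]+' crawl.log | sort -u > urls.txt

    echo "Collected $(wc -l < urls.txt) URLs"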
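
Once a urls.txt exists (whether from fetchurls, a YaCy crawl, or the wget sketch above), feeding it into ArchiveBox looks roughly like the following. This is a sketch based on ArchiveBox's documented command-line workflow; the collection directory and port are arbitrary assumptions.

    # Sketch: archive every URL in urls.txt with ArchiveBox.
    # Assumes Python/pip; ArchiveBox can also be run from its official Docker image.
    pip install archivebox

    # Each ArchiveBox collection lives in its own directory.
    mkdir -p ~/website-archive && cd ~/website-archive
    archivebox init

    # ArchiveBox reads URLs from stdin, one per line, and saves a snapshot of each
    # (HTML, PDF, screenshot, media, etc., depending on configuration).
    archivebox add < ~/urls.txt

    # Browse the snapshots in the local web UI at http://localhost:8000.
    archivebox server 0.0.0.0:8000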

NOTE: The mention count for each project combines mentions in common posts and user-suggested alternatives, so a higher count indicates a more popular project.
