Options to back up https://trythatsoap.com/?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • You could check out some of the options ArchiveBox has to offer if you like running servers.

  • fetchurls

    A bash script to spider a site, follow links, and fetch urls (with built-in filtering) into a generated text file.

  • Welcome. Since ArchiveBox doesn't crawl pages, you might be interested in something like fetchurls as well.

  • YaCy

    Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance

  • YaCy is also a decent web crawler. It is probably overkill in most cases, since it also wants to build a search index, but it can export a plain-text list of URLs that could be fed into ArchiveBox. (You need to opt out of the relevant settings so that it does not store random site data for others.)

  • browsertrix-crawler

    Run a high-fidelity browser-based crawler in a single Docker container
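
The comments above suggest a two-step workflow: let a crawler such as fetchurls or YaCy produce a text list of URLs, then hand that list to ArchiveBox. Crawler output often contains duplicates, fragment links, and off-site URLs, so a small cleanup pass in between may help. The following is only a minimal sketch of such a pass; the function name `clean_url_list` and the sample URLs are made up for illustration and are not part of any of the projects listed here.

```python
from urllib.parse import urldefrag, urlsplit


def clean_url_list(lines, site_host):
    """Return unique same-site http(s) URLs, in first-seen order.

    `lines` is any iterable of strings (for example an open text file
    exported by a crawler); `site_host` is the bare host name to keep.
    """
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        url, _fragment = urldefrag(url)  # drop "#section" anchors
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        host = (parts.hostname or "").lower()
        # Keep the site itself and its subdomains, nothing else.
        if host != site_host and not host.endswith("." + site_host):
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


# Hypothetical crawler output, used only to demonstrate the function.
sample = [
    "https://trythatsoap.com/about#team\n",
    "https://trythatsoap.com/about\n",
    "https://example.com/elsewhere\n",
]
for url in clean_url_list(sample, "trythatsoap.com"):
    print(url)
```

In practice the `lines` argument could be an open file containing the crawler's export, and the cleaned result could be written back to a text file, one URL per line, for ArchiveBox to import.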

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
