You could check out some of the options ArchiveBox has to offer if you like running servers.

Welcome. Since ArchiveBox doesn't crawl pages itself, you might also be interested in something like fetchurls, a bash script that spiders a site, follows links, and writes the fetched URLs (with built-in filtering) to a text file.
YaCy is also a decent web crawler. It's probably overkill in most cases, since it also wants to build a search index, but it can export a plain-text list of URLs that can then be fed into ArchiveBox. (Note that since YaCy is peer-to-peer, you need to opt out in its settings if you don't want your node storing index data for other people's sites.)
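To sketch the crawl-then-archive workflow described above: both fetchurls and YaCy can produce a one-URL-per-line text file, which you can clean up and pipe into `archivebox add` (which accepts URLs on stdin). The URLs and filenames below are made-up placeholders; the `grep` filter is just one example of trimming assets you don't want archived.

```shell
# Fake the output of a crawler (fetchurls / YaCy export):
# one URL per line, possibly with duplicates and asset files.
printf '%s\n' \
  'https://example.com/a' \
  'https://example.com/a' \
  'https://example.com/assets/logo.png' \
  'https://example.com/b' > urls.txt

# Drop image assets and duplicates before archiving.
grep -v '\.png$' urls.txt | sort -u > pages.txt

# Feed the cleaned list into ArchiveBox (run inside an initialized
# ArchiveBox data directory; reads one URL per line from stdin):
# archivebox add < pages.txt
```

The actual `archivebox add` call is left commented out since it needs an initialized ArchiveBox collection to run against.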