Grab-site Alternatives
Similar projects and alternatives to grab-site
-
docker-swag
Nginx webserver and reverse proxy with php support and a built-in Certbot (Let's Encrypt) client. It also contains fail2ban for intrusion prevention.
-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
-
hunter-dkim
Discusses how to verify DKIM signatures in old emails, namely one of the Hunter Biden emails in the news
-
linkwarden
⚡️⚡️⚡️ Self-hosted collaborative bookmark manager to collect, organize, and preserve webpages, articles, and documents.
-
browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
-
urlwatch
Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
-
good-karma-kit
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
-
vectordb
A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search. (by kagisearch)
-
forum-dl
Scrape posts, threads from forums, news aggregators, mail archives, export to JSONL, mailbox, WARC
-
wget2
The successor of GNU Wget. Contributions preferred at https://gitlab.com/gnuwget/wget2. But accepted here as well 😍
grab-site discussion
grab-site reviews and mentions
-
ArchiveBox is evolving: the future of self-hosted internet archives
https://github.com/ArchiveTeam/grab-site might be helpful. I'm a fan of the ability to create WARC archives, put them in object storage (whether that is IA, S3, Backblaze B2, etc), and then keep them in cold storage or serve them up via HTTPS or a torrent (mutable, preferred).
-
HTTrack Website Copier
I'm aware of this tool, but I'm sure there are caveats in terms of "totally" cloning a website:
https://github.com/ArchiveTeam/grab-site
- We're losing our digital history. Can the Internet Archive save it?
- How to download a copy of a website using Wget
-
Ask HN: How can I back up an old vBulletin forum without admin access?
The format you want is WARC. Even the Library of Congress uses it. There are many many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are) but I found that in less than 15 seconds of searching so do your own diligence.
-
struggling to download websites
You can use grab-site with --no-offsite-links and --igsets=mediawiki.
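As a rough sketch of what that invocation could look like when scripted, the snippet below only assembles the command line and does not run a crawl. The two flags come from the comment above; the start URL `https://wiki.example.org/` and the helper name `build_grab_site_command` are made up for illustration, and joining several ignore sets with commas is an assumption about how `--igsets` is parsed.

```python
import shlex


def build_grab_site_command(start_url, igsets=("mediawiki",), offsite_links=False):
    """Return an argv list for a grab-site crawl (hypothetical helper).

    Nothing is executed here; the list can be handed to subprocess.run()
    on a machine where grab-site is installed.
    """
    argv = ["grab-site"]
    if not offsite_links:
        # Flag quoted in the comment above.
        argv.append("--no-offsite-links")
    if igsets:
        # Assumption: multiple ignore sets are comma-separated.
        argv.append("--igsets=" + ",".join(igsets))
    argv.append(start_url)
    return argv


# Placeholder URL, not a real wiki.
command = build_grab_site_command("https://wiki.example.org/")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if the start URL contains characters such as `&` or `?`.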
- Internet Archive Down, will be up and running soon (i hope).
-
best tool for downloading forum posts in real-time?
Does the forum provide real-time notifications for new posts? Like maybe an RSS feed, or a 'New' section? If so, some scripting around grab-site or httrack could grab them quickly.
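One way the "scripting around" part might be sketched, assuming the forum exposes a plain RSS 2.0 feed: parse the feed, keep only the links not seen before, and hand those to the archiving tool of choice. The feed contents, the URLs, and the function name `new_post_links` below are invented for the example; fetching the feed and launching the crawler are left out.

```python
import xml.etree.ElementTree as ET

# Invented sample feed standing in for a real forum's RSS output.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example forum</title>
<item><title>First thread</title><link>https://forum.example.org/t/1</link></item>
<item><title>Second thread</title><link>https://forum.example.org/t/2</link></item>
</channel></rss>"""


def new_post_links(rss_xml, seen):
    """Return links from the feed that are not in `seen`, in feed order."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen and link not in fresh:
            fresh.append(link)
    return fresh


# Links archived on a previous run (hypothetical state).
already_archived = {"https://forum.example.org/t/1"}
to_archive = new_post_links(SAMPLE_FEED, already_archived)
print(to_archive)
```

A polling loop would call this every few minutes, pass each new link to the crawler, and then add it to the `seen` set so it is not fetched twice.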
-
How are you archiving websites you visit?
After a lot of searching for a similar topic, this is a tool I found which works pretty well: https://github.com/ArchiveTeam/grab-site
-
Help building or mirroring docs.microsoft.com
Crawling is of course the other option. I've seen https://github.com/ArchiveTeam/grab-site in the wiki, but I'm unsure how to host the resulting .warc archives.
Stats
ArchiveTeam/grab-site is an open source project licensed under GNU General Public License v3.0 or later, which is an OSI-approved license.
The primary programming language of grab-site is Python.