grab-site
docker-swag
| | grab-site | docker-swag |
|---|---|---|
| Mentions | 30 | 295 |
| Stars | 1,258 | 2,516 |
| Stars growth (monthly) | 3.3% | 2.2% |
| Activity | 3.8 | 9.2 |
| Last commit | 28 days ago | 6 days ago |
| Language | Python | Dockerfile |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
grab-site
- Ask HN: How can I back up an old vBulletin forum without admin access?
The format you want is WARC; even the Library of Congress uses it. There are many WARC scrapers, so I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott: https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are). I found that in less than 15 seconds of searching, though, so do your own due diligence.
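For reference, grab-site's README describes installing it into its own Python virtualenv; a minimal session might look like the following sketch (the forum URL and venv path are placeholders):

```shell
# Install grab-site into a dedicated virtualenv (per the project README)
python3 -m venv ~/gs-venv
~/gs-venv/bin/pip install --upgrade grab-site

# Start the dashboard (it listens on http://127.0.0.1:29000 by default),
# then launch a crawl; output is written as WARC files
~/gs-venv/bin/gs-server &
~/gs-venv/bin/grab-site 'https://forum.example.com/'
```

Each crawl gets its own directory containing the WARC output alongside the crawl's URL list and ignore controls, and the dashboard lets you watch and adjust ignores while the crawl runs.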
- Struggling to download websites
You can use grab-site with --no-offsite-links and --igsets=mediawiki.
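As a concrete sketch of that suggestion (the wiki URL is a placeholder):

```shell
# --no-offsite-links keeps the crawl from following links off the starting host;
# --igsets=mediawiki applies grab-site's bundled ignore set for MediaWiki cruft
# (edit/history/diff URLs and the like)
grab-site --no-offsite-links --igsets=mediawiki 'https://wiki.example.org/'
```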
- Internet Archive Down, will be up and running soon (I hope).
- Best tool for downloading forum posts in real-time?
Does the forum provide real-time notification of new posts, like an RSS feed or a 'New' section? If so, some scripting around grab-site or httrack could grab them quickly.
- How are you archiving websites you visit?
After a lot of searching for a similar topic, this is a tool I found which works pretty well: https://github.com/ArchiveTeam/grab-site
- Help building or mirroring docs.microsoft.com
Crawling is of course the other option. I've seen https://github.com/ArchiveTeam/grab-site in the wiki, but I'm unsure how to host the resulting .warc archives.
- grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- Data hoarders, start backing up government websites and news articles as well
- How to mirror multiple websites correctly?
It's a completely different tool, but I like using grab-site (https://github.com/archiveteam/grab-site). Try --wpull-args=--span-hosts='' or something to make it mirror all subdomains. It outputs WARC files, which can be read with a site like https://replayweb.page.
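Spelled out, the commenter's suggestion (explicitly tentative, so treat this as untested; the URL is a placeholder) would look like:

```shell
# --wpull-args passes options straight through to the underlying wpull crawler;
# the empty --span-hosts value relaxes wpull's same-host default so that
# links to other hosts (e.g. subdomains) can be followed
grab-site --wpull-args=--span-hosts='' 'https://example.com/'
```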
- Stack Overflow Developer Story Data Dump (10 whole MB!)
Thus, as a bit of a statement, here's your "I will do it myself even if I have to bash my head against the wall" collection of the Developer Story for 10-20 top users. I know there are some blogs on old web design; perhaps it might be worth their while as a memento of a bygone era. As for myself, I am looking into setting up a dedicated server for either grab-site or ArchiveBox. Possibly both!
docker-swag
- Setting up my own server
- Guide: Setting up Local DNS WITH PORTS
I have a NAS on .0.181 and a SWAG container (on a different port than nginx) on .0.180 that points to my public-facing services. For obvious reasons, I don't want my public domain to point at any other ports/addresses on my home network. Additionally, as elegant as SWAG is, it requires authentication and so won't work for simple local DNS. I now have one local domain for each server, and an nginx instance on each that resolves to the different services on that server.
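A sketch of one such per-server nginx block, with hypothetical local hostnames, addresses, and ports (Jellyfin's default 8096 is used as the example backend):

```nginx
# Local-only name -> local service; nothing here is exposed publicly
server {
    listen 80;
    server_name jellyfin.nas.lan;

    location / {
        proxy_pass http://192.168.0.181:8096;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

A local DNS server (Pi-hole, dnsmasq, or similar) would then resolve jellyfin.nas.lan to the nginx host, keeping the public domain out of the picture entirely.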
- SWAG + Nextcloud AIO + OnlyOffice + Openproject: Fullchain cert connections required. I have the data but I'm not sure how to plug this all together...
OP is even linking the Github... https://github.com/linuxserver/docker-swag
- Reverse Proxied services not accessible on LAN
I have an Unraid server with a few services (Jellyfin, Nextcloud, etc.) running behind LinuxServer.io's SWAG reverse-proxy container, which is built on Nginx and Let's Encrypt. It points to a DuckDNS address, which my domain points at via a CNAME, so I can access Jellyfin, for example, at jellyfin.mydomain.com. A few weeks ago, due to seemingly unrelated issues, I got a new modem/router, an Arris SURFboard G34. For the first few weeks, everything worked as before. But now, when on my LAN, I can't reach my services at the proxied domain; it times out every time. There are no errors in SWAG's logs, nothing seems amiss in the router's web interface, the services are still reachable at their IP:port addresses, and when not on my LAN I can access them at the domain with no problem.
- Fail2Ban – Daemon to ban hosts that cause multiple authentication errors
- Mealie and Swag setup issues
- Can't get Swag instance page
- Site marked dangerous
- Reverse proxy, where to start?
- What's the best way to connect my parent's Roku to my PC, which are on two separate networks?
Reverse proxy, probably? I use Docker SWAG with DuckDNS and it works really well for me. There are of course many ways to reverse proxy, as I linked to earlier.
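For context, a sketch of running SWAG with DuckDNS validation, using the linuxserver/swag image's documented environment variables (the domain, token, timezone, and host path are placeholders):

```shell
docker run -d --name=swag \
  --cap-add=NET_ADMIN \
  -e PUID=1000 -e PGID=1000 -e TZ=Etc/UTC \
  -e URL=example.duckdns.org \
  -e SUBDOMAINS=wildcard \
  -e VALIDATION=duckdns \
  -e DUCKDNSTOKEN=your-duckdns-token \
  -p 443:443 -p 80:80 \
  -v /path/to/swag/config:/config \
  --restart unless-stopped \
  lscr.io/linuxserver/swag
```

With VALIDATION=duckdns and SUBDOMAINS=wildcard, the container obtains a wildcard Let's Encrypt certificate via DNS challenge, so individual services are then exposed by enabling the sample proxy-confs under /config/nginx/proxy-confs.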
What are some alternatives?
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Nginx Proxy Manager - Docker container for managing Nginx proxy hosts with a simple, powerful interface
browsertrix-crawler - Run a high-fidelity browser-based crawler in a single Docker container
authentik - The authentication glue you need.
awesome-datahoarding - List of data-hoarding related tools
traefik-examples - docker-compose configuration examples for Traefik
wpull - Wget-compatible web downloader and crawler.
oauth2-proxy - A reverse proxy that provides authentication with Google, Azure, OpenID Connect and many more identity providers.
replayweb.page - Serverless replay of web archives directly in the browser
authelia - The Single Sign-On Multi-Factor portal for web apps
docker-templates
Caddy - Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS