-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
If you need more advanced recursive spider/crawling ability beyond --depth=1, check out Browsertrix, Photon, or Scrapy and pipe the outputted URLs into ArchiveBox.
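As a rough illustration of piping crawler output into ArchiveBox, the sketch below filters and de-duplicates a list of URLs and then feeds them to `archivebox add` on standard input. The helper names `clean_urls` and `archive` are hypothetical, the crawler output is assumed to be plain text with one URL per line, and the call assumes `archivebox` is installed and run from inside an initialized ArchiveBox data directory.

```python
import subprocess
from urllib.parse import urlsplit


def clean_urls(lines):
    """Return unique http(s) URLs from crawler output, preserving order.

    Hypothetical helper: assumes one URL per line of input.
    """
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        try:
            parts = urlsplit(url)
        except ValueError:
            continue  # malformed URL, e.g. a broken IPv6 host
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip blanks, comments, and non-web schemes
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def archive(urls, depth=0):
    """Pipe URLs to `archivebox add` on stdin, one per line.

    Assumes the current working directory is an ArchiveBox data directory.
    depth=0 archives only the given URLs, since the crawler already
    did the link discovery.
    """
    if not urls:
        return None
    return subprocess.run(
        ["archivebox", "add", f"--depth={depth}"],
        input="\n".join(urls) + "\n",
        text=True,
        check=True,
    )
```

A possible use would be `archive(clean_urls(open("crawled_urls.txt")))`, where `crawled_urls.txt` is a placeholder for whatever file the crawler wrote.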
From https://archivebox.io/:
Related posts
- Multi-paradigm Web Scraper!
- Best (simple) tool for personal Wiki
- What are Your favorite tools to backup reddit data? (Text Posts, Media Content, Comments..)
- Are there any efficient methods available to recursively download (nearly) all pages of a game's wiki to a single PDF file?
- Wayback Machine Downloader – Download an Entire Website from the Wayback Machine