How to scrape an entire website/all of its content?

This page summarizes the projects mentioned and recommended in the original post on /r/Piracy.

  • grab-site

    The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

  • Take a look at grab-site by ArchiveTeam; it's a very powerful tool for mirroring websites.

  • replayweb.page

    Serverless replay of web archives directly in the browser

  • grab-site outputs a compressed WARC file, so you'll need something like replayweb.page to review it.
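
To get a quick look at what a crawl captured without opening a replay tool, the compressed WARC can also be inspected with a short script. The following is a minimal sketch, assuming the file is a standard gzip-compressed WARC (`.warc.gz`) and using only Python's standard library. The function names `iter_warc_records` and `list_captured_urls` are illustrative and are not part of grab-site or replayweb.page.

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The record body is exactly Content-Length bytes long.
        length = int(headers.get("content-length", 0))
        payload = stream.read(length)
        yield headers, payload


def list_captured_urls(path):
    """Return the target URIs of all 'response' records in a .warc.gz file."""
    # gzip.open transparently reads files made of several concatenated gzip members.
    with gzip.open(path, "rb") as stream:
        return [
            headers["warc-target-uri"]
            for headers, _ in iter_warc_records(stream)
            if headers.get("warc-type") == "response"
            and "warc-target-uri" in headers
        ]
```

Calling `list_captured_urls("crawl.warc.gz")` (a hypothetical file name) returns the URLs of the captured responses in file order. This only lists what was archived; to browse the pages themselves, a replay tool such as replayweb.page is still the better fit.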

