Anyone Experienced with Crawling Websites?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

    1. The precedent (so far) is that scraping is legal if the scraped data is publicly available[A].

    2. I guess the best approach depends on what data you're scraping. For some data, it's fine to first convert the page to plain text, then scrape that.

    For structured data like tables and HTML, you're better off using the structure of the HTML itself.

    I suppose you could design a framework that covers all the common tasks, then feed the framework parameters for each site.

    It's not just handling different sites: the same site will change over time, and there will be oddities between pages/items on the same site.

    [A]: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
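The second point in that comment (use the HTML structure, and feed a common framework per-site parameters) could be sketched roughly as below. This is only an illustration, not the commenter's code: `SiteConfig`, its fields, `TableParser`, and `extract_rows` are hypothetical names, and the sketch assumes the data sits in a plain, non-nested `<table>`. It uses only Python's standard library.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SiteConfig:
    """Hypothetical per-site parameters; the field names are illustrative."""
    name: str
    table_index: int = 0       # which <table> on the page holds the data
    skip_header_rows: int = 1  # leading rows to drop as headers


class TableParser(HTMLParser):
    """Collects every <table> as a list of rows, each a list of cell text.

    Assumption: tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_rows(html, config):
    """Return the data rows of the table that `config` points at."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if config.table_index >= len(parser.tables):
        # Fail loudly: the site layout may have changed since the
        # config was written.
        raise ValueError(f"{config.name}: table {config.table_index} not found")
    return parser.tables[config.table_index][config.skip_header_rows:]
```

Raising an error when the expected table is missing is one way to notice that a site has changed over time, rather than silently returning empty results. Oddities between pages on the same site would still need per-page handling on top of this.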

  • changedetection.io

    A free, open-source web page change detection, website watcher, restock monitor, and notification service. Designed for simplicity: monitor which websites had a text change. Also usable for website defacement monitoring and price change notification.

  • You can use an open-source tool like this one: https://github.com/dgtlmoon/changedetection.io

