1. The precedent (so far) is that scraping is legal if the scraped data is publicly available[A].
2. I guess the best approach depends on what data you're scraping. For some data, it's fine to convert the page to plain text first, then scrape that.
For structured data like tables and HTML, you're better off using the structure of the HTML itself.
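To make "using the structure of the HTML itself" concrete, here is a minimal sketch that reads table rows by following the `<tr>`/`<td>` tags instead of flattening the page to text. It uses only Python's standard-library `html.parser`; the names `TableParser`, `parse_table`, and `SAMPLE` are made up for illustration, and it assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently open, or None
        self._cell = None   # text fragments of the cell currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, not taken from any real site.
SAMPLE = (
    "<table>"
    "<tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Widget</td><td>$4.00</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    for row in parse_table(SAMPLE):
        print(row)
```

Because the rows and columns come from the tags, the result keeps its shape even when the cell text contains spaces or line breaks, which is where the plain-text approach tends to fall apart.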
I suppose you could design a framework that covers all the common tasks, then feed the framework parameters for each site.
It's not just handling different sites: the same site will change over time, and there will be oddities between pages/items on the same site.
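One way the "framework plus per-site parameters" idea could look, as a rough sketch rather than a real library: each site gets a small config that maps field names to a list of candidate (tag, class) selectors, tried in order. Listing several candidates per field is one way to cope with a site changing over time or varying between pages. Everything here (`SiteConfig`, `scrape`, the `SHOP` config and its class names) is hypothetical, and the matcher is deliberately naive: it stops at the first closing tag of the same name, so nested same-name tags would confuse it.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SiteConfig:
    name: str
    # field name -> candidate (tag, css_class) selectors, tried in order
    fields: dict


class _Collector(HTMLParser):
    """Records the text of the first element matching each (tag, class) pair."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted   # set of (tag, class) pairs to look for
        self.found = {}        # (tag, class) -> captured text
        self._open = None      # selector currently being captured, or None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._open is not None:
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            key = (tag, cls)
            if key in self.wanted and key not in self.found:
                self._open, self._buf = key, []
                return

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        # Naive: closes on the first matching end tag, ignoring nesting.
        if self._open is not None and tag == self._open[0]:
            self.found[self._open] = "".join(self._buf).strip()
            self._open = None


def scrape(html, config):
    """Extract every configured field; None marks a field no selector matched."""
    wanted = {sel for sels in config.fields.values() for sel in sels}
    collector = _Collector(wanted)
    collector.feed(html)
    collector.close()
    result = {}
    for name, selectors in config.fields.items():
        result[name] = next(
            (collector.found[s] for s in selectors if s in collector.found), None
        )
    return result


# Hypothetical config: the price selector has a fallback for an older layout.
SHOP = SiteConfig(
    name="example-shop",
    fields={
        "title": [("h1", "product-title")],
        "price": [("span", "price-now"), ("span", "price")],
    },
)

OLD_PAGE = '<h1 class="product-title">Widget</h1><span class="price">$4.00</span>'
NEW_PAGE = '<h1 class="product-title big">Widget</h1><span class="price-now">$3.50</span>'

if __name__ == "__main__":
    print(scrape(OLD_PAGE, SHOP))
    print(scrape(NEW_PAGE, SHOP))
```

Returning `None` for an unmatched field, instead of raising, makes it easy to log which pages have drifted away from the config so the selectors can be updated.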
[A]: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
You can use an open-source tool like this one: https://github.com/dgtlmoon/changedetection.io