Self-hosted web scraper?

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Huginn

121 41,523 7.2 Ruby

Create agents that monitor and act on your behalf. Your agents are standing by!

You didn't say which features are important to you or what about changedetection.io didn't work for you, but maybe ArchiveBox or Huginn would fit.

ArchiveBox

248 19,737 9.7 Python

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Trilium Notes

278 25,378 9.6 JavaScript

Build your personal knowledge base with Trilium Notes

If you just want to scrape the words, images and formatting of a web page, you can use Trilium Notes along with its web clipper browser plugin. With the web clipper you can copy the whole page as it is, images and all, to your local Trilium instance.
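For comparison, extracting just the words and image links yourself takes little code. The sketch below uses Python's standard `html.parser` module; the class `PageExtractor` and function `extract` are hypothetical names, and unlike a web clipper it discards formatting and does not download the images, it only collects their `src` URLs.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect visible text and <img src> URLs (hypothetical helper, formatting is dropped)."""

    SKIP = {"script", "style"}  # tags whose contents are not visible text

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.images: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")  # value may be None for a bare attribute
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def extract(html: str) -> tuple[str, list[str]]:
    """Return (space-joined visible text, list of image URLs)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.images


if __name__ == "__main__":
    sample = ('<h1>Title</h1><script>var x=1;</script>'
              '<p>Hello <b>world</b></p><img src="/cat.png" alt="cat">')
    text, images = extract(sample)
    print(text)    # Title Hello world
    print(images)  # ['/cat.png']
```

Self-closing tags such as `<img src="x.png"/>` are handled too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.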

crawlab

4 10,788 6.8 Go

Distributed web crawler admin platform for spider management, regardless of language or framework.

I haven't tried it, but this project, https://github.com/crawlab-team/crawlab, looks promising.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.