scaling-to-distributed-crawling VS colly

Compare scaling-to-distributed-crawling and colly to see how they differ.

scaling-to-distributed-crawling

Repository for the Mastering Web Scraping in Python: Scaling to Distributed Crawling blogpost with the final code. (by ZenRows)
                 scaling-to-distributed-crawling   colly
Mentions         5                                 39
Stars            36                                22,165
Growth           -                                 1.8%
Activity         0.0                               6.0
Latest commit    over 2 years ago                  10 days ago
Language         HTML                              Go
License          MIT License                       Apache License 2.0
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

scaling-to-distributed-crawling

Posts with mentions or reviews of scaling-to-distributed-crawling. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-12-21.

colly

Posts with mentions or reviews of colly. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-01.

What are some alternatives?

When comparing scaling-to-distributed-crawling and colly you can also consider the following projects:

celery - Distributed Task Queue (development branch)

GoQuery - A little like that j-thing, only in Go.

Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.

Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.

xpath - XPath package for Golang, supports HTML, XML, JSON document query.

newspaper - newspaper3k is a news, full-text, and article metadata extraction library in Python 3.

rod - A Devtools driver for web automation and scraping

PeARS-orchard - This is the development version of PeARS, the people's search engine. More compact but less robust than PeARS-lite. If you just want to use PeARS as a local indexer, use PeARS-lite instead.

Geziyor - Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

storm-crawler - A scalable, mature and versatile web crawler based on Apache Storm

Ferret - Declarative web scraping