Scaling-to-distributed-crawling Alternatives

Similar projects and alternatives to scaling-to-distributed-crawling

Angular

699 94,541 10.0 TypeScript scaling-to-distributed-crawling VS Angular

Deliver web apps with confidence 🚀
Redis

318 64,821 9.7 C scaling-to-distributed-crawling VS Redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
celery

43 23,498 9.5 Python scaling-to-distributed-crawling VS celery

Distributed Task Queue (development branch)
colly

39 22,165 6.0 Go scaling-to-distributed-crawling VS colly

Elegant Scraper and Crawler Framework for Golang
Scrapy

180 50,896 9.6 Python scaling-to-distributed-crawling VS Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
newspaper

13 13,720 0.0 Python scaling-to-distributed-crawling VS newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3.
PeARS-orchard

1 35 0.0 HTML scaling-to-distributed-crawling VS PeARS-orchard

This is the development version of PeARS, the people's search engine. More compact but less robust than PeARS-lite. If you just want to use PeARS as a local indexer, use PeARS-lite instead.
storm-crawler

0 855 8.8 HTML scaling-to-distributed-crawling VS storm-crawler

A scalable, mature and versatile web crawler based on Apache Storm
Crawly

2 840 6.6 Elixir scaling-to-distributed-crawling VS Crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a better scaling-to-distributed-crawling alternative or a more similar project.

scaling-to-distributed-crawling reviews and mentions

Posts with mentions or reviews of scaling-to-distributed-crawling. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-12-21.

DOs and DON'Ts of Web Scraping
2 projects | dev.to | 21 Dec 2021

We published a repository and blog post about distributed crawling in Python. It is a bit more complicated than what we've seen so far. It uses external software (Celery as the asynchronous task queue and Redis as the database).
Mastering Web Scraping in Python: Scaling to Distributed Crawling - ZenRows
1 project | /r/programming | 7 Sep 2021
Mastering Web Scraping in Python: Scaling to Distributed Crawling – ZenRows
1 project | news.ycombinator.com | 7 Sep 2021
Mastering Web Scraping in Python: Scaling to Distributed Crawling
1 project | news.ycombinator.com | 25 Aug 2021

3 projects | dev.to | 25 Aug 2021

We will start separating concepts before the project grows. We already have two files: tasks.py and main.py. We will create two more to host crawler-related functions (crawler.py) and database access (repo.py). Please look at the snippet below for the repo file; it is not complete, but you get the idea. There is a GitHub repository with the final content in case you want to check it.
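
The snippet that excerpt refers to is not reproduced on this page. As a rough illustration only, a database-access module in the spirit of repo.py might look like the sketch below. Everything in it is an assumption rather than the project's actual code: the key names, the `Repo` class, and its method names are hypothetical, and `InMemoryStore` is a stand-in so the example runs without a server. The stand-in mirrors a small subset of Redis-style commands (`sadd`, `sismember`, `rpush`, `lpop`), so a real Redis client exposing those same commands could presumably be passed in instead.

```python
from collections import deque

# Hypothetical key names; the original project may use different ones.
TO_VISIT_KEY = "crawling:to_visit"
VISITED_KEY = "crawling:visited"


class InMemoryStore:
    """Stand-in exposing the few Redis-style commands the repo needs."""

    def __init__(self):
        self._sets = {}
        self._lists = {}

    def sadd(self, name, *values):
        # Add values to a set; return how many were new.
        target = self._sets.setdefault(name, set())
        added = len(set(values) - target)
        target.update(values)
        return added

    def sismember(self, name, value):
        return value in self._sets.get(name, set())

    def rpush(self, name, *values):
        # Append to the tail of a list; return the new length.
        queue = self._lists.setdefault(name, deque())
        queue.extend(values)
        return len(queue)

    def lpop(self, name):
        # Pop from the head of a list, or return None when it is empty.
        queue = self._lists.get(name)
        return queue.popleft() if queue else None


class Repo:
    """Keeps storage details out of the crawler and task modules."""

    def __init__(self, connection):
        self.connection = connection

    def add_to_visit(self, url):
        # Queue a URL unless it was already marked as visited.
        if self.is_visited(url):
            return False
        self.connection.rpush(TO_VISIT_KEY, url)
        return True

    def pop_to_visit(self):
        # Next URL in first-in, first-out order, or None if the queue is empty.
        return self.connection.lpop(TO_VISIT_KEY)

    def set_visited(self, url):
        self.connection.sadd(VISITED_KEY, url)

    def is_visited(self, url):
        return bool(self.connection.sismember(VISITED_KEY, url))
```

With this shape, a task would call `Repo(InMemoryStore())` in local experiments and swap in a shared store when several workers need to see the same queue. Note that a real Redis client returns bytes by default unless it is configured to decode responses, which the stand-in does not model.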