Mastering Web Scraping in Python: Crawling from Scratch

node-crawler

Mentions: 3 | Stars: 6,612 | Activity: 4.8 | Language: JavaScript

Web Crawler/Spider for NodeJS + server-side jQuery ;-)

Before you write your own library for crawling, try some of the options out there. Many great open-source libraries can do the job: Scrapy, pyspider, node-crawler (Node.js), or Colly (Go). Many companies and services also offer scraping and crawling solutions.

colly

Mentions: 39 | Stars: 22,120 | Activity: 6.0 | Language: Go

Elegant Scraper and Crawler Framework for Golang

rq

Mentions: 27 | Stars: 9,503 | Activity: 8.3 | Language: Python

Simple job queues for Python

We won't cover the next scale-up step: distributing the crawling process across several servers. Python supports it, and libraries such as Celery or RQ (Redis Queue) can help. It is a big step, and we have already covered enough for the day.
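
To give a rough idea of the pattern those job-queue libraries build on, here is a minimal single-machine sketch. It uses only the standard library (queue and threading), not Celery or RQ, so the threads merely stand in for workers that would run on separate servers. The function names, the example.com URLs, and the fake_fetch stub are assumptions made for illustration; a real crawler would download and parse pages instead.

```python
import queue
import threading


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request plus link extraction.
    # It returns the links "found" on a tiny, hard-coded site.
    site = {
        "https://example.com/": ["https://example.com/a", "https://example.com/b"],
        "https://example.com/a": ["https://example.com/b"],
        "https://example.com/b": [],
    }
    return site.get(url, [])


def crawl(start_url, fetch=fake_fetch, num_workers=3):
    jobs = queue.Queue()      # shared job queue (the role Redis plays for RQ)
    seen = {start_url}        # URLs already queued, to avoid duplicate work
    lock = threading.Lock()   # protects `seen` across worker threads
    jobs.put(start_url)

    def worker():
        while True:
            url = jobs.get()
            if url is None:   # sentinel: no more work, stop this worker
                jobs.task_done()
                break
            try:
                for link in fetch(url):
                    with lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    jobs.put(link)   # new job for any free worker
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    jobs.join()               # wait until every queued URL has been processed
    for _ in threads:
        jobs.put(None)        # one sentinel per worker
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(sorted(crawl("https://example.com/")))
```

In a distributed setup, the in-memory queue would be replaced by a shared broker and the set of seen URLs by shared storage, which is the part that libraries like Celery or RQ help with.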

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a more popular project.