Mastering Web Scraping in Python: Crawling from Scratch

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • node-crawler

    Web Crawler/Spider for NodeJS + server-side jQuery ;-)

  • Before you write your own library for crawling, try some of the options out there. Many great Open Source libraries can achieve it: Scrapy, pyspider, node-crawler (Node.js), or Colly (Go). And many companies and services that provide you with scraping and crawling solutions.

  • colly

    Elegant Scraper and Crawler Framework for Golang

  • Before you write your own library for crawling, try some of the options out there. Many great Open Source libraries can achieve it: Scrapy, pyspider, node-crawler (Node.js), or Colly (Go). And many companies and services that provide you with scraping and crawling solutions.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • rq

    Simple job queues for Python

  • We won't cover the following scale-up step: distributing the crawling process among several servers. Python allows it, and some libraries can help you with it (Celery or Redis Queue). It is a huge step, and we have already covered enough for the day.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts