Where to start: Learning Web-scraping

This page summarizes the projects mentioned and recommended in the original post on /r/learnpython.

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • The most well-known one (and actually the only one I know) is Scrapy. I won't go into too much detail, but among other things it offers:

  • lxml

    The lxml XML toolkit for Python

  • lxml is an XML parser; however, it also supports HTML parsing. It's blazing fast and supports XPath. I think it isn't as beginner-friendly, though it has detailed documentation. It copes less well with heavily broken HTML documents, and its encoding detection isn't as good as BS4's.
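
To make the HTML parsing and XPath support mentioned above a bit more concrete, here is a minimal sketch. The HTML snippet, the `projects` id, and the `extract_links` helper are made up for illustration and are not from the original post; a real page would need its own XPath expression.

```python
from lxml import html

# Hypothetical HTML standing in for a page you have already downloaded.
PAGE = """
<html><body>
  <ul id="projects">
    <li><a href="https://scrapy.org">Scrapy</a></li>
    <li><a href="https://lxml.de">lxml</a></li>
  </ul>
</body></html>
"""


def extract_links(page_source):
    """Return (link text, href) pairs for links inside the 'projects' list."""
    # Parse the HTML string into an element tree.
    tree = html.fromstring(page_source)
    # Select every <a> that sits in an <li> of the list with id="projects".
    anchors = tree.xpath('//ul[@id="projects"]/li/a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


if __name__ == "__main__":
    for name, url in extract_links(PAGE):
        print(name, url)
```

The same XPath expression could be adapted to pull out table cells, headings, or attributes. For messier markup, the comparison with BS4 above suggests trying that parser instead.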

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts