Where to start: Learning Web-scraping

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/learnpython

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    The most well-known one (and actually the only one I know) is scrapy. I won't go into too much detail, but among others it offers:

  • lxml

    The lxml XML toolkit for Python

    lxml is an XML parser; however, it also supports HTML parsing. It's blazing fast and supports XPath. I don't think it is as beginner-friendly to use, though it has detailed documentation. It works less well with heavily broken HTML documents, and its encoding detection isn't as good as that of BS4 (Beautiful Soup 4).
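
To make the HTML parsing and XPath support mentioned above concrete, here is a minimal sketch, assuming lxml is installed. The sample markup, the `projects` id, and the `extract_links` helper are made up for illustration and are not from the original post.

```python
from lxml import html

# Hypothetical sample page; the second <li> is deliberately left unclosed
# to show that mildly malformed HTML is still parsed.
SAMPLE = """
<html><body>
  <ul id="projects">
    <li><a href="https://scrapy.org">Scrapy</a></li>
    <li><a href="https://lxml.de">lxml</a>
  </ul>
</body></html>
"""


def extract_links(markup):
    """Return (link text, href) pairs for every link in the projects list."""
    tree = html.fromstring(markup)
    # XPath: every <a> directly inside an <li> of the <ul> with id="projects".
    anchors = tree.xpath('//ul[@id="projects"]/li/a')
    return [(a.text_content().strip(), a.get("href")) for a in anchors]


links = extract_links(SAMPLE)
for text, href in links:
    print(f"{text}: {href}")
```

For a real page, the markup would typically come from an HTTP response body rather than a string literal, and the XPath expression would need to match that page's structure.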


