Where to start: Learning Web-scraping

Scrapy

Mentions: 180 | Stars: 50,824 | Activity: 9.6 | Language: Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

The most well-known one (and actually the only one I know) is Scrapy. I won't go into too much detail, but among other things it offers:

lxml

Mentions: 17 | Stars: 2,567 | Activity: 9.5 | Language: Python

The lxml XML toolkit for Python

lxml is an XML parser; however, it also supports HTML parsing. It's blazing fast and supports XPath. I think it isn't as beginner-friendly, though it has detailed documentation. It works less well with heavily broken HTML documents, and its encoding detection isn't as good as BS4's.
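
As a rough illustration of the HTML parsing and XPath support mentioned above, the sketch below parses a small in-memory page with `lxml.html` and pulls out link text and targets. The sample markup, the `book` class name, and the function name `extract_links` are made up for this example.

```python
from lxml import html

# Hypothetical sample page; in practice this would be a downloaded document.
SAMPLE_HTML = """
<html><body>
  <h1>Books</h1>
  <ul>
    <li class="book"><a href="/b/1">Dune</a></li>
    <li class="book"><a href="/b/2">Neuromancer</a></li>
  </ul>
</body></html>
"""


def extract_links(markup):
    """Return (title, href) pairs for every link inside an <li class="book">."""
    tree = html.fromstring(markup)
    results = []
    # XPath: any <a> that is a direct child of an <li> with class "book".
    for anchor in tree.xpath('//li[@class="book"]/a'):
        results.append((anchor.text_content().strip(), anchor.get("href")))
    return results


links = extract_links(SAMPLE_HTML)
print(links)
```

The same `tree.xpath(...)` call accepts any XPath 1.0 expression, so attribute values can be selected directly as well, for example `//li[@class="book"]/a/@href`.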

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.