Web Scraping With Python (An Ultimate Guide)

This page summarizes the projects mentioned and recommended in the original post on /r/Python.

  • parsel

    Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

    One point that rarely comes up in these discussions: Scrapy's HTML parsing library, parsel, can be installed separately from Scrapy itself. You can use it in place of BeautifulSoup and, imo, it's much easier to use.

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • parsel-cli

    CLI for evaluating CSS and XPath selectors

    I like it so much that I even wrote a REPL for it, parsel-cli :) (it's a bit of a Frankenstein at the moment, though, as I'm working on a 2.0 release)
