Beautiful Soup: We called him Tortoise because he taught us

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • html5-parser

    Fast C-based HTML5 parsing for Python

  • You want a proper HTML5 parser that can handle invalid documents. The fastest one is https://github.com/kovidgoyal/html5-parser, which is over 30x faster than html5lib.

  • SeleniumBase

    📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.

  • In those cases you might want to check out SeleniumBase: https://seleniumbase.io/

  • colly

    Elegant Scraper and Crawler Framework for Golang

  • shot-scraper

    A command-line utility for taking automated screenshots of websites

  • Playwright for Python has really good documentation: https://playwright.dev/python/

    I used it for my https://shot-scraper.datasette.io/ tool, and wrote a bit about CLI-driven scraping using that tool here: https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...

  • playwright-python

    Python version of the Playwright testing and automation library.

  • soup

    Web Scraper in Go, similar to BeautifulSoup

  • > Does anyone know if there is a good equivalent for Go

    Yes: https://github.com/anaskhan96/soup

    It works well.
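
The html5-parser recommendation above rests on one point: real pages are often invalid, so a scraper needs a parser that tolerates them. The sketch below illustrates that point using only the Python standard library. It does not use html5-parser itself, and `html.parser` is a tolerant tokenizer rather than a full HTML5 tree builder. The names `BROKEN_HTML`, `TextCollector`, `strict_parse_ok`, and `lenient_text` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample input: the <p> and <b> tags are never closed.
BROKEN_HTML = "<p>first<p>second <b>bold"


class TextCollector(HTMLParser):
    """Collect text chunks without caring whether tags are closed."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def lenient_text(markup):
    """Return the text chunks a tolerant HTML tokenizer recovers."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


if __name__ == "__main__":
    print("strict XML parser accepts it:", strict_parse_ok(BROKEN_HTML))
    print("text recovered leniently:", "".join(lenient_text(BROKEN_HTML)))
```

The strict parser rejects the markup outright, while the tolerant one still yields the text. A real HTML5 parser goes further and also rebuilds the element tree the way a browser would, which is what the comment is recommending.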
