Beautiful Soup: We called him Tortoise because he taught us

html5-parser

Mentions: 2 | Stars: 666 | Activity: 6.3 | Language: C

Fast C-based HTML 5 parsing for Python

You want a proper HTML5 parser that can handle invalid documents. The fastest one is https://github.com/kovidgoyal/html5-parser, which is over 30x faster than html5lib.
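
As a minimal sketch of what that looks like in practice, the snippet below feeds deliberately malformed markup to html5-parser and reads back the repaired tree. It assumes html5-parser and lxml are installed; the sample markup and the helper name `list_items` are made up for illustration and do not come from the original post.

```python
# Hypothetical sample: unclosed <p> and <li> tags, as often found in the wild.
BROKEN_HTML = "<p>Unclosed paragraph<ul><li>one<li>two</ul>"


def list_items(markup):
    """Return the text of each <li> element found in `markup`."""
    # Imported lazily so this module still loads if html5-parser is missing.
    from html5_parser import parse

    # parse() returns the root of an lxml tree by default. The parser applies
    # the HTML5 error-recovery rules, so the unclosed tags get closed for us.
    root = parse(markup)
    return [li.text for li in root.iter("li")]


if __name__ == "__main__":
    print(list_items(BROKEN_HTML))
```

Because the result is an ordinary lxml tree, the usual lxml calls such as `iter()` and `xpath()` work on it.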

SeleniumBase

Mentions: 9 | Stars: 4,215 | Activity: 9.8 | Language: Python

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.

In those cases you might want to check out SeleniumBase: https://seleniumbase.io/

colly

Mentions: 39 | Stars: 22,165 | Activity: 6.0 | Language: Go

Elegant Scraper and Crawler Framework for Golang

shot-scraper

Mentions: 16 | Stars: 1,531 | Activity: 7.1 | Language: Python

A command-line utility for taking automated screenshots of websites

Playwright for Python has really good documentation: https://playwright.dev/python/
I used it for my https://shot-scraper.datasette.io/ tool, and wrote a bit about CLI-driven scraping using that tool here: https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...

playwright-python

Mentions: 31 | Stars: 10,675 | Activity: 9.0 | Language: Python

Python version of the Playwright testing and automation library.

soup

Mentions: 4 | Stars: 2,126 | Activity: 0.0 | Language: Go

Web Scraper in Go, similar to BeautifulSoup

> Does anyone know if there is a good equivalent for Go?
Yes: https://github.com/anaskhan96/soup
It works well.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Show HN: Flyscrape – A standalone and scriptable web scraper in Go
6 projects | news.ycombinator.com | 11 Nov 2023
New modern web crawling tool
2 projects | news.ycombinator.com | 30 Apr 2023
No code command line webscraper
3 projects | /r/webscraping | 9 Mar 2023
Go for web scraping
5 projects | /r/golang | 18 Nov 2022
Dan terjadi lagi (And it happened again)
3 projects | /r/indonesia | 16 Oct 2022

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: Golang, Go, Crawler, Pytest, Playwright
Post date: 8 Jun 2022
