Top 23 Python web-scraping Projects
-
This guide walks through the full process using uv, a fast, modern Python toolchain that replaces pip, virtualenv, pip-tools, twine, and build with a single tool. We will write a reusable Scrapy download handler, structure it as a proper Python package, test it, and publish it to PyPI.
-
Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
Project mention: Launch HN: Intuned (YC S22) – Build and run reliable browser automations as code | news.ycombinator.com | 2026-06-08
What is the advantage of your product over having Codex generate a script using something like https://github.com/D4Vinci/Scrapling?
-
changedetection.io
Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monitoring—all for free or enjoy our SaaS plan!
Project mention: Show HN: Get a webhook the moment a webpage changes | news.ycombinator.com | 2026-05-29
Interesting, but how is this different from ChangeDetection (https://github.com/dgtlmoon/changedetection.io)?
I've used ChangeDetection for years and it's really solid I would say. Perhaps I'm missing something?
-
Douyin_TikTok_Download_API
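🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.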
-
Skill_Seekers
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
Project mention: Turn Docs, Code and PDFs into Claude AI Skills in Minutes | news.ycombinator.com | 2025-11-07
-
SeleniumBase
📊 Python's all-in-one framework for web crawling, scraping, and testing. Supports pytest. CDP Mode provides stealth. Includes many tools.
Project mention: Scraping German Rental Price Data – Part I: Whole Lotta Captchas | news.ycombinator.com | 2025-07-29
Not yet! But it's on my list to try out next after giving SeleniumBase[1] a chance.
[1] https://github.com/seleniumbase/SeleniumBase
-
crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Project mention: Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers | news.ycombinator.com | 2025-09-30
-
pydoll
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
It's not a browser extension, but controlling the actual browser without using webdriver is already a thing.
https://github.com/autoscrape-labs/pydoll
-
trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
-
curl_cffi
Python binding for curl-impersonate fork via cffi. A http client that can impersonate browser tls/ja3/http2 fingerprints.
Project mention: Bing Search API Replacement: scrape SERP results for $1.05/1K | dev.to | 2026-05-31
1. TLS fingerprint inspection. Bing inspects the JA3/JA4 signature of your TLS handshake. Python's stdlib ssl, requests, and httpx emit fingerprints no real browser produces, so the server returns 403 before reading the query string. We route every request through curl-cffi's AsyncSession, impersonating Chrome 131, Chrome 124, or Firefox 147 TLS + HTTP/2 SETTINGS frames at the socket level, rotating profiles per page to reduce burst correlation.
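The per-page profile rotation mentioned in that post could look roughly like the sketch below. It is an illustration under assumptions, not the post's actual code: it assumes curl_cffi is installed, that the impersonation targets "chrome131" and "chrome124" are available in the installed version, and the helper names `assign_profiles` and `fetch_all` are hypothetical. A Firefox profile is left out because the exact target name depends on the curl_cffi release.

```python
import asyncio
from itertools import cycle

# Assumption: these impersonation targets exist in the installed curl_cffi version.
PROFILES = ("chrome131", "chrome124")


def assign_profiles(urls, profiles=PROFILES):
    """Pair each URL with a browser profile, cycling through the profiles in order."""
    return list(zip(urls, cycle(profiles)))


async def fetch_all(urls):
    """Fetch each URL with its assigned profile; returns (url, profile, status) tuples."""
    # Imported here so assign_profiles can be used without curl_cffi installed.
    from curl_cffi.requests import AsyncSession

    results = []
    async with AsyncSession() as session:
        for url, profile in assign_profiles(urls):
            response = await session.get(url, impersonate=profile)
            results.append((url, profile, response.status_code))
    return results


if __name__ == "__main__":
    # Hypothetical target URL, for illustration only.
    print(asyncio.run(fetch_all(["https://example.com/"])))
```

Keeping the rotation logic in a pure helper makes it easy to check which profile each page will use before any request is sent.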
-
Project mention: Snoop Project Update (search for usernames on 5k websites) | news.ycombinator.com | 2026-01-01
-
CloakBrowser
Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.
CAPTCHAs are great. Exploiters get around them with proprietary anti-detect browsers and unethical residential proxies, while privacy browsers and affordable privacy VPNs get blocked and shadowbanned to death.
Fingerprint.com, while not a CAPTCHA, gives you +3 suspicious score for using privacy settings like adblock on your browser.
https://github.com/CloakHQ/CloakBrowser is a good anti-detect browser as well as CAPTCHA bypass.
-
agentql
AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale. Includes REST API, Python and JavaScript SDKs, browser debugger.
-
Python’s requests package, which uses urllib3 on top of the standard library's ssl module, has a very distinctive TLS fingerprint, containing ciphers (amongst other things) that aren’t seen in a browser. This makes it very easy to spot. Both rnet and other options such as curl-cffi can send a TLS fingerprint similar to that of a browser, which reduces the chances of our request being blocked.
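To get a feel for one input to such a fingerprint, the sketch below lists the cipher suites that Python's standard ssl module offers by default. It uses only the standard library; the helper name `default_cipher_names` is hypothetical, and the cipher list is only one of several fields (alongside extensions, curves, and TLS versions) that a JA3-style fingerprint combines.

```python
import ssl


def default_cipher_names() -> list[str]:
    """Return the cipher suite names enabled in Python's default TLS client context."""
    context = ssl.create_default_context()
    return [cipher["name"] for cipher in context.get_ciphers()]


if __name__ == "__main__":
    # The exact list depends on the Python and OpenSSL versions in use.
    for name in default_cipher_names():
        print(name)
```

Comparing this list with what a real browser sends in its ClientHello shows why servers can tell the two apart.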
-
Scrapfly manages the scraping infrastructure for you: instead of trying to bypass blocks manually, you send a request to its API and let it deal with proxies, headers, browser rendering, and fingerprinting.
-
web-scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
-
stealth-browser-mcp
The only browser automation that bypasses anti-bot systems. AI writes network hooks, clones UIs pixel-perfect via simple chat.
Project mention: Show HN: Vibium – Browser automation for AI and humans, by Selenium's creator | news.ycombinator.com | 2025-12-24
Maybe this is something: https://github.com/vibheksoni/stealth-browser-mcp
-
invisible_playwright
Stealth Firefox that passes every bot detection test. Drop-in Playwright replacement.
Project mention: Stealth Firefox that passes every bot detection test | news.ycombinator.com | 2026-05-25
Python web-scraping related posts
-
Launch HN: Intuned (YC S22) – Build and run reliable browser automations as code
-
I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)
-
How to write and publish a Python package to PyPI
-
5 Best Free Web Scraping Tools in 2026
-
Show HN: An event loop for asyncio written in Rust
-
Web Adapter Tool Agent: Turn Self-Learning Skills into "98% Average Token Reduction on Revisits," Measured
-
Scrapling: I Tried a Python Web Scraping Framework That Is 784x Faster Than BeautifulSoup
Index
What are some of the best open-source web-scraping projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Scrapy | 62,120 |
| 2 | Scrapling | 60,673 |
| 3 | changedetection.io | 31,876 |
| 4 | Scrapegraph-ai | 26,735 |
| 5 | Douyin_TikTok_Download_API | 18,183 |
| 6 | Skill_Seekers | 13,941 |
| 7 | SeleniumBase | 12,767 |
| 8 | crawlee-python | 9,141 |
| 9 | autoscraper | 7,180 |
| 10 | pydoll | 6,888 |
| 11 | trafilatura | 6,056 |
| 12 | curl_cffi | 5,743 |
| 13 | snoop | 3,940 |
| 14 | Grab | 2,460 |
| 15 | CloakBrowser | 2,249 |
| 16 | agentql | 1,392 |
| 17 | wreq-python | 1,367 |
| 18 | scrapfly-scrapers | 996 |
| 19 | web-scraping | 876 |
| 20 | google-search-results-python | 741 |
| 21 | scrapy-fake-useragent | 689 |
| 22 | stealth-browser-mcp | 682 |
| 23 | invisible_playwright | 523 |