Python web-scraping

Open-source Python projects categorized as web-scraping

Top 23 Python web-scraping Projects

  1. Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: How to write and publish a Python package to PyPI | dev.to | 2026-05-11

    This guide walks through the full process using uv, a fast, modern Python toolchain that replaces pip, virtualenv, pip-tools, twine, and build with a single tool. We will write a reusable Scrapy download handler, structure it as a proper Python package, test it, and publish it to PyPI.

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. Scrapling

    🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!

    Project mention: Launch HN: Intuned (YC S22) – Build and run reliable browser automations as code | news.ycombinator.com | 2026-06-08

    What is the advantage of your product over having Codex generate a script using something like https://github.com/D4Vinci/Scrapling?

  4. changedetection.io

    Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monitoring—all for free or enjoy our SaaS plan!

    Project mention: Show HN: Get a webhook the moment a webpage changes | news.ycombinator.com | 2026-05-29

    Interesting, but how is this different from ChangeDetection (https://github.com/dgtlmoon/changedetection.io)?

    I've used ChangeDetection for years and it's really solid I would say. Perhaps I'm missing something?

  5. Scrapegraph-ai

    Python scraper based on AI

    Project mention: ScrapeGraphAI Release Week | news.ycombinator.com | 2025-07-07

  6. Douyin_TikTok_Download_API

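    🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.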

  7. Skill_Seekers

    Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

    Project mention: Turn Docs, Code and PDFs into Claude AI Skills in Minutes | news.ycombinator.com | 2025-11-07

  8. SeleniumBase

    📊 Python's all-in-one framework for web crawling, scraping, and testing. Supports pytest. CDP Mode provides stealth. Includes many tools.

    Project mention: Scraping German Rental Price Data – Part I: Whole Lotta Captchas | news.ycombinator.com | 2025-07-29

    Not yet! But it's on my list to try out next after giving SeleniumBase[1] a chance.

    [1] https://github.com/seleniumbase/SeleniumBase

  9. crawlee-python

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Project mention: Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers | news.ycombinator.com | 2025-09-30

  10. autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  11. pydoll

    Pydoll is a library for automating Chromium-based browsers without a WebDriver, offering realistic interactions.

    Project mention: Turn Any Website into an API | news.ycombinator.com | 2025-08-08

    It's not a browser extension, but controlling the actual browser without using webdriver is already a thing.

    https://github.com/autoscrape-labs/pydoll

  12. trafilatura

    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

  13. curl_cffi

    Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

    Project mention: Bing Search API Replacement: scrape SERP results for $1.05/1K | dev.to | 2026-05-31

    1. TLS fingerprint inspection. Bing inspects the JA3/JA4 signature of your TLS handshake. Python's stdlib ssl, requests, and httpx emit fingerprints no real browser produces — the server returns 403 before reading the query string. We route every request through curl-cffi's AsyncSession, impersonating Chrome 131, Chrome 124, or Firefox 147 TLS + HTTP/2 SETTINGS frames at the socket level, rotating profiles per page to reduce burst correlation.

  14. snoop

    Snoop: a reconnaissance tool based on open data (OSINT world)

    Project mention: Snoop Project Update (search for usernames on 5k websites) | news.ycombinator.com | 2026-01-01

  15. Grab

    Web Scraping Framework

  16. CloakBrowser

    Stealth Chromium that passes every bot detection test. Drop-in Playwright replacement with source-level fingerprint patches. 30/30 tests passed.

    Project mention: CAPTCHAs can still detect AI agents | news.ycombinator.com | 2026-05-29

    CAPTCHAs are great. Exploiters get around them with proprietary anti-detect browsers and unethical residential proxies, while privacy browsers and affordable privacy VPNs get blocked and shadowbanned to death.

    Fingerprint.com, while not a CAPTCHA, gives you +3 suspicious score for using privacy settings like adblock on your browser.

    https://github.com/CloakHQ/CloakBrowser is a good anti-detect browser as well as CAPTCHA bypass.

  17. agentql

    AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale. Includes REST API, Python and JavaScript SDKs, browser debugger.

  18. wreq-python

    An ergonomic Python HTTP Client with TLS fingerprint

    Project mention: Hybrid scraping: The architecture for the modern web | dev.to | 2026-02-25

    Python’s requests package, which uses urllib from the standard library, has a very distinctive TLS fingerprint, containing ciphers (amongst other things) that aren’t seen in a browser. This makes it very easy to spot. Both rnet, and other options such as curl-cffi, are able to send a TLS fingerprint similar to that of a browser. This reduces the chances of our request being blocked.

  19. scrapfly-scrapers

    Scalable Python web scraping scripts for 40+ popular domains

    Project mention: 5 Best Free Web Scraping Tools in 2026 | dev.to | 2026-04-30

    Instead of trying to bypass blocks manually, you send a request to their API and let it deal with proxies, headers, browser rendering, and fingerprinting. Scrapfly solves this by managing the infrastructure for you.

  20. web-scraping

    Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

  21. google-search-results-python

    Google Search Results via SERP API pip Python Package

  22. scrapy-fake-useragent

    Random User-Agent middleware based on fake-useragent

  23. stealth-browser-mcp

    The only browser automation that bypasses anti-bot systems. AI writes network hooks, clones UIs pixel-perfect via simple chat.

    Project mention: Show HN: Vibium – Browser automation for AI and humans, by Selenium's creator | news.ycombinator.com | 2025-12-24

    Maybe this is something: https://github.com/vibheksoni/stealth-browser-mcp

  24. invisible_playwright

    Stealth Firefox that passes every bot detection test. Drop-in Playwright replacement.

    Project mention: Stealth Firefox that passes every bot detection test | news.ycombinator.com | 2026-05-25

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
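
The post quoted under curl_cffi describes rotating browser impersonation profiles from one page to the next. A minimal sketch of that rotation might look like the following. The helper names `assign_profiles` and `fetch_all` are hypothetical, and the profile strings are assumed from the Chrome versions named in the quote; which strings a given curl_cffi release accepts should be checked locally (the Firefox profile from the quote is left out for that reason).

```python
import asyncio
from itertools import cycle

# Profile names assumed from the quoted post; verify them against the
# installed curl_cffi release before relying on them.
PROFILES = ["chrome131", "chrome124"]


def assign_profiles(urls, profiles=PROFILES):
    """Pair each URL with the next profile in round-robin order."""
    rotation = cycle(profiles)
    return [(url, next(rotation)) for url in urls]


async def fetch_all(urls, profiles=PROFILES):
    """Fetch every URL using its assigned impersonation profile."""
    # Imported lazily so the rotation helper works without curl_cffi installed.
    from curl_cffi.requests import AsyncSession

    async with AsyncSession() as session:
        tasks = [
            session.get(url, impersonate=profile)
            for url, profile in assign_profiles(urls, profiles)
        ]
        return await asyncio.gather(*tasks)
```

Round-robin assignment keeps the sketch deterministic and easy to test; picking a random profile per page would be an equally reasonable choice.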

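The scrapy-fake-useragent entry describes a random User-Agent middleware. As a rough illustration of the idea only, and not that project's actual code, a Scrapy downloader middleware can set the header inside its `process_request` hook. The class below and its `USER_AGENTS` pool are hypothetical placeholders; the real project draws its strings from fake-useragent.

```python
import random

# Hypothetical pool for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) ExampleBrowser/2.0",
]


class RandomUserAgentMiddleware:
    """Downloader middleware sketch: pick a User-Agent for each request."""

    def __init__(self, user_agents=USER_AGENTS, rng=None):
        self.user_agents = list(user_agents)
        self.rng = rng or random.Random()

    def process_request(self, request, spider=None):
        # Overwrite the header before the request reaches the downloader.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # None tells Scrapy to keep processing the request
```

A class like this would be enabled through Scrapy's `DOWNLOADER_MIDDLEWARES` setting, with the module path depending on where the class lives in the project.
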
Python web-scraping related posts

  • Launch HN: Intuned (YC S22) – Build and run reliable browser automations as code

    1 project | news.ycombinator.com | 8 Jun 2026
  • I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)

    1 project | dev.to | 29 May 2026
  • How to write and publish a Python package to PyPI

    3 projects | dev.to | 11 May 2026
  • 5 Best Free Web Scraping Tools in 2026

    1 project | dev.to | 30 Apr 2026
  • Show HN: An event loop for asyncio written in Rust

    3 projects | news.ycombinator.com | 21 Mar 2026
  • Web Adapter Tool Agent: Turn Self-Learning Skills into "98% Average Token Reduction on Revisits," Measured

    7 projects | dev.to | 8 Mar 2026
  • Scrapling: I Tried the Python Web Scraping Framework That Is 784x Faster Than BeautifulSoup

    1 project | dev.to | 7 Mar 2026

Index

What are some of the best open-source web-scraping projects in Python? This list will help you:

#   Project                        Stars
1   Scrapy                         62,120
2   Scrapling                      60,673
3   changedetection.io             31,876
4   Scrapegraph-ai                 26,735
5   Douyin_TikTok_Download_API     18,183
6   Skill_Seekers                  13,941
7   SeleniumBase                   12,767
8   crawlee-python                 9,141
9   autoscraper                    7,180
10  pydoll                         6,888
11  trafilatura                    6,056
12  curl_cffi                      5,743
13  snoop                          3,940
14  Grab                           2,460
15  CloakBrowser                   2,249
16  agentql                        1,392
17  wreq-python                    1,367
18  scrapfly-scrapers              996
19  web-scraping                   876
20  google-search-results-python   741
21  scrapy-fake-useragent          689
22  stealth-browser-mcp            682
23  invisible_playwright           523

Did you know that Python is the 1st most popular programming language based on number of references?