Python web-scraping

Open-source Python projects categorized as web-scraping

Top 23 Python web-scraping Projects

  1. Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Top 10 Tools for Efficient Web Scraping in 2025 | dev.to | 2025-01-16

    Scrapy is a robust and scalable open-source web crawling framework. It is highly efficient for large-scale projects and supports asynchronous scraping.

  3. changedetection.io

    The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.

    Project mention: Top 13 Self-Hosted Projects with the Most GitHub Stars | dev.to | 2024-09-10

    GitHub: https://github.com/dgtlmoon/changedetection.io | Stars: 16.8k | Forks: 932 | Issues: 199 | Pull requests: 30 | Contributors: 75 | License: Apache-2.0 | Official website: https://changedetection.io/ | Documentation: https://stedolan.github.io/jq/manual/

  4. Douyin_TikTok_Download_API

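    🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.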

  5. SeleniumBase

    Python APIs for web automation, testing, and bypassing bot-detection.

    Project mention: This Week In Python | dev.to | 2024-12-20

    SeleniumBase - Python APIs for web automation, testing, and bypassing bot-detection

  6. autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  7. crawlee-python

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Project mention: How to scrape Crunchbase using Python in 2024 (Easy Guide) | dev.to | 2025-01-15

    By the end of this blog, we'll explore three different ways to extract data from Crunchbase using Crawlee for Python. We'll fully implement two of them and discuss the specifics and challenges of the third. This will help us better understand how important it is to properly choose the right data source.

  8. trafilatura

    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

    Project mention: Claude is now available in Europe | news.ycombinator.com | 2024-05-14
  10. snoop

    Snoop: a reconnaissance tool based on open-source data (OSINT world)

  11. curl_cffi

    Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.

    Project mention: Curl-Impersonate | news.ycombinator.com | 2024-12-30

    https://github.com/lexiforest/curl_cffi/releases/expanded_as...

  12. Grab

    Web Scraping Framework

  13. Scrapling

    🕷️ Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

    Project mention: This Week In Python | dev.to | 2024-11-01

    Scrapling – A simple web scraping tool for Python

  14. web-scraping

    Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

  15. scrapy-fake-useragent

    Random User-Agent middleware based on fake-useragent

  16. google-search-results-python

    Google Search Results via SERP API pip Python Package

  17. agentql

    AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing as UI changes and work across similar sites. Users can define structured data output, making AgentQL versatile for developers and data scientists.

    Project mention: AgentQL Launch Week Recap—make the web AI-ready | dev.to | 2024-11-18

    We upgraded Stealth Mode to minimize bot detection when scraping or automating actions on third-party websites. Check out the launch post, this handy guide to Avoiding Bot Detection with Stealth Mode, or implement it today with our Stealth Mode Example Script (now in JavaScript, too!)

  18. wayback-machine-scraper

    A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

  19. dude

    dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

  20. twitter-scraper-selenium

    Python package to scrape Twitter's front end easily

  21. letterboxd_recommendations

    Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username

  22. facebook_page_scraper

    Scrapes the front end of Facebook pages with no limitations and provides a feature to turn data into structured JSON or CSV

  23. scrapper

    Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

  24. estela

    estela, an elastic web scraping cluster 🕸

  25. saveddit

    Bulk Downloader for Reddit

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
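
Several projects in the list above, such as changedetection.io, are described as monitoring which websites had a text change. The sketch below illustrates that general idea using only the Python standard library. It is not taken from any listed project: the names `text_fingerprint` and `has_text_changed` are hypothetical, and the approach (hash the visible text, ignore markup and scripts) is an assumption about one simple way to do it.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_fingerprint(html: str) -> str:
    """Hypothetical helper: SHA-256 of the page's whitespace-normalized text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_text_changed(old_html: str, new_html: str) -> bool:
    """Hypothetical helper: True when the visible text differs."""
    return text_fingerprint(old_html) != text_fingerprint(new_html)
```

In practice the two HTML snapshots would come from successive fetches of the same URL. Comparing fingerprints of the extracted text rather than the raw HTML means that changes limited to attributes, whitespace, or inline scripts are not reported as text changes.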

Python web-scraping related posts

  • How to scrape Crunchbase using Python in 2024 (Easy Guide)

    5 projects | dev.to | 15 Jan 2025
  • This Week In Python

    5 projects | dev.to | 10 Jan 2025
  • This Week In Python

    5 projects | dev.to | 1 Nov 2024
  • Show HN: Crawlee for Python – a web scraping and browser automation library

    5 projects | news.ycombinator.com | 9 Jul 2024
  • Announcing Crawlee Python: Now you can use Python to build reliable web crawlers

    4 projects | dev.to | 9 Jul 2024
  • Tool to analyse leetcode compensations (India: Jan-Jul'24)

    1 project | news.ycombinator.com | 12 Jun 2024
  • Claude is now available in Europe

    2 projects | news.ycombinator.com | 14 May 2024

Index

What are some of the best open-source web-scraping projects in Python? This list will help you:

#   Project                        Stars
1   Scrapy                         54,040
2   changedetection.io             21,461
3   Douyin_TikTok_Download_API     10,728
4   SeleniumBase                    9,255
5   autoscraper                     6,611
6   crawlee-python                  5,221
7   trafilatura                     3,901
8   snoop                           3,164
9   curl_cffi                       2,962
10  Grab                            2,401
11  Scrapling                       1,877
12  web-scraping                      762
13  scrapy-fake-useragent             689
14  google-search-results-python      627
15  agentql                           465
16  wayback-machine-scraper           430
17  dude                              426
18  twitter-scraper-selenium          334
19  letterboxd_recommendations        281
20  facebook_page_scraper             248
21  scrapper                          201
22  estela                            175
23  saveddit                          174
