Top 23 Python web-scraping Projects
-
Scrapy is a robust and scalable open-source web crawling framework. It is highly efficient for large-scale projects and supports asynchronous scraping.
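To illustrate what asynchronous scraping means in practice, here is a minimal sketch using only the standard library. It is not Scrapy code (Scrapy is built on Twisted and has its own scheduler); `fetch`, `crawl`, and `max_concurrency` are hypothetical names, and `fetch` is a stand-in that simulates a request instead of touching the network.

```python
import asyncio


async def fetch(url: str) -> str:
    """Stand-in for a network request (assumption: a real crawler would call an HTTP client here)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def crawl(urls: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Fetch all URLs concurrently, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        async with semaphore:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(url) for url in urls))
    return dict(pairs)
```

Calling `asyncio.run(crawl([...]))` returns a mapping from each URL to its (simulated) body. The semaphore is the part that matters for large crawls: it caps how many requests run at once so the target site is not flooded.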
-
changedetection.io
A free, open-source service for web page change detection, website watching, restock monitoring, and notifications. Designed for simplicity: monitor which websites had a text change. Also supports website defacement monitoring and price change notifications.
GitHub: https://github.com/dgtlmoon/changedetection.io | License: Apache-2.0 | Official website: https://changedetection.io/
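The core idea of text change detection can be sketched with the standard library alone. This is an illustrative sketch, not how changedetection.io is implemented; `fingerprint` and `describe_change` are hypothetical helper names, and the snapshots are assumed to be already-extracted page text.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the text with whitespace runs collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def describe_change(old_text: str, new_text: str) -> list[str]:
    """Return added ("+") and removed ("-") lines, or [] if only whitespace changed."""
    if fingerprint(old_text) == fingerprint(new_text):
        return []
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
        )
    )
    # The first two diff lines are the "---"/"+++" file headers; skip them,
    # then keep only real additions and removals (not "@@" hunk markers).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

Comparing a cheap fingerprint first avoids computing a diff on every poll; the diff is only built when the fingerprint shows that something other than whitespace changed.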
-
Douyin_TikTok_Download_API
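🚀 Douyin_TikTok_Download_API is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.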
-
SeleniumBase - Python APIs for web automation, testing, and bypassing bot-detection
-
crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
In this blog, we'll explore three different ways to extract data from Crunchbase using Crawlee for Python. We'll fully implement two of them and discuss the specifics and challenges of the third. This will help us better understand how important it is to choose the right data source.
-
trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
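As a rough illustration of what text and metadata extraction involves, the sketch below pulls the title and visible text out of an HTML string with the standard library. It does not use trafilatura and is far cruder than its extraction heuristics; `TextExtractor`, `extract_text`, and the set of skipped tags are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the page title and visible text, ignoring scripts, styles, and navigation."""

    SKIPPED = {"script", "style", "nav", "footer"}  # assumption: treated as boilerplate

    def __init__(self) -> None:
        super().__init__()
        self.title = None
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title = text
        else:
            self.chunks.append(text)


def extract_text(html: str) -> dict:
    """Return {"title": ..., "text": ...} for an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "text": "\n".join(parser.chunks)}
```

The returned dictionary can be passed to `json.dumps` or written out with the `csv` module, which mirrors the idea of offering several output formats.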
-
curl_cffi
Python binding for a curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
https://github.com/lexiforest/curl_cffi/releases/expanded_as...
-
Scrapling – A simple web scraping tool for Python
-
web-scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
-
agentql
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries self-heal when the UI changes and work across similar sites. Users can define a structured data output, which makes AgentQL useful for both developers and data scientists.
We upgraded Stealth Mode to minimize bot detection when scraping or automating actions on third-party websites. Check out the launch post, this handy guide to Avoiding Bot Detection with Stealth Mode, or implement it today with our Stealth Mode Example Script (now in JavaScript, too!)
-
wayback-machine-scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
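To show where such time series data can come from, the sketch below builds a query for the Wayback Machine's CDX API and turns its JSON response into snapshot records. It is independent of wayback-machine-scraper and does not reflect that tool's internals; the function names are hypothetical, and the response is assumed to use the JSON layout in which the first row holds the field names.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, start: str, end: str, limit: int = 50) -> str:
    """Build a CDX query URL for captures of `url` between two timestamps (e.g. "2020")."""
    params = {"url": url, "from": start, "to": end, "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(payload: str) -> list[dict]:
    """Convert a CDX JSON payload (header row + data rows) into a list of dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(record: dict) -> str:
    """Return the archived-page URL for one parsed CDX record."""
    return f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"
```

Fetching the query URL (for example with `urllib.request.urlopen`) and passing the body to `parse_cdx_rows` yields one record per capture; sorting the records by `timestamp` gives the time series.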
-
dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
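The decorator-based style can be illustrated with a tiny registry. This is a sketch of the general pattern only, not dude's actual API: the `select` decorator, `run` function, and the use of regular expressions in place of real selectors are all assumptions made to keep the example dependency-free.

```python
import re

# Registry of (compiled pattern, handler) pairs, filled in by the decorator below.
_HANDLERS: list = []


def select(pattern: str):
    """Hypothetical decorator: register a handler called for every regex match."""
    compiled = re.compile(pattern)

    def register(func):
        _HANDLERS.append((compiled, func))
        return func

    return register


@select(r"<h2>(.*?)</h2>")
def heading(text: str) -> dict:
    return {"heading": text.strip()}


@select(r'href="(.*?)"')
def link(url: str) -> dict:
    return {"url": url}


def run(html: str) -> list[dict]:
    """Apply every registered handler to the first capture group of each of its matches."""
    results = []
    for pattern, handler in _HANDLERS:
        for match in pattern.finditer(html):
            results.append(handler(match.group(1)))
    return results
```

Each scraping rule stays a small, independent function, and adding a new field means adding one decorated function rather than editing a central parsing loop.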
-
letterboxd_recommendations
Scrapes publicly accessible Letterboxd data and uses it to build a movie recommendation model that generates recommendations for a given Letterboxd username
-
facebook_page_scraper
Scrapes the front end of Facebook pages with no limitations and can export the data as structured JSON or CSV
-
scrapper
A web scraper with a simple REST API that runs in Docker and uses a headless browser and Readability.js for parsing.
-
Python web-scraping discussion
Python web-scraping related posts
-
How to scrape Crunchbase using Python in 2024 (Easy Guide)
-
This Week In Python
-
This Week In Python
-
Show HN: Crawlee for Python – a web scraping and browser automation library
-
Announcing Crawlee Python: Now you can use Python to build reliable web crawlers
-
Tool to analyse leetcode compensations (India: Jan-Jul'24)
-
Claude is now available in Europe
-
A note from our sponsor - SaaSHub
www.saashub.com | 9 Feb 2025
Index
What are some of the best open-source web-scraping projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Scrapy | 54,040 |
2 | changedetection.io | 21,461 |
3 | Douyin_TikTok_Download_API | 10,728 |
4 | SeleniumBase | 9,255 |
5 | autoscraper | 6,611 |
6 | crawlee-python | 5,221 |
7 | trafilatura | 3,901 |
8 | snoop | 3,164 |
9 | curl_cffi | 2,962 |
10 | Grab | 2,401 |
11 | Scrapling | 1,877 |
12 | web-scraping | 762 |
13 | scrapy-fake-useragent | 689 |
14 | google-search-results-python | 627 |
15 | agentql | 465 |
16 | wayback-machine-scraper | 430 |
17 | dude | 426 |
18 | twitter-scraper-selenium | 334 |
19 | letterboxd_recommendations | 281 |
20 | facebook_page_scraper | 248 |
21 | scrapper | 201 |
22 | estela | 175 |
23 | saveddit | 174 |