Top 23 Python Scraper Projects
-
Jobs_Applier_AI_Agent_AIHawk
AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it lets users apply to multiple jobs in a tailored way.
-
Douyin_TikTok_Download_API
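🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous scraping tool for Douyin, Kuaishou, TikTok, and Bilibili data. It supports API calls as well as online batch parsing and downloading.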
-
crawlee-python
Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.
Which hashtags are trending now? What is an influencer's engagement rate? What topics are important for a content creator? You can find answers to these and many other questions by analyzing TikTok data. However, for analysis, you need to extract the data in a convenient format. In this blog post, we'll explore how to scrape TikTok using Crawlee for Python.
-
Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE
Do you want to LEARN NEW STUFF for FREE? With the power of web scraping and automation, this script finds the necessary Udemy coupons and enrolls you in PAID UDEMY COURSES, ABSOLUTELY FREE!
-
twikit
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Project mention: Show HN: I made a free tool that analyzes SEC filings and posts detailed reports | news.ycombinator.com | 2025-04-14
Unpopular opinion here... If you tread carefully, you'll most likely not succeed. I am not American, and I know you guys like to sue each other for putting cats in microwaves and stuff, so maybe this is not a great opinion to have in America at the moment.
I would go for it and add a disclaimer, or I would just incorporate in a country where these things are not an issue.
Providing good value is hard, of course, but worthwhile.
Twitter's API pricing is insane right now. I had quite a few ideas for Twitter integrations, but they would easily cost thousands per month just to access the API.
I looked into https://github.com/d60/twikit. It might not be suitable, but you can definitely play around with it. Just don't use your official account: unfortunately, I got shadow-banned using it.
-
JobFunnel
Project mention: Show HN: Scraper for job listings directly from company websites | news.ycombinator.com | 2024-12-07
jobfunnel is FOSS and accepting contributions: https://github.com/PaulMcInnis/JobFunnel
Currently supports indeed, in the past supported glassdoor and others.
-
CyberScraper-2077
Project mention: CyberScraper 2077 – An OpenAI / Gemini Based Web Scraper | news.ycombinator.com | 2024-09-08
Info about CyberScraper-2077:
Rip data from the net, leaving no trace. Welcome to the future of web scraping.
About
CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
Whether you're a corpo data analyst, a street-smart netrunner, or just someone looking to pull information from the digital realm, CyberScraper 2077 has got you covered.
Features
AI-Powered Extraction: Utilizes cutting-edge AI models to understand and parse web content intelligently.
Sleek Streamlit Interface: User-friendly GUI that even a chrome-armed street samurai could navigate.
Multi-Format Support: Export your data in JSON, CSV, HTML, SQL or Excel – whatever fits your cyberdeck.
Stealth Mode: Stealth-mode parameters help keep it from being detected as a bot.
Ollama Support: Use a huge library of open-source LLMs.
Async Operations: Lightning-fast scraping that would make a Trauma Team jealous.
Smart Parsing: Structures scraped content as if it was extracted straight from the engram of a master netrunner.
Ethical Scraping: Respects robots.txt and site policies. We may be in 2077, but we still have standards.
Caching: We implemented content-based and query-based caching using an LRU cache and a custom dictionary to reduce redundant API calls.
Upload to Google Sheets: Now you can easily upload your extracted CSV data to Google Sheets with one click.
Proxy Mode (Coming Soon): Built-in proxy support to keep you ghosting through the net.
Navigate through the Pages: Navigate through the website and scrape data from different pages.
If you are unable to scrape a website because you are being blocked, try out the current browser features.
GitHub: https://github.com/itsOwen/CyberScraper-2077
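For the "Ethical Scraping" feature, Python's standard `urllib.robotparser` module is enough to check robots.txt rules before fetching a page. This sketch is not taken from CyberScraper: the robots.txt body, the `ExampleScraper/1.0` user agent, and the helper names `build_parser` and `filter_allowed` are hypothetical, and the file is parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. A real crawler would fetch it from
# https://<site>/robots.txt, e.g. with RobotFileParser.set_url() and .read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user-agent string used only for this sketch.
USER_AGENT = "ExampleScraper/1.0"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_allowed(parser: RobotFileParser, urls: list[str]) -> list[str]:
    """Keep only the URLs that robots.txt lets USER_AGENT fetch."""
    return [url for url in urls if parser.can_fetch(USER_AGENT, url)]


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/products/1",
    "https://example.com/private/admin",
]
print(filter_allowed(parser, urls))    # ['https://example.com/products/1']
print(parser.crawl_delay(USER_AGENT))  # 5
```

Parsing the rules once and filtering a batch of URLs keeps the check cheap; the `Crawl-delay` value could additionally be used to space out requests to the same site.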
-
twscrape
2025! X / Twitter API scraper with authorization support. Allows you to scrape search results, users' profiles (followers/following), tweets (favoriters/retweeters), and more.
twscrape (python, lib) – a library for parsing data from X/Twitter. The project is mostly in maintenance mode: I check that it still works every few months, and there have been no requests for new features.
-
animdl
A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.
-
GramAddict bot
Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated. (by GramAddict)
-
cinemagoer
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies.
-
Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
-
TikTokLive
The definitive Python library to receive livestream events (comments, gifts, etc.) in realtime from TikTok LIVE.
-
Python Scraper related posts
-
CyberScraper 2077 – An OpenAI / Gemini Based Web Scraper
-
CyberScraper-2077 – An LLM Based Web Scraper
-
Download Any Substack Article as Markdown (Open Source)
-
Can someone walk me through this?
-
What’s the coolest things you’ve done with python?
-
BDFR skipping Reddit hosted videos
-
Updated Drexel Scheduler to Winter Quarter
-
A note from our sponsor - SaaSHub
www.saashub.com | 14 May 2025
Index
What are some of the best open-source Scraper projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Jobs_Applier_AI_Agent_AIHawk | 28,094 |
2 | Douyin_TikTok_Download_API | 12,303 |
3 | chinese-xinhua | 11,112 |
4 | autoscraper | 6,758 |
5 | crawlee-python | 5,638 |
6 | snscrape | 4,838 |
7 | myGPTReader | 4,441 |
8 | Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE | 3,205 |
9 | twikit | 2,586 |
10 | linkedin_scraper | 2,477 |
11 | bulk-downloader-for-reddit | 2,396 |
12 | JobFunnel | 2,010 |
13 | CyberScraper-2077 | 1,684 |
14 | twscrape | 1,600 |
15 | animdl | 1,374 |
16 | mlscraper | 1,342 |
17 | GramAddict bot | 1,330 |
18 | cinemagoer | 1,267 |
19 | Scweet | 1,153 |
20 | RedditDownloader | 1,126 |
21 | finviz | 1,116 |
22 | TikTokLive | 1,109 |
23 | URS | 875 |