Python Scraper

Open-source Python projects categorized as Scraper

Top 23 Python Scraper Projects

  1. Jobs_Applier_AI_Agent_AIHawk

    AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it lets users apply to multiple jobs in a tailored way.

    Project mention: Jobs Applier AI Agent | news.ycombinator.com | 2024-12-08
  3. Douyin_TikTok_Download_API

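    🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance, asynchronous data-scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.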
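    Asynchronous batch parsing means many share links are processed concurrently instead of one after another. A minimal asyncio sketch of that idea, not the project's code: `parse_share_link` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and a semaphore caps how many requests are in flight at once.

```python
import asyncio


async def parse_share_link(url: str) -> dict:
    """Hypothetical parser: a real tool would fetch and parse the page here."""
    await asyncio.sleep(0.01)  # simulate network latency
    video_id = url.rstrip("/").rsplit("/", 1)[-1]
    return {"url": url, "video_id": video_id}


async def parse_batch(urls: list[str], max_concurrency: int = 5) -> list[dict]:
    """Parse all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> dict:
        async with semaphore:  # wait for a free slot before "fetching"
            return await parse_share_link(url)

    # gather() returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    links = [f"https://example.com/video/{i}" for i in range(10)]
    print(asyncio.run(parse_batch(links))[0])
```

    The semaphore keeps throughput high without flooding the target site, which matters more for scrapers than raw speed.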

  4. chinese-xinhua

    📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters.

  5. autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  6. crawlee-python

    Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Project mention: How to scrape TikTok using Python | dev.to | 2025-04-30

    Which hashtags are trending now? What is an influencer's engagement rate? What topics are important for a content creator? You can find answers to these and many other questions by analyzing TikTok data. However, for analysis, you need to extract the data in a convenient format. In this blog, we'll explore how to scrape TikTok using Crawlee for Python.
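    The Crawlee description mentions reliable crawling and proxy rotation. Crawlee's actual API is not shown here; as a stdlib-only sketch of those two ideas, the crawler below keeps a breadth-first queue that never enqueues a URL twice and picks proxies round-robin. `SITE`, `crawl`, and the proxy names are hypothetical, and no network requests are made.

```python
from collections import deque
from itertools import cycle

# Hypothetical in-memory "website": page URL -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c"],
    "/c": [],
}


def crawl(start: str, proxies: list[str], max_pages: int = 100) -> list[tuple[str, str]]:
    """Breadth-first crawl that visits each URL once and rotates proxies per request.

    Returns (url, proxy) pairs in the order the pages were "fetched".
    """
    queue = deque([start])
    seen = {start}               # dedupe: never enqueue the same URL twice
    proxy_pool = cycle(proxies)  # round-robin proxy rotation
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        proxy = next(proxy_pool)
        # A real crawler would send the HTTP request through `proxy` here.
        visited.append((url, proxy))
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    print(crawl("/", ["proxy-1", "proxy-2"]))
```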

  7. snscrape

    A social networking service scraper in Python

  8. myGPTReader

    A community-driven way to read and chat with AI bots, powered by ChatGPT.

  10. Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

    Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

  11. twikit

    Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

    Project mention: Show HN: I made a free tool that analyzes SEC filings and posts detailed reports | news.ycombinator.com | 2025-04-14

    Unpopular opinion here... If you tread carefully, you'll most likely not succeed. I am not American, and I know you guys like to sue each other for putting cats in microwaves and stuff, so maybe this is not a great opinion to have in America at the moment.

    I would go for it and put up a disclaimer, or I would just incorporate in a country where there are no issues with these things.

    Of course, providing good value is hard, but it's worthwhile.

    Twitter's API pricing is insane right now. I had quite a few ideas for Twitter integrations, but they would easily cost thousands per month just to access the API.

    I looked into https://github.com/d60/twikit. It might not be suitable, but you can definitely play around with it. Just don't use your official account: unfortunately, I got shadow-banned using it.

  12. linkedin_scraper

    A library that scrapes LinkedIn for user data

  13. bulk-downloader-for-reddit

    Downloads and archives content from Reddit

  14. JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Project mention: Show HN: Scraper for job listings directly from company websites | news.ycombinator.com | 2024-12-07

    jobfunnel is FOSS and accepting contributions: https://github.com/PaulMcInnis/JobFunnel

    It currently supports Indeed; in the past it also supported Glassdoor and others.
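    The "no duplicates" part of JobFunnel's description can be sketched as normalizing a few fields into a key, keeping the first listing per key, and writing the rest to CSV. JobFunnel's own duplicate detection may work differently; the field names and the choice of key (company, title, location) are assumptions for illustration.

```python
import csv
import io

FIELDS = ["company", "title", "location", "url"]  # hypothetical listing schema


def dedupe_jobs(jobs: list[dict]) -> list[dict]:
    """Drop listings that share the same normalized (company, title, location)."""
    seen = set()
    unique = []
    for job in jobs:
        key = tuple(job[f].strip().lower() for f in ("company", "title", "location"))
        if key not in seen:
            seen.add(key)
            unique.append(job)  # keep the first listing seen for each key
    return unique


def to_csv(jobs: list[dict]) -> str:
    """Serialize listings to CSV text (write to a file for a real spreadsheet)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()


if __name__ == "__main__":
    listings = [
        {"company": "Acme", "title": "Data Engineer", "location": "Remote", "url": "https://example.com/1"},
        {"company": "acme ", "title": "data engineer", "location": "remote", "url": "https://example.com/2"},
    ]
    print(to_csv(dedupe_jobs(listings)))
```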

  15. CyberScraper-2077

    A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

    Project mention: CyberScraper 2077 – An OpenAI / Gemini Based Web Scraper | news.ycombinator.com | 2024-09-08

    Info about CyberScraper-2077:

    Rip data from the net, leaving no trace. Welcome to the future of web scraping.

    About

    CyberScraper 2077 is not just another web scraping tool; it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

    Whether you're a corpo data analyst, a street-smart netrunner, or just someone looking to pull information from the digital realm, CyberScraper 2077 has got you covered.

    Features

    AI-Powered Extraction: Utilizes cutting-edge AI models to understand and parse web content intelligently.

    Sleek Streamlit Interface: User-friendly GUI that even a chrome-armed street samurai could navigate.

    Multi-Format Support: Export your data in JSON, CSV, HTML, SQL, or Excel, whatever fits your cyberdeck.

    Stealth Mode: Stealth mode parameters help keep it from being detected as a bot.

    Ollama Support: Use a huge library of open-source LLMs.

    Async Operations: Lightning-fast scraping that would make a Trauma Team jealous.

    Smart Parsing: Structures scraped content as if it were extracted straight from the engram of a master netrunner.

    Ethical Scraping: Respects robots.txt and site policies. We may be in 2077, but we still have standards.

    Caching: We implemented content-based and query-based caching using LRU cache and a custom dictionary to reduce redundant API calls.

    Upload to Google Sheets: Now you can easily upload your extracted CSV data to Google Sheets with one click.

    Proxy Mode (Coming Soon): Built-in proxy support to keep you ghosting through the net.

    Navigate through the Pages: Navigate through the webpage and scrape data from different pages.
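    The "Ethical Scraping" feature above says the tool respects robots.txt. CyberScraper's own implementation is not reproduced here; this sketch shows how any Python scraper can perform that check with the standard library's `urllib.robotparser`. The robots.txt content and the user agent `MyScraper` are made-up examples, parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical site policy).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_checker(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt checker from already-downloaded file content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


checker = make_checker(ROBOTS_TXT)
print(checker.can_fetch("MyScraper", "https://example.com/products"))   # True
print(checker.can_fetch("MyScraper", "https://example.com/private/x"))  # False
print(checker.crawl_delay("MyScraper"))                                 # 2
```

    In a live scraper you would call `set_url()` and `read()` to download the real file, then honor both `can_fetch()` and the crawl delay before each request.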

    If you are unable to scrape a website and you are getting blocked, try out the current browser features:

    Github: https://github.com/itsOwen/CyberScraper-2077

  16. twscrape

    2025! X / Twitter API scraper with authorization support. Lets you scrape search results, user profiles (followers/following), tweets (favoriters/retweeters), and more.

    Project mention: 2024 In Review | dev.to | 2025-01-01

    twscrape (python, lib) – a library for parsing data from X/Twitter. The project is mainly in maintenance mode: I check its functionality every few months, and there have been no requests for new features.
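    Scraping followers or search results from an API such as X/Twitter's usually means following a pagination cursor until it runs out. The sketch below shows that loop as a generator; `PAGES`, `fetch_page`, and `iter_followers` are hypothetical and do not reflect twscrape's actual API or X's response format.

```python
from typing import Iterator, Optional

# Hypothetical paged API: cursor -> (items on this page, next cursor or None).
PAGES = {
    None: (["user1", "user2"], "c1"),
    "c1": (["user3", "user4"], "c2"),
    "c2": (["user5"], None),
}


def fetch_page(cursor: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Stand-in for one authenticated API request."""
    return PAGES[cursor]


def iter_followers(limit: Optional[int] = None) -> Iterator[str]:
    """Yield followers page by page until the cursor runs out or `limit` is hit."""
    cursor: Optional[str] = None
    yielded = 0
    while True:
        items, cursor = fetch_page(cursor)
        for item in items:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        if cursor is None:  # no next page
            return


if __name__ == "__main__":
    print(list(iter_followers(limit=3)))
```

    A generator means callers only trigger as many page requests as they actually consume, which matters when each request counts against a rate limit.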

  17. animdl

    A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

  18. mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples
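    "Scrape by providing examples" means the tool infers an extraction rule from a sample value and applies it to the rest of the page. Neither mlscraper's actual algorithm nor its API is reproduced here; this stdlib sketch assumes the simplest possible rule, the exact tag-and-class path of the example text, and returns every text node that shares it.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Records (tag path, text) for every non-empty text node in the document."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, str]] = []
        self.nodes: list[tuple[tuple, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        css_class = dict(attrs).get("class") or ""
        self.stack.append((tag, css_class))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def learn_and_extract(html: str, example: str) -> list[str]:
    """Learn the tag path of `example`, then return every text at that path."""
    collector = PathCollector()
    collector.feed(html)
    paths = [path for path, text in collector.nodes if text == example]
    if not paths:
        return []
    rule = paths[0]  # the learned "rule" is simply the example's tag path
    return [text for path, text in collector.nodes if path == rule]


if __name__ == "__main__":
    page = '<ul><li><span class="name">Alpha</span></li><li><span class="name">Beta</span></li></ul>'
    print(learn_and_extract(page, "Beta"))
```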

  19. GramAddict bot

    Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated. (by GramAddict)

  20. cinemagoer

    Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies

  21. Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

  22. RedditDownloader

    Scrapes Reddit to download media of your choice.

  23. finviz

    Unofficial API for finviz.com

  24. TikTokLive

    The definitive Python library to receive livestream events (comments, gifts, etc.) in realtime from TikTok LIVE.
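    Receiving livestream events in real time usually means registering a handler per event type and letting the client call it when a matching event arrives. This sketch shows that general handler-registration pattern only; `LiveClient`, `Comment`, and `Gift` are hypothetical and do not mirror TikTokLive's actual classes, and events are dispatched by hand instead of arriving over a live connection.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Comment:
    user: str
    text: str


@dataclass
class Gift:
    user: str
    gift: str


class LiveClient:
    """Hypothetical client: routes incoming events to handlers registered per type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def on(self, event_type: type) -> Callable:
        """Decorator that registers a handler for one event type."""
        def register(handler: Callable) -> Callable:
            self._handlers[event_type].append(handler)
            return handler
        return register

    def dispatch(self, event: object) -> None:
        """Call every handler registered for the event's exact type."""
        for handler in self._handlers[type(event)]:
            handler(event)


if __name__ == "__main__":
    client = LiveClient()

    @client.on(Comment)
    def show_comment(event: Comment) -> None:
        print(f"{event.user}: {event.text}")

    client.dispatch(Comment("alice", "hello"))
    client.dispatch(Gift("bob", "rose"))  # no Gift handler, so nothing happens
```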

  25. URS

    Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Scraper projects in Python? This list will help you:

# Project Stars
1 Jobs_Applier_AI_Agent_AIHawk 28,094
2 Douyin_TikTok_Download_API 12,303
3 chinese-xinhua 11,112
4 autoscraper 6,758
5 crawlee-python 5,638
6 snscrape 4,838
7 myGPTReader 4,441
8 Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE 3,205
9 twikit 2,586
10 linkedin_scraper 2,477
11 bulk-downloader-for-reddit 2,396
12 JobFunnel 2,010
13 CyberScraper-2077 1,684
14 twscrape 1,600
15 animdl 1,374
16 mlscraper 1,342
17 GramAddict bot 1,330
18 cinemagoer 1,267
19 Scweet 1,153
20 RedditDownloader 1,126
21 finviz 1,116
22 TikTokLive 1,109
23 URS 875
