Python Scraper

Open-source Python projects categorized as Scraper

Top 23 Python Scraper Projects

  • newspaper

    News, full-text, and article metadata extraction in Python 3.

    Project mention: Gathering News Headlines | reddit.com/r/Automate | 2022-08-08
  • chinese-xinhua

    Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

  • changedetection.io

    The best and simplest free, open-source website change detection, restock monitoring, and notification service. Designed for simplicity: monitor which websites had a text change, for free. Also covers website defacement monitoring and price-change and price-drop notifications.

    Project mention: changedetection.io releases version 0.42! | reddit.com/r/selfhosted | 2023-05-22
  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Project mention: What are the best tools for web scraping and analysis of natural language to populate a dataset? | reddit.com/r/datasets | 2023-04-12

    See if something like autoscraper or mlscraper suits your needs.

  • myGPTReader

    A community-driven way to read and chat with AI bots - powered by chatGPT.

    Project mention: GitHub - madawei2699/myGPTReader: myGPTReader is a bot on Slack that can read and summarize any webpage, documents including ebooks, or even videos from YouTube. It can communicate with you through voice. (a Python project) | reddit.com/r/Python | 2023-03-30
  • snscrape

    A social networking service scraper in Python

    Project mention: Twitter scraping for complete profiles (very large data sets)? | reddit.com/r/Archiveteam | 2023-05-11

    Try Snscrape.

  • OnlyFans

    Scrape all the media from an OnlyFans account - Updated regularly

    Project mention: How to check whether a word is allowed to advertise in adwords? | reddit.com/r/adwords | 2023-03-09

    The problem is that I don't even know what I did wrong. I made an ad with the words "onlyfans downloader" that linked to "https://github.com/DIGITALCRIMINAL/OnlyFans". The ban message says I was banned for trying to trick the system. I removed the onlyfans campaign and explained that I was just trying to check whether the keyword is allowed, but they denied my appeal twice.

  • Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

    Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

  • bulk-downloader-for-reddit

    Downloads and archives content from reddit

    Project mention: Could someone please review my reddit-img-dl command | reddit.com/r/DataHoarder | 2023-05-24

    wfdownloader is easier to use (just drag and drop the reddit link) but bdfr has more options for reddit.

  • JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

  • linkedin_scraper

    A library that scrapes LinkedIn for user data

    Project mention: Using chatgpt for cold email | reddit.com/r/ChatGPTCoding | 2022-12-31

    Scrape a person's LinkedIn page (something like this would do: https://github.com/joeyism/linkedin_scraper), use the profile as the context, and write a good prompt.

  • cinemagoer

    Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which the project is not affiliated in any way) about movies, people, characters, and companies.

  • scanless

    Online port scan scraper

    Project mention: University final year project | reddit.com/r/cybersecurity_help | 2022-09-20

    Scanless is an online port scan scraper.

  • RedditDownloader

    Scrapes Reddit to download media of your choice.

    Project mention: [Data Hoarder] What are my options for downloading a Reddit user's media and keeping it up to date? | reddit.com/r/enfrancais | 2023-05-10
  • mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples

    Project mention: What are the best tools for web scraping and analysis of natural language to populate a dataset? | reddit.com/r/datasets | 2023-04-12

    See if something like autoscraper or mlscraper suits your needs.

  • animdl

    A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

    Project mention: Farewell | reddit.com/r/linuxmasterrace | 2022-12-28

    Animdl (python-based) exists as a replacement. And there are multiple sources. pip install animdl

  • finviz

    Unofficial API for finviz.com

    Project mention: Scraping Realtime Data from finviz | reddit.com/r/algotrading | 2023-03-23

    https://github.com/mariostoev/finviz may be helpful to you

  • Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

    Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | reddit.com/r/learnprogramming | 2023-05-22

    Hmm, do you know any good one? I found this one, but it doesn't scrape a single tweet's likes and followers: https://github.com/Altimis/Scweet

  • scrapyrt

    HTTP API for Scrapy spiders

  • bookcorpus

    Crawl BookCorpus

    Project mention: Can chat GPT overtake Google if they play their cards right? | reddit.com/r/Futurology | 2022-12-23
  • GramAddict bot

    Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated. (by GramAddict)

  • google-play-scraper

    Google Play scraper for Python, inspired by facundoolano/google-play-scraper (by JoMingyu)

    Project mention: Report: Analysis of 2.9 millions apps on Google Play | reddit.com/r/androiddev | 2022-11-08

    It's easy. Python library: google-play-scraper.

  • fansly-downloader

    Executable Downloader App - an absolute must-have for Fansly enthusiasts. With this easy-to-use content downloading tool, you can download all your favorite content from fansly.com. No more manual downloads; enjoy your Fansly content offline anytime, anywhere! Fully customizable to download photos, videos, messages, collections & single posts 🔥

    Project mention: I read a lot that if you don't have a job, you immediately get a free municipal apartment where you don't have to pay for electricity, etc. I would like to do that too. Can you tell me how you all get this? I assume it's very easy and everyone gets it? | reddit.com/r/Slovenia | 2023-05-20

    ---> https://fansly.com

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-05-24.

Index

What are some of the best open-source Scraper projects in Python? This list will help you:

Rank | Project | Stars
1 | newspaper | 12,800
2 | chinese-xinhua | 10,079
3 | changedetection.io | 9,213
4 | autoscraper | 5,206
5 | myGPTReader | 4,004
6 | snscrape | 3,471
7 | OnlyFans | 3,410
8 | Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE | 2,894
9 | bulk-downloader-for-reddit | 1,811
10 | JobFunnel | 1,635
11 | linkedin_scraper | 1,131
12 | cinemagoer | 1,114
13 | scanless | 1,040
14 | RedditDownloader | 991
15 | mlscraper | 953
16 | animdl | 912
17 | finviz | 888
18 | Scweet | 768
19 | scrapyrt | 767
20 | bookcorpus | 635
21 | GramAddict bot | 577
22 | google-play-scraper | 560
23 | fansly-downloader | 519