Python Scraper

Open-source Python projects categorized as Scraper

Top 23 Python Scraper Projects

  • newspaper

    News, full-text, and article metadata extraction in Python 3. Advanced docs:

    Project mention: Are there js libs for extracting content from a DOM document? | reddit.com/r/webscraping | 2022-06-07

    I think then you're looking for something similar to newspaper3k. Unfortunately it's written in python. https://github.com/codelucas/newspaper

  • instagram-scraper

    Scrapes an instagram user's photos and videos

    Project mention: How To download Saves from instagram? | reddit.com/r/DataHoarder | 2022-04-08

    There was a FOSS program that worked. The problem is, if they detect anything remotely like scraping, they'll ban your account. I don't know what a good "pause" amount of time to set in the program would be to get around their flags. I think this was the one I used before (and got my account banned.) https://github.com/arc298/instagram-scraper And then if you don't login, they make you wait a really long time between downloads and you can't access some information (like your saves.)

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Project mention: Scrapping - How to deal with page changes Ai | reddit.com/r/webscraping | 2022-03-25

    It depends on the website, but autoscraper was used to calculate similar nodes given the text to search. Not sure how it works now but it's open source.

  • OnlyFans

    Scrape all the media from an OnlyFans account - Updated regularly

    Project mention: I can't get onlyfans-dl to work | reddit.com/r/Piracy | 2021-12-01

    I use this and it works well, but if you want me to scrape it, I got you.

  • Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

    Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

    Project mention: Anyone wanna share free company Udemy/Udacity accounts? | reddit.com/r/cscareerquestions | 2021-12-31

    I would check this out if you're looking to get ahold of Udemy courses that get discounted to free. https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

  • instagram-scraper

    Scrapes media, likes, followers, tags, and all metadata. Inspired by instagram-php-scraper, bot (by realsirjoe)

    Project mention: Help in scraping Instagram | reddit.com/r/webscraping | 2021-10-09

    Yes: https://github.com/realsirjoe/instagram-scraper

  • JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

  • bulk-downloader-for-reddit

    Downloads and archives content from reddit

    Project mention: Setting to automatically save a post's content on upvote. Bonus points if it can be whitelisted for specific subreddits | reddit.com/r/BoostForReddit | 2022-06-17

  • snscrape

    A social networking service scraper in Python

    Project mention: Fine-tuning a model to create a chat bot in a fictional setting? | reddit.com/r/GPT3 | 2022-06-07

    I fine tuned babbage with a little over 568 Tweets by Elon Musk from May 2022 and I've been playing around with it. I grabbed the Tweets using snscrape as a Python library as described in How to Scrape Tweets With snscrape.

  • cinemagoer

    Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies

    Project mention: [OC]IMDB Top 30 movies: cast death rate | reddit.com/r/dataisbeautiful | 2022-01-17

  • scanless

    Online port scan scraper

    Project mention: Awesome Penetration Testing | dev.to | 2021-10-06

    scanless - Utility for using websites to perform port scans on your behalf so as not to reveal your own IP.

  • RedditDownloader

    Scrapes Reddit to download media of your choice.

    Project mention: Does pushshift or any other archiver save any pictures or thumbnails from reddit posts? | reddit.com/r/pushshift | 2021-10-15

    It depends on what they are using to scrape the subreddit. The two I ran across yesterday, ripme and Reddit Media Downloader, both only run when called, so it is highly unlikely they would pick it up: they would have to be called within the few seconds it was available.

  • finviz

    Unofficial API for finviz.com

    Project mention: Question about (somewhat) live market volume data | reddit.com/r/algotrading | 2022-03-12

    Try https://www.tiingo.com/ pretty cheap data. You might be able to also scrape https://finviz.com/screener.ashx?v=161&ft=2&o=pe and store the data and compare. Useful python package I have previously used. https://github.com/mariostoev/finviz

  • linkedin_scraper

    A library that scrapes LinkedIn for user data

    Project mention: [Hiring] Small modification of python script - $20 | reddit.com/r/forhire | 2021-07-25

    I am using https://github.com/joeyism/linkedin_scraper to scrape LinkedIn. I modified it for my use case as follows. It works fine for the first 8 posts, but sadly I don't know how to scroll down before scraping to make LinkedIn load older posts.

  • scrapyrt

    HTTP API for Scrapy spiders

    Project mention: New to python and scrapy stuff but need this project to work so that I can do my data research and stuff easily in the future. | reddit.com/r/scrapy | 2022-04-19

  • bookcorpus

    Crawl BookCorpus

  • trafilatura

    Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

    Project mention: Advice on standard design pattern for comparison test script | reddit.com/r/learnpython | 2022-05-24

  • Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

    Project mention: Scraping the entirety of a private Twitter account. | reddit.com/r/DataHoarder | 2021-11-05

    Learn some Python and check this out: https://github.com/Altimis/Scweet

  • OnlyFans

    Scrape all the media from an OnlyFans account - Updated regularly (by DIGITALCRIMINALS)

    Project mention: I have no knowledge in CLI, can someone please help me install this Github program? | reddit.com/r/Piracy | 2022-05-08

    Here's the program: https://github.com/DIGITALCRIMINALS/OnlyFans

  • freshonions-torscraper

    Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

    Project mention: How do I explore sites that aren't in lists? | reddit.com/r/TOR | 2021-08-11

  • cryptoCMD

    Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.

  • google-play-scraper

    Google play scraper for Python inspired by <facundoolano/google-play-scraper> (by JoMingyu)

    Project mention: Scrape Google Play Store App in Python | reddit.com/r/Python | 2022-01-31

    Note: You don't really need to read this post unless you need a step-by-step explanation without using browser automation such as playwright and selenium since you can see what Python google-play-scraper regex solution is, how it scrapes app results, and how it scrapes review results.

  • GramAddict bot

    Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated. (by GramAddict)

    Project mention: Can we manifest Instagram popularity/fame? | reddit.com/r/lawofattraction | 2022-02-25

    I recommend a free Instagram bot. https://github.com/GramAddict/bot I am using that bot
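
The instagram-scraper mention in the list above raises the question of how long to pause between downloads. The page gives no figure, so what follows is only a minimal Python sketch of spacing out requests with a randomized delay. The function name `paced` and the 2 to 5 second bounds are hypothetical placeholders, not settings taken from any project listed here, and nothing on this page suggests they satisfy any particular site's limits.

```python
import random
import time


def paced(items, min_pause=2.0, max_pause=5.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them.

    min_pause and max_pause are illustrative values (assumptions), in seconds.
    The sleep argument is injectable so the pacing can be checked without waiting.
    """
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item; a random pause before each later one.
            sleep(random.uniform(min_pause, max_pause))
        yield item


# Demonstration with a stand-in for time.sleep that only records the
# requested pauses, so this snippet finishes immediately.
recorded_pauses = []
for url in paced(["page-1", "page-2", "page-3"], sleep=recorded_pauses.append):
    print("would fetch", url)
print("pauses requested:", len(recorded_pauses))
```

In real use the `sleep` argument would be left at its default, and the loop body would perform the actual download.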

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-06-17.
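
The ordering rule in the note can be sketched in a few lines of Python. The star counts below are copied from this page's index; the function name `rank_by_stars` is made up for illustration and is not a description of how the site computes its ranking.

```python
# A handful of (project, stars) pairs copied from the index on this page,
# deliberately listed out of order.
projects = [
    ("snscrape", 1344),
    ("newspaper", 11955),
    ("freshonions-torscraper", 412),
    ("autoscraper", 4367),
    ("Scweet", 437),
]


def rank_by_stars(entries):
    """Return the entries sorted by star count, highest first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Print the ranked entries in the same "rank name stars" shape as the index.
for position, (name, stars) in enumerate(rank_by_stars(projects), start=1):
    print(f"{position} {name} {stars:,}")
```

Python's `sorted` is stable, so projects with equal star counts keep their input order.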

Index

What are some of the best open-source Scraper projects in Python? This list will help you:

Rank Project Stars
1 newspaper 11,955
2 instagram-scraper 6,874
3 autoscraper 4,367
4 OnlyFans 2,975
5 Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE 2,661
6 instagram-scraper 2,385
7 JobFunnel 1,567
8 bulk-downloader-for-reddit 1,380
9 snscrape 1,344
10 cinemagoer 977
11 scanless 945
12 RedditDownloader 805
13 finviz 781
14 linkedin_scraper 766
15 scrapyrt 741
16 bookcorpus 550
17 trafilatura 503
18 Scweet 437
19 OnlyFans 412
20 freshonions-torscraper 412
21 cryptoCMD 394
22 google-play-scraper 382
23 GramAddict bot 378