Top 23 Python Scraper Projects
News, full-text, and article metadata extraction in Python 3.
Project mention: Are there js libs for extracting content from a DOM document? | reddit.com/r/webscraping | 2022-06-07
I think then you're looking for something similar to newspaper3k. Unfortunately it's written in python. https://github.com/codelucas/newspaper
Scrapes an instagram user's photos and videos
Project mention: How To download Saves from instagram? | reddit.com/r/DataHoarder | 2022-04-08
There was a FOSS program that worked. The problem is, if they detect anything remotely like scraping, they'll ban your account. I don't know what a good "pause" amount of time to set in the program would be to get around their flags. I think this was the one I used before (and got my account banned.) https://github.com/arc298/instagram-scraper And then if you don't login, they make you wait a really long time between downloads and you can't access some information (like your saves.)
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Project mention: Scrapping - How to deal with page changes Ai | reddit.com/r/webscraping | 2022-03-25
It depends on the website, but autoscraper was used to calculate similar nodes given the text to search. Not sure how it works now but it's open source.
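The "similar nodes given the text to search" idea mentioned above can be illustrated with a small standard-library sketch. This is not AutoScraper's implementation or API; the names `PathCollector` and `similar_texts` are hypothetical, and the matching rule (same chain of tag and class from the document root) is an assumption chosen for simplicity.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Record each text node with the (tag, class) path that leads to it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, class) pairs
        self.nodes = []   # (path, text) for every non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def similar_texts(html, sample):
    """Return all texts whose element path matches that of the sample text."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    wanted = {path for path, text in collector.nodes if text == sample}
    return [text for path, text in collector.nodes if path in wanted]
```

Given one known value from a page (for example a single product title), `similar_texts(page_html, "Known title")` returns every text that sits at the same structural position. If the sample text is not found, the result is an empty list.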
Scrape all the media from an OnlyFans account - Updated regularly
Project mention: I can't get onlyfans-dl to work | reddit.com/r/Piracy | 2021-12-01
I use this works well but yeah if you want me to scrape it I got you
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
Project mention: Anyone wanna share free company Udemy/Udacity accounts? | reddit.com/r/cscareerquestions | 2021-12-31
I would check this out if you're looking to get ahold of Udemy courses that get discounted to free. https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE
scrapes medias, likes, followers, tags and all metadata. Inspired by instagram-php-scraper,bot (by realsirjoe)
Project mention: Help in scraping Instagram | reddit.com/r/webscraping | 2021-10-09
Scrape job websites into a single spreadsheet with no duplicates.
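As a rough illustration of merging listings from several sites into one duplicate-free sheet, here is a minimal standard-library sketch. It is not this project's code: the function names, the dictionary fields, and the choice of title, company, and location as the duplicate key are all assumptions made for the example.

```python
import csv
import io


def merge_jobs(sources, key_fields=("title", "company", "location")):
    """Merge listings from several sources, keeping the first copy of each job."""
    seen = set()
    merged = []
    for listings in sources:
        for job in listings:
            # Normalise case and whitespace so trivial differences still match.
            key = tuple(
                " ".join(str(job.get(field, "")).lower().split())
                for field in key_fields
            )
            if key not in seen:
                seen.add(key)
                merged.append(job)
    return merged


def write_csv(jobs, fieldnames, out):
    """Write the merged jobs to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(jobs)


# Hypothetical sample data standing in for two scraped job boards.
site_a = [{"title": "Data Engineer", "company": "Acme", "location": "Remote", "url": "a/1"}]
site_b = [
    {"title": "data engineer ", "company": "ACME", "location": "remote", "url": "b/9"},
    {"title": "Analyst", "company": "Globex", "location": "Berlin", "url": "b/10"},
]

merged = merge_jobs([site_a, site_b])
buffer = io.StringIO()
write_csv(merged, ["title", "company", "location", "url"], buffer)
```

Keeping the first occurrence means the order of `sources` decides which site's copy of a duplicate survives; a real tool might instead prefer the most recent or most complete record.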
Downloads and archives content from reddit
Project mention: Setting to automatically save a post's content on upvote. Bonus points if it can be whitelisted for specific subreddits | reddit.com/r/BoostForReddit | 2022-06-17
A social networking service scraper in Python
Project mention: Fine-tuning a model to create a chat bot in a fictional setting? | reddit.com/r/GPT3 | 2022-06-07
I fine tuned babbage with a little over 568 Tweets by Elon Musk from May 2022 and I've been playing around with it. I grabbed the Tweets using snscrape as a Python library as described in How to Scrape Tweets With snscrape.
Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb (to which we are not affiliated in any way) movie database about movies, people, characters and companies
Project mention: [OC]IMDB Top 30 movies: cast death rate | reddit.com/r/dataisbeautiful | 2022-01-17
online port scan scraper
Project mention: Awesome Penetration Testing | dev.to | 2021-10-06
scanless - Utility for using websites to perform port scans on your behalf so as not to reveal your own IP.
Scrapes Reddit to download media of your choice.
Project mention: Does pushshift or any other archiver save any pictures or thumbnails from reddit posts? | reddit.com/r/pushshift | 2021-10-15
Like it depends on what they are using to scrape the subreddit the two I ran across yesterday ripme and Reddit Media Downloader both only run when called so it would be highly unlikely those would pick it up because they'd have to be called within the few seconds it was available.
Unofficial API for finviz.com
Project mention: Question about (somewhat) live market volume data | reddit.com/r/algotrading | 2022-03-12
Try https://www.tiingo.com/ pretty cheap data. You might be able to also scrape https://finviz.com/screener.ashx?v=161&ft=2&o=pe and store the data and compare. Useful python package I have previously used. https://github.com/mariostoev/finviz
HTTP API for Scrapy spiders
Project mention: New to python and scrapy stuff but need this project to work so that I can do my data research and stuff easily in the future. | reddit.com/r/scrapy | 2022-04-19
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
Project mention: Advice on standard design pattern for comparison test script | reddit.com/r/learnpython | 2022-05-24
A simple and unlimited twitter scraper : scrape tweets, likes, retweets, following, followers, user info, images...
Project mention: Scraping the entirety of a private Twitter account. | reddit.com/r/DataHoarder | 2021-11-05
Learn some Python and check this out: https://github.com/Altimis/Scweet
Scrape all the media from an OnlyFans account - Updated regularly (by DIGITALCRIMINALS)
Project mention: I have no knowledge in CLI, can someone please help me install this Github program? | reddit.com/r/Piracy | 2022-05-08
Here's the program: https://github.com/DIGITALCRIMINALS/OnlyFans
Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Project mention: How do I explore sites that aren't in lists? | reddit.com/r/TOR | 2021-08-11
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Google play scraper for Python inspired by <facundoolano/google-play-scraper> (by JoMingyu)
Project mention: Scrape Google Play Store App in Python | reddit.com/r/Python | 2022-01-31
Note: You don't really need to read this post unless you need a step-by-step explanation without using browser automation such as playwright and selenium since you can see what Python google-play-scraper regex solution is, how it scrapes app results, and how it scrapes review results.
Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated. (by GramAddict)
Project mention: Can we manifest Instagram popularity/fame? | reddit.com/r/lawofattraction | 2022-02-25
I recommend a free Instagram bot. https://github.com/GramAddict/bot I am using that bot
Python Scraper related posts
Request for help: text corpus of German-language Reddit for a master's thesis in linguistics.
2 projects | reddit.com/r/de | 23 Jun 2022
Setting to automatically save a post's content on upvote. Bonus points if it can be whitelisted for specific subreddits
1 project | reddit.com/r/BoostForReddit | 17 Jun 2022
Fine-tuning a model to create a chat bot in a fictional setting?
1 project | reddit.com/r/GPT3 | 7 Jun 2022
[OC] Who were the most discussed riders in r/peloton's Giro d'Italia results threads?
2 projects | reddit.com/r/peloton | 29 May 2022
Tesla Service Manuals (Google Drive)
1 project | reddit.com/r/opendirectories | 25 May 2022
Tesla Service Manuals are now FREE. Get yours while you can!
1 project | reddit.com/r/teslamotors | 25 May 2022
Advice on standard design pattern for comparison test script
1 project | reddit.com/r/learnpython | 24 May 2022
What are some of the best open-source Scraper projects in Python? This list will help you: