Top 23 Python Webscraping Projects
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Project mention: Turn Any Website Into An API with AutoScraper and FastAPI | dev.to | 2021-04-24
In this article, we will learn how to create a simple e-commerce search API with multiple platform support: eBay and Amazon. AutoScraper and FastAPI provide the ability to create a powerful JSON API for the data. With Playwright's help, we'll extend our scraper and avoid blocking by using ScrapingAnt's web scraping API.
🥫 The simple, fast, and modern web scraping library
Project mention: Ask HN: What are some tools / libraries you built yourself? | news.ycombinator.com | 2021-05-16
I've been working on gazpacho for the last two years.
It's a general purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.
Just surpassed ~2K downloads every week!
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Project mention: Fetching all comments from an Instagram post? | reddit.com/r/webscraping | 2021-06-16
There are Instagram scrapers that you can use for this if you don't want to use the Instagram API. They can fetch a lot of posts, but you should rate limit them.
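The rate-limiting advice above can be sketched as follows. This is a minimal illustration using only the standard library, not the API of any particular Instagram scraper: `RateLimiter`, `fetch_all`, and the `fetch` callback are hypothetical names, and the right interval depends on the site being scraped.

```python
import time


class RateLimiter:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous call, None before the first one

    def wait(self):
        """Block until `min_interval` seconds have passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def fetch_all(post_ids, fetch, limiter):
    """Call the hypothetical `fetch(post_id)` for each ID, pausing between calls."""
    results = []
    for post_id in post_ids:
        limiter.wait()
        results.append(fetch(post_id))
    return results
```

With `RateLimiter(2.0)`, the first request goes out immediately and each later one waits until two seconds have passed since the previous request.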
A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg
Project mention: [OFFER] CHEAP and High-Quality Programming | reddit.com/r/slavelabour | 2021-03-27
TikTokBot automatically makes TikTok compilations
Automated Python Script to retrieve vaccine slots availability and get notified when a slot is available.
Project mention: Automate Cowin Vaccine slots Availablity using Python | dev.to | 2021-05-17
And with that, it's a wrap! You can host the script on a server to get notified every time a slot is available near you. You can find all the code at my GitHub Repository. Drop a star if you find it useful.
Make a ZIM file from any Web site and surf offline!
Project mention: Reading from the web offline and distraction-free | news.ycombinator.com | 2021-10-10
which worked quite well for most sites, but still very far from a general-purpose solution.
There is also a more powerful, general-purpose scraper that generates a ZIM file here: https://github.com/openzim/zimit
It would be really nice to have a "common" scraper code base that takes care of scraping (possibly with a real headless browser) and outputs all assets as files + info as JSON. This common code base could then be used by all kinds of programs to package the content as standalone HTML zip files, ePub, ZIM, or even PDF for crazy people like me who like to print things ;)
Script for checking changes in webpages
Project mention: Help me automate a boring task. [Print TO HTML] | reddit.com/r/learnpython | 2021-10-10
Sure, in this project https://github.com/Jaime-alv/web_check. Look at checker.py inside web_check folder, line 37 onwards.
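The general idea behind a page change checker can be sketched in a few lines. This is a minimal illustration and not the code from web_check: it assumes the page HTML has already been downloaded, and the names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(html):
    """Return a SHA-256 hex digest of the page after collapsing runs of whitespace."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, html):
    """Compare a stored fingerprint with the fingerprint of freshly fetched HTML."""
    return previous_fingerprint != fingerprint(html)
```

Storing only the digest keeps the saved state small. The trade-off is that it tells you that a page changed, not what changed; a diff would need the previous copy of the page.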
API for fetching data from news websites.
Project mention: Newsemble: An API to fetch current news data | reddit.com/r/Python | 2021-07-18
I read through the documentation and tinkered around with it -- great work! One recommendation I would make, particularly if you're hoping that this will be useful long-term for NLP, is not to delete the previously scraped data. For instance, http://www.newsemble.ml/news only contains 129 results, which is nowhere near comprehensive enough to ensure any kind of statistically significant NLP.
Fast and robust date extraction from web pages, from the command-line or within Python
Archive a reddit user's post history. Formatted overview of a profile, JSON containing every post, and picture downloads. Uses the pushshift API.
Project mention: How to get list of subreddits where user has commented? | reddit.com/r/redditdev | 2021-08-06
A Dragnet that also extract author, headline, date, keywords from context
Project mention: ExtractNet - a dragnet that also extract author, headline, date, keywords from web page in ML fashion | reddit.com/r/Python | 2021-02-10
ScrapingAnt API client for Python.
Project mention: We've updated our scraping API docs. Do you like or hate it? Thanks in advance for the comments! | reddit.com/r/webscraping | 2021-02-21
Hello. Following up on our previous conversation: we've created a Python client library for our API: https://github.com/ScrapingAnt/scrapingant-client-python So you can check it out. The Scrapy plugin is still under development. I'll keep you posted. Thanks for the interest :-)
A Udemy Course Scraper built with bs4 and selenium, that fetches udemy course information. Get udemy course information and convert it to json, csv or xml file, without authentication!
Project mention: What cool projects have you make with BeautifulSoup to make your life easier? | reddit.com/r/Python | 2021-09-15
A fast and expressive Craigslist API wrapper
Project mention: Web Scraping Used Car data via Carfax.com | reddit.com/r/webscraping | 2021-08-30
I understand your concerns, as the quality of posts can vary greatly on Craigslist. But the plus side is that there is a vast number of used cars on Craigslist. If you know Python, try pycraigslist.
A more simplified, straightforward, and plain version of Hacker News.
Project mention: HackerNews Simplified with Python3 and BeautifulSoup | reddit.com/r/learnprogramming | 2021-10-16
HackerNews Simplified is a more simplified and straightforward version of Hacker News. The project uses BeautifulSoup, a Python web scraping library, to scrape data from the Hacker News website, keep only the most relevant news, and display it in your terminal.
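To give a feel for the scraping step, here is a minimal sketch that uses Python's built-in `html.parser` instead of BeautifulSoup, so it runs without extra packages. It is not the project's code: the sample markup, `LinkCollector`, and `first_links` are made up for illustration, and "most relevant" is simplified here to "the first few links on the page".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def first_links(html, limit):
    """Return the first `limit` (text, href) pairs found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links[:limit]


# Hypothetical markup standing in for a downloaded page.
SAMPLE = (
    '<ul>'
    '<li><a href="https://example.com/a">First story</a></li>'
    '<li><a href="https://example.com/b">Second <b>story</b></a></li>'
    '<li><a href="https://example.com/c">Third story</a></li>'
    '</ul>'
)

for title, url in first_links(SAMPLE, 2):
    print(f"{title} ({url})")
```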
This is a python web-scraping project to get all the product names, price, review stars and review count of a particular category of the product
Project mention: Program to get all the information about a product on Amazon | reddit.com/r/indiasocial | 2021-05-26
All you have to do is go to this link and click on Code on the top right corner and click on Download Zip. Once you have extracted, simply click on AmazonProductScraper.exe and type the name of the product to be searched. Once the browser closes automatically, all the information is present in the same folder as an excel sheet.
This program uses web scraping to download images from google image search instantly and can be helpful in making image datasets.
Project mention: help with google image scraper script | reddit.com/r/DataHoarder | 2021-06-12
I'm running a google image scraper python script found here: https://github.com/NIKHILDUGAR/Google-Image-Scraper everything looks like it's running smoothly, but then it runs into the following error before it ever downloads any images:
Various data fetching, cleaning and processing scripts to collect and process data on public toilets in London
Project mention: Why doesn’t the U.K. have more public toilets? | reddit.com/r/AskUK | 2021-05-02
If you're in London, there's this app - https://www.toilets4london.com/
ISIN code to Price
Project mention: This Script Searchs the ISIN code and finds the fund price. | reddit.com/r/Python | 2021-05-10
Check it here: GitHub Link
Functions for scraping AZLyrics.com and downloading songs lyrics in txt format
Project mention: Most frequent words in Springsteen's Lyrics | reddit.com/r/BruceSpringsteen | 2021-01-23
Actually I did the web-scraping with Python and analyzed the lyrics with R because I’m a bit more confident with it. If you’re interested I have uploaded the code I wrote for the web-scraping here. Feel absolutely free to use it!
Webscraping done thoroughly
Project mention: downdrag, python web scraping utility with extensible features | reddit.com/r/webscraping | 2021-05-05
Telegram bot: Check anime/comic/game/novel websites update
Project mention: I need an app to get notifications from webnovels from websites such as syosetu. | reddit.com/r/noveltranslations | 2021-08-10
You could spin up this Telegram bot for it: https://github.com/nonjosh/acgn-bot. I don’t know if there’s a public one already set up.
Using Selenium, generates a set of words to study by translating to and from English and Spanish.
Project mention: I made a tool to revise Spanish using randomly selected words | reddit.com/r/Python | 2021-02-03
What are some of the best open-source Webscraping projects in Python? This list will help you: