Top 23 Python Webscraping Projects
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Project mention: What are the best tools for web scraping and analysis of natural language to populate a dataset? | reddit.com/r/datasets | 2023-04-12
See if something like autoscraper or mlscraper suits your needs.
Webscraping Open Project
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
Project mention: What are your thoughts on scrapy | reddit.com/r/webscraping | 2022-08-30
Transparent persistent cache for python requests
Project mention: Web Scraping with Python: from Fundamentals to Practice | reddit.com/r/Python | 2022-06-23
For anyone who goes with requests as your HTTP client, I would highly recommend adding requests-cache for a nice performance boost.
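A minimal sketch of what that recommendation can look like in practice, assuming the requests-cache package is installed. The function name `fetch_with_cache`, the cache name `demo_cache`, and the one-hour expiry are illustrative choices, not taken from the post.

```python
def fetch_with_cache(url, cache_name="demo_cache", expire_after=3600):
    """Fetch a URL through a requests-cache session (hypothetical helper)."""
    # Imported inside the function so this sketch can be loaded even where
    # requests-cache is not installed (pip install requests-cache).
    import requests_cache

    # CachedSession is a drop-in replacement for requests.Session; by default
    # responses are stored in a local SQLite file named after cache_name.
    session = requests_cache.CachedSession(cache_name, expire_after=expire_after)
    response = session.get(url)

    # from_cache is True when the response was served from the cache
    # instead of the network.
    return response.text, response.from_cache
```

Calling the helper twice with the same URL inside the expiry window should normally hit the network only once, which is where the performance boost for repeated scraping runs comes from.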
👻 Experimental library for scraping websites using OpenAI's GPT API.
Project mention: Those of you who have developed product features using GPT4 API (or failed to do so), how did it go? | reddit.com/r/ExperiencedDevs | 2023-04-15
Not my project but an ex-colleague has been having some success in this direction: https://jamesturk.github.io/scrapeghost/
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
Project mention: What do with email and names? | reddit.com/r/OSINT | 2023-03-22
I used https://github.com/m8sec/CrossLinked, it takes a domain as input and gives this as output:
Datetime, Search, First, Last, Title, URL, rawText
"03-22-2023 08:36:54","google","Elon","Musc","manager operational unit","https://be.linkedin.com/in/elon-musc-b8464215","Elon Musc - manager operational unit - LinkedIn linkedin.comhttps://be.linkedin.com > elon-m",
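Output of that shape can be read with Python's standard `csv` module. The sketch below assumes the header and row layout quoted in the post; the exact file format the tool writes is an assumption, the `rawText` value is shortened for readability, and `load_names` is a hypothetical helper.

```python
import csv
import io

# Sample modelled on the output quoted in the post. The rawText field is
# shortened, and the exact layout written by the tool is an assumption.
SAMPLE = (
    "Datetime,Search,First,Last,Title,URL,rawText\n"
    '"03-22-2023 08:36:54","google","Elon","Musc","manager operational unit",'
    '"https://be.linkedin.com/in/elon-musc-b8464215",'
    '"Elon Musc - manager operational unit - LinkedIn"\n'
)


def load_names(csv_text):
    """Return (first, last, title) tuples from CSV text with the header above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["First"], row["Last"], row["Title"]) for row in reader]


names = load_names(SAMPLE)
for first, last, title in names:
    print(f"{first} {last}: {title}")
```

`DictReader` keys each row by the header names, so the code keeps working if the columns are reordered.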
🥫 The simple, fast, and modern web scraping library
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
Project mention: Webscraping beginner here ready to start leveling up to intermediate. Looking for some good webscraping repositories (e.g any of your GitHub repos/projects) that I can use as learning tools, and general recommendations for what to do next | reddit.com/r/webscraping | 2023-05-08
Please check https://github.com/roniemartinez/dude
A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg
Perform Google Dork search with Dorkify
[PH0MBER]: An open source information gathering & reconnaissance framework!
Project mention: ChatGPT is falsely recommending us for a service we don't provide | news.ycombinator.com | 2023-02-23
Phomber is not the best example. Ed contacted the developer of that tool over a year ago about the issue and to remove mentions of OpenCage and as far as I see the author removed it https://github.com/s41r4j/phomber/issues/4
Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel
Make a ZIM file from any Web site and surf offline!
Project mention: Zim vs WARC ? | reddit.com/r/Kiwix | 2022-11-15
There are clearly similarities between the two, given that Kiwix put resources into making WARC content available in ZIM archives (i.e. Zimit-style ZIMs, created with the Zimit scraper and warc2zim backend). But as u/IMayBeABitShy said, the ZIM specification focuses on providing a highly compressed container that is readable on-the-fly (i.e. by decompressing only the needed content to show an article), whereas WARC, or rather the compressed version WACZ, is merely a zipped version of the WARC data (request headers and responses). It is also readable on-the-fly, but compression will not be as optimal as the zstandard compression used by modern ZIM archives.
Automated Python Script to retrieve vaccine slots availability and get notified when a slot is available.
Fast and robust date extraction from web pages, with Python or on the command-line
Script for checking changes in webpages
Archive a reddit user's post history. Formatted overview of a profile, JSON containing every post, and picture downloads. Uses the pushshift API.
Project mention: How to download all the Threads i created? | reddit.com/r/redditdev | 2022-09-08
API for fetching data from news websites.
Fast flashcard searcher
Project mention: FULL GUIDE FOR EDGENUITY | reddit.com/r/edgenuity | 2023-02-02
There's a program called searchifyx that scrapes brainly, quizziz, and Quizlet to find answers https://github.com/daijro/SearchifyX
Script to scrape comments (including name, profile link, pfp, designation, email (if present), and comment) from a LinkedIn post from the URL of the post.
Project mention: Linkedin Comments Scraper - Script to scrape comments (including name, profile picture, designation, email(if present), and comment) from a LinkedIn post from the URL of the post. | reddit.com/r/webscraping | 2023-01-05
A Python 3 script that scrapes an html/xml page to extract text, then creates markdown files for Obsidian & the dataview plugin
Project mention: I got the job! Now using Obsidian to help create databases for my company | reddit.com/r/ObsidianMD | 2023-03-25
Made a Github page last time I posted this: https://github.com/Flybell/web_to_obsidian
Extract data from all Google Scholar pages from a single Python module.
Project mention: Scrape Google Scholar in R | dev.to | 2023-05-06
scrape-google-scholar-py is an open-source project of mine that aims to extract all the possible data from Google Scholar. In the future I'll port it to R.
A quick Craigslist API wrapper
BotCity Framework Web - Python
Python Webscraping related posts
I have made a simple webscraper in python.pls checkout this github project.
1 project | reddit.com/r/madeinpython | 23 May 2023
1 project | reddit.com/r/stocks | 20 May 2023
1 project | reddit.com/r/options | 17 May 2023
I have an idea that would be somewhat cool to do with a sub of the most idiotic people ever.
1 project | reddit.com/r/Shortsqueeze | 7 May 2023
ChatGPT converted my trivia book into a daily Podcast
1 project | reddit.com/r/ChatGPTCoding | 7 May 2023
Scrape Google Scholar in R
2 projects | dev.to | 6 May 2023
ChatGPT Driven AI News Podcasts now rival the realism of actual news segments
1 project | reddit.com/r/tech | 29 Apr 2023
What are some of the best open-source Webscraping projects in Python? This list will help you:
|2||Webscraping Open Project||1,279|