Top 23 Python Scraping Projects
Scrapy, a fast high-level web crawling & scraping framework for Python.
Project mention: Legality of web scraping | reddit.com/r/de_EDV | 2022-01-22
Pythonic HTML Parsing for Humans™
Project mention: How to make all https traffic in program go through a specific proxy? | reddit.com/r/learnpython | 2021-12-24
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Project mention: Turn Any Website Into An API with AutoScraper and FastAPI | dev.to | 2021-04-24
In this article, we will learn how to create a simple e-commerce search API with support for multiple platforms: eBay and Amazon. AutoScraper and FastAPI provide the ability to create a powerful JSON API for the data. With Playwright's help, we'll extend our scraper and avoid blocking by using ScrapingAnt's web scraping API.
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
Project mention: attributeerror: module 'undetected_chromedriver' has no attribute 'install' | reddit.com/r/learnpython | 2021-12-28
Snoop: an open-source intelligence (OSINT) reconnaissance tool (OSINT world) (by snooppr)
Project mention: FOSS News International #2: November 8-14, 2021 | reddit.com/r/fossnews | 2021-11-15
Scrape Facebook public pages without an API key
Project mention: Reddit, Twitter and Instagram downloader. Grand update | reddit.com/r/DataHoarder | 2021-12-26
I have been using kevinzg/facebook-scraper for some time, but (1) it is just a library and would require some programming work; and (2) you WILL walk into Facebook's rate limit, and nothing is downloaded for older posts (download order is from newest to oldest).
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Project mention: How to Crawl the Web with Scrapy | news.ycombinator.com | 2021-09-13
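As a rough illustration of selector-based extraction, here is a minimal sketch that uses only Python's standard library (xml.etree.ElementTree and its limited XPath subset) as a stand-in. It is not Parsel's API, and the sample markup, class names, and the extract_products helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not taken from any real site).
SAMPLE = """
<html>
  <body>
    <div class="product"><a href="/item/1">Blue mug</a><span class="price">4.50</span></div>
    <div class="product"><a href="/item/2">Red mug</a><span class="price">5.25</span></div>
  </body>
</html>
"""


def extract_products(markup):
    """Return (name, href, price) tuples selected with XPath expressions."""
    root = ET.fromstring(markup.strip())
    rows = []
    # ElementTree supports a limited XPath subset, including attribute predicates.
    for node in root.findall(".//div[@class='product']"):
        link = node.find("a")
        price = node.find("span[@class='price']")
        rows.append((link.text, link.get("href"), float(price.text)))
    return rows


products = extract_products(SAMPLE)
for name, href, price in products:
    print(name, href, price)
```

ElementTree only parses well-formed XML, so real-world HTML generally needs a more tolerant parser; that is the kind of gap a dedicated selector library is meant to cover.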
🥫 The simple, fast, and modern web scraping library
Project mention: Ask HN: What are some tools / libraries you built yourself? | news.ycombinator.com | 2021-05-16
I've been working on gazpacho for the last two years.
It's a general purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.
Just surpassed ~2K downloads every week!
Generate Free Edu Mail(s) within minutes
Project mention: Ilpt Another Way Of Getting An Edu Mail | reddit.com/r/IllegalLifeProTips | 2021-02-17
Had the same "waiting" issue. Found a similar script here https://github.com/AmmeySaini/Edu-Mail-Generator which worked out well. The fake email generating part isn't automated though.
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Project mention: Lookyloo/lookyloo - Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other | reddit.com/r/bag_o_news | 2021-05-03
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Project mention: Build unlimited free-forever static sites without a single line of code | dev.to | 2021-04-04
As of now, the best free way is to use loconotion, which is an open-source Python tool written by Leonardo Cavaletti.
Scrapy Extension for monitoring spiders execution.
Project mention: spidermon: Scrapy Extension for monitoring spiders execution | news.ycombinator.com | 2021-02-16
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
Project mention: Most universal way to grab relevant webpage content | reddit.com/r/webscraping | 2021-11-30
Check out this Python package, https://github.com/adbar/trafilatura; it seems to do what you are looking for. You may need to use an automated browser like Selenium to load the content of the page first if you want it to work on all sites, even those that use JavaScript.
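To give a feel for what "grabbing the relevant content" involves, here is a deliberately crude sketch that uses only the standard library's html.parser and simply drops text inside a few boilerplate tags. It is a stand-in for illustration, not trafilatura's method or API; the TextExtractor class, the SKIP tag list, and the sample HTML are all assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring tags that rarely hold main content."""

    # Assumed list of boilerplate containers; real extractors use richer heuristics.
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when we are not inside a skipped container.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page used only for this example.
SAMPLE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home | About</nav><p>Main article text.</p>"
    "<script>var x = 1;</script><footer>Copyright</footer></body></html>"
)
text = extract_text(SAMPLE)
print(text)
```

This only sees HTML that is already present in the response, which is why pages rendered by JavaScript may first need to be loaded in an automated browser, as the comment above suggests.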
Example end to end data engineering project.
Project mention: Is it me or are beginner-friendly ETL pipeline guides that explain from the ground-up how to incorporate the use of various technologies notoriously difficult to find. | reddit.com/r/dataengineering | 2021-07-23
🤖 Scrape data from HTML websites automatically with Machine Learning
Project mention: mlscraper: Scrape data from HTML pages automatically with Machine Learning | news.ycombinator.com | 2021-07-05
Query language for efficient data extraction from Wikipedia
Project mention: WikipediaQL: Query language for efficient data extraction from Wikipedia (early | news.ycombinator.com | 2021-07-05
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
Project mention: Scraping the entirety of a private Twitter account. | reddit.com/r/DataHoarder | 2021-11-05
Learn some Python and check this out: https://github.com/Altimis/Scweet
scan for webcams on the internet
Project mention: JettChenT/scan-for-webcams - scan for webcams on the internet | reddit.com/r/GithubSecurityTools | 2021-08-09
Python scraper for Language Pods such as Japanesepod101.com. Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
arxiv_miner is a toolkit for mining research papers on CS ArXiv.
Project mention: ArXiv_miner: A toolkit for mining research papers on CS ArXiv | reddit.com/r/CKsTechNews | 2021-05-29
The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
Project mention: What's a way to scrape all posts of a sub-reddit and make a dataset of it? | reddit.com/r/datasets | 2021-10-03
Python Scraping related posts
Legality of web scraping
1 project | reddit.com/r/de_EDV | 22 Jan 2022
1 project | reddit.com/r/Debate | 17 Jan 2022
The State of Web Scraping 2022: The Good, the Bad, the Ugly
2 projects | news.ycombinator.com | 12 Jan 2022
A way to get around deemed disposable
2 projects | reddit.com/r/irishpersonalfinance | 5 Jan 2022
Feedback Request: Utilities for web scraping
2 projects | reddit.com/r/Python | 30 Dec 2021
Stop asking who arnav mehta is and if he is real
1 project | reddit.com/r/Debate | 28 Dec 2021
Top 13 Web scraping tools in 2022
1 project | reddit.com/r/u_digitally_rajat | 28 Dec 2021
What are some of the best open-source scraping projects in Python? This list will help you:
| 17 | Search Engine Parser | 304 |