Top 23 Python Scraping Projects
Scrapy, a fast high-level web crawling & scraping framework for Python.
Project mention: Wanting to build a web scraper with no prior coding knowledge. Where do I start as fast as possible? | reddit.com/r/webscraping | 2021-10-23
Check out https://scrapy.org/ (it’s a Python framework for web scraping), then look at the https://youtube.com/c/JohnWatsonRooney channel to learn the syntax. Finally, go to https://www.zyte.com/scrapy-cloud/ to deploy your crawler to the cloud!
Pythonic HTML Parsing for Humans™
Project mention: Requests html: Directly downloading pyppeteer chrome, not by script run | reddit.com/r/learnpython | 2021-08-18
This issue is asking for the same thing. Seems like they've implemented a simple fix in this Pull Request. But it looks like it never made it to the Master branch. Maybe you can extend the class and make necessary changes if you know what you're doing, otherwise you're out of luck.
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Project mention: Turn Any Website Into An API with AutoScraper and FastAPI | dev.to | 2021-04-24
In this article, we will learn how to create a simple e-commerce search API with multiple platform support: eBay and Amazon. AutoScraper and FastAPI provide the ability to create a powerful JSON API for the data. With Playwright's help, we'll extend our scraper and avoid blocking by using ScrapingAnt's web scraping API.
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)
Project mention: Is there a way to use the 'real' chrome instead of chromedriver? | reddit.com/r/selenium | 2021-10-08
Scrape Facebook public pages without an API key
Project mention: Legality of scraper code on Github | reddit.com/r/webscraping | 2021-10-19
I see a lot of projects like this https://github.com/kevinzg/facebook-scraper, are they just YOLO-ing it, or is it perfectly in line with Github rules to have these types of things open and public?
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Project mention: How to Crawl the Web with Scrapy | news.ycombinator.com | 2021-09-13
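To make the idea of XPath-based extraction concrete, here is a minimal sketch. It deliberately uses the standard library's xml.etree.ElementTree (which supports only a limited XPath subset) as a stand-in, not Parsel's own API. The SAMPLE markup, the example.com URLs, and the extract_projects helper are all hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
# ElementTree needs well-formed XML; real-world HTML often is not,
# which is one reason dedicated libraries exist.
SAMPLE = """
<html>
  <body>
    <ul id="projects">
      <li class="project"><a href="https://scrapy.org/">Scrapy</a></li>
      <li class="project"><a href="https://example.com/parsel">Parsel</a></li>
      <li class="ad"><a href="https://example.com/ad">Sponsored</a></li>
    </ul>
  </body>
</html>
"""


def extract_projects(markup):
    """Return (link text, href) pairs for <li class="project"> entries."""
    root = ET.fromstring(markup.strip())
    # XPath: any <li> whose class attribute equals "project", then its <a> child.
    links = root.findall(".//li[@class='project']/a")
    return [(a.text, a.get("href")) for a in links]


projects = extract_projects(SAMPLE)
for name, href in projects:
    print(name, href)
```

The predicate `[@class='project']` is what filters out the sponsored entry; a selector library built for scraping would typically offer the same kind of query over messier HTML.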
🥫 The simple, fast, and modern web scraping library
Project mention: Ask HN: What are some tools / libraries you built yourself? | news.ycombinator.com | 2021-05-16
I've been working on gazpacho for the last two years.
It's a general purpose web scraping library for Python that replaces BeautifulSoup + requests for most projects.
Just surpassed ~2K downloads every week!
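The comment above describes a library that stands in for the usual fetch-then-parse pair. As a rough sketch of the parsing half only, the example below collects links with the standard library's html.parser. It is not gazpacho's API; LinkCollector, collect_links, and the PAGE string are hypothetical names invented here, and the fetch step is left out so the sketch runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment, invented for this illustration.
PAGE = (
    '<p>See <a href="https://scrapy.org/">Scrapy</a> and '
    '<a href="/docs">the <b>docs</b></a>.</p>'
)
links = collect_links(PAGE)
for text, href in links:
    print(text, href)
```

Even this small task needs some state tracking, which is the kind of boilerplate a higher-level scraping library aims to hide.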
Generate Free Edu Mail(s) within minutes
Project mention: Ilpt Another Way Of Getting An Edu Mail | reddit.com/r/IllegalLifeProTips | 2021-02-17
Had the same "waiting" issue. Found a similar script here https://github.com/AmmeySaini/Edu-Mail-Generator which worked out well. The fake email generating part isn't automated though.
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Project mention: Lookyloo/lookyloo - Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other | reddit.com/r/bag_o_news | 2021-05-03
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Project mention: Build unlimited free-forever static sites without a single line of code | dev.to | 2021-04-04
As of now, the best free way is to use loconotion, which is an open-source Python tool written by Leonardo Cavaletti.
Scrapy Extension for monitoring spiders execution.
Project mention: spidermon: Scrapy Extension for monitoring spiders execution | news.ycombinator.com | 2021-02-16
🤖 Scrape data from HTML websites automatically with Machine Learning
Project mention: mlscraper: Scrape data from HTML pages automatically with Machine Learning | news.ycombinator.com | 2021-07-05
Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)
Project mention: What's something self hosted everyone needs to run ? | reddit.com/r/selfhosted | 2021-09-02
Example end to end data engineering project.
Project mention: Is it me or are beginner-friendly ETL pipeline guides that explain from the ground-up how to incorporate the use of various technologies notoriously difficult to find. | reddit.com/r/dataengineering | 2021-07-23
scan for webcams on the internet
Project mention: JettChenT/scan-for-webcams - scan for webcams on the internet | reddit.com/r/GithubSecurityTools | 2021-08-09
Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨
Project mention: Has anyone been through all five levels/pathways of JapanesePod101.com? | reddit.com/r/LearnJapanese | 2021-01-07
A python scraper to download everything: https://github.com/nedlir/languagepod101-scraper
arxiv_miner is a toolkit for mining research papers on CS ArXiv.
Project mention: ArXiv_miner: A toolkit for mining research papers on CS ArXiv | reddit.com/r/CKsTechNews | 2021-05-29
The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
Project mention: What's a way to scrape all posts of a sub-reddit and make a dataset of it? | reddit.com/r/datasets | 2021-10-03
A headless pure-python browser for the web
Project mention: Sunday Daily Thread: What's everyone working on this week? | reddit.com/r/Python | 2021-05-30
I dusted off an old web-scraping project (activesoup) that I wrote a few years ago and don't really use much myself anymore, but I think it gets a little bit of usage by others, since every few months I see a new star on github, or a small issue or feature request is filed. This week, it was a small feature request (with a PR too - great!).
📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations
Project mention: Investing Advice: Are Stocks better than ETFs in Ireland? | reddit.com/r/irishpersonalfinance | 2021-06-10
Dataset and visualizations of Nintendo Games and ratings, scraped from metacritic.com
Project mention: [OC] Critic to User Score Delta Analysis for Nintendo Games | reddit.com/r/dataisbeautiful | 2021-01-09
Scraping Facebook information
Project mention: #fbspider: Scraping información Facebook | reddit.com/r/u_esgeeks | 2021-02-12
What are some of the best open-source Scraping projects in Python? This list will help you: