Top 23 Python Scraping Projects
-
undetected-chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / Datadome / CloudFlare IUAM)
-
Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
-
cloudproxy
Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
-
Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
-
google-maps-scraper
👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖
-
lookyloo
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16
This command-line tool clicks ads for a certain query on Google/Bing search using undetected_chromedriver package. Supports proxy, running multiple simultaneous browsers, ad targeting/exclusion, and running in loop.
Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
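To give a sense of the "extra code" the comment refers to, here is a minimal sketch of just one item from that list, URL de-duplication, written with the Python standard library only. It is an illustration, not trafilatura's implementation: the helper names `normalize_url` and `deduplicate` and the `TRACKING_PARAMS` set are assumptions made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to ignore (an assumption for this sketch).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_url(url: str) -> str:
    """Return a canonical form so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Treat "/post" and "/post/" as the same page; an empty path becomes "/".
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so their order does not matter.
    query = urlencode(sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ))
    # The fragment is discarded: it is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))


def deduplicate(urls):
    """Yield each URL whose normalized form has not been seen yet, in input order."""
    seen = set()
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            yield url


if __name__ == "__main__":
    links = [
        "https://Example.com/post/",
        "https://example.com/post?utm_source=x",
        "https://example.com/post#comments",
        "https://example.com/other",
    ]
    for link in deduplicate(links):
        print(link)
```

Even this small piece needs decisions about trailing slashes, query ordering, and tracking parameters, which is the commenter's point about the amount of code a full crawler stack on top of BeautifulSoup would take.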
Project mention: Osint update of the Snoop Project tool search for user by nickname | news.ycombinator.com | 2024-01-02
Scrapegraph-ai – Python scraper based on AI
Afaik on Facebook there are no such APIs, only good old HTML parsing, check out this project for example https://github.com/kevinzg/facebook-scraper (most of the parsing code is here https://github.com/kevinzg/facebook-scraper/blob/master/facebook_scraper/extractors.py )
I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files
They're /incredibly/ rare though.
Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | /r/learnprogramming | 2023-05-22
Hmm, do you know of a good one? I found this one, but it doesn't scrape a single tweet's likes and followers: https://github.com/Altimis/Scweet
botasaurus – The All in One Framework to build Awesome Scrapers
Project mention: I create a google maps scraper, let me know your thoughts | /r/webscraping | 2023-07-06
My scraper runs at 120 listings per 10 minutes, so yours is quite fast. You can see my scraper at https://github.com/omkarcloud/google-maps-scraper. It is quite popular, with 95 stars.
Python Scraping related posts
-
Python Scraper Based on AI
-
2024-03-01 listening in on the neighborhood
-
Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
-
A command-line utility for taking automated screenshots of websites
-
Direction Of The Stock Market
-
Web Scraping via JavaScript Runtime Heap Snapshots (2022)
-
I have a panic selling problem
Index
What are some of the best open-source Scraping projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Scrapy | 51,023 |
2 | requests-html | 13,595 |
3 | undetected-chromedriver | 8,485 |
4 | autoscraper | 5,952 |
5 | fake-useragent | 3,481 |
6 | Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE | 3,060 |
7 | trafilatura | 2,898 |
8 | snoop | 2,701 |
9 | Scrapegraph-ai | 2,388 |
10 | Grab | 2,357 |
11 | facebook-scraper | 2,192 |
12 | shot-scraper | 1,537 |
13 | cloudproxy | 1,358 |
14 | mlscraper | 1,231 |
15 | parsel | 1,085 |
16 | DataEngineeringProject | 985 |
17 | Scweet | 969 |
18 | botasaurus | 934 |
19 | loconotion | 816 |
20 | Edu-Mail-Generator | 788 |
21 | gazpacho | 731 |
22 | google-maps-scraper | 743 |
23 | lookyloo | 655 |