Top 23 Python Scraping Projects

Scrapy

180 51,023 9.6 Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16

requests-html

14 13,595 0.0 Python

Pythonic HTML Parsing for Humans™
undetected-chromedriver

40 8,485 6.4 Python

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

Project mention: ad_clicker premium - Google/Bing Ads Clicker | /r/IMadeThis | 2023-12-08

This command-line tool clicks ads for a given query on Google/Bing search using the undetected_chromedriver package. It supports proxies, multiple simultaneous browsers, ad targeting/exclusion, and running in a loop.

autoscraper

9 5,952 0.0 Python

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
fake-useragent

1 3,481 8.9 Python

Up-to-date simple useragent faker with real world database
Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

8 3,060 3.9 Python

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
trafilatura

13 2,898 8.7 Python

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14

The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.
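
One item from that feature list, URL de-duplication, can be sketched with the standard library alone. This is a minimal illustration, not trafilatura's implementation: the helper names `normalize_url` and `deduplicate` are hypothetical, and the normalization rules (lowercasing the host, dropping fragments and default ports, sorting query parameters, stripping a trailing slash) are assumptions that will not suit every site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` to use as a de-duplication key.

    Hypothetical helper; the rules below are illustrative assumptions.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"

    # Treat "/a/" and "/a" as the same page (an assumption).
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")

    # Sort query parameters so their order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    # An empty fragment drops "#..." anchors.
    return urlunsplit((scheme, host, path, query, ""))


def deduplicate(urls):
    """Return `urls` without duplicates, keeping the first spelling seen."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Calling `deduplicate` on a crawl frontier before fetching avoids requesting the same page twice under different spellings. A real crawler would layer the other listed features (politeness delays, sitemap parsing, download queues) on top.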

snoop

7 2,701 9.4 Python

Snoop: an open-source intelligence (OSINT) reconnaissance tool (OSINT world)

Project mention: Osint update of the Snoop Project tool search for user by nickname | news.ycombinator.com | 2024-01-02

Scrapegraph-ai

3 2,388 9.8 Python

Python scraper based on AI

Project mention: This Week In Python | dev.to | 2024-05-10

Scrapegraph-ai – Python scraper based on AI

Grab

0 2,357 3.0 Python

Web Scraping Framework
facebook-scraper

13 2,192 3.7 Python

Scrape Facebook public pages without an API key

Project mention: scraping instagram without selenium | /r/webscraping | 2023-06-30

Afaik on Facebook there are no such APIs, only good old HTML parsing, check out this project for example https://github.com/kevinzg/facebook-scraper (most of the parsing code is here https://github.com/kevinzg/facebook-scraper/blob/master/facebook_scraper/extractors.py )

shot-scraper

16 1,537 7.1 Python

A command-line utility for taking automated screenshots of websites

Project mention: I want to create IMDB for Open source projects | news.ycombinator.com | 2024-04-15

I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files
They're /incredibly/ rare though.

cloudproxy

7 1,358 2.2 Python

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
mlscraper

10 1,231 0.6 Python

🤖 Scrape data from HTML websites automatically by just providing examples
parsel

5 1,085 6.5 Python

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
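
To illustrate what selector-based extraction looks like, here is a minimal sketch using the standard library's `xml.etree.ElementTree` rather than Parsel's own API. ElementTree supports only a subset of XPath and requires well-formed markup, so this is an assumption-laden stand-in for the technique, not a replacement for Parsel. The sample document, its `example.com` URLs, and the helper name `extract_projects` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (real-world HTML is often messier).
DOC = """
<html>
  <body>
    <ul id="projects">
      <li class="project"><a href="https://example.com/scrapy">Scrapy</a></li>
      <li class="project"><a href="https://example.com/parsel">parsel</a></li>
      <li class="ad"><a href="https://example.com/sponsor">Sponsor</a></li>
    </ul>
  </body>
</html>
"""


def extract_projects(markup: str):
    """Return (link text, href) pairs for list items with class "project"."""
    root = ET.fromstring(markup)
    # XPath subset: any <li> whose class attribute equals "project", then its <a> child.
    links = root.findall(".//li[@class='project']/a")
    return [(a.text, a.get("href")) for a in links]


if __name__ == "__main__":
    for name, href in extract_projects(DOC):
        print(name, href)
```

The attribute predicate skips the sponsored entry, which is the same kind of filtering a scraper needs on a listing page like this one.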
DataEngineeringProject

5 985 0.0 Python

Example end to end data engineering project.
Scweet

5 969 0.0 Python

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | /r/learnprogramming | 2023-05-22

Hmm, do you know any good ones? I found this one, but it doesn't scrape a single tweet's likes and followers: https://github.com/Altimis/Scweet

botasaurus

5 934 9.2 Python

The All in One Framework to build Awesome Scrapers.

Project mention: This Week In Python | dev.to | 2024-04-05

botasaurus – The All in One Framework to build Awesome Scrapers

loconotion

3 816 3.5 Python

📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Edu-Mail-Generator

1 788 0.0 Python

Generate Free Edu Mail(s) within minutes
gazpacho

1 731 3.2 Python

🥫 The simple, fast, and modern web scraping library
google-maps-scraper

3 743 7.4 Python

👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖

Project mention: I create a google maps scraper, let me know your thoughts | /r/webscraping | 2023-07-06

My scraper runs at 120 listings per 10 minutes, so yours is quite fast. You can see my scraper at https://github.com/omkarcloud/google-maps-scraper. It is quite popular, with 95 stars.

lookyloo

2 655 9.6 Python

Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). Each project's stat line gives, in order: mentions, stars, activity score, and language.

Python Scraping related posts

Python Scraper Based on AI
1 project | news.ycombinator.com | 9 May 2024

2024-03-01 listening in on the neighborhood
5 projects | news.ycombinator.com | 2 Mar 2024

Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
1 project | news.ycombinator.com | 16 Feb 2024

A command-line utility for taking automated screenshots of websites
1 project | news.ycombinator.com | 15 Dec 2023

Direction Of The Stock Market
1 project | /r/StockMarket | 6 Dec 2023

Web Scraping via JavaScript Runtime Heap Snapshots (2022)
1 project | news.ycombinator.com | 8 Aug 2023

I have a panic selling problem
1 project | /r/stocks | 7 Jul 2023

Index

What are some of the best open-source Scraping projects in Python? This list will help you:

#	Project	Stars
1	Scrapy	51,023
2	requests-html	13,595
3	undetected-chromedriver	8,485
4	autoscraper	5,952
5	fake-useragent	3,481
6	Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE	3,060
7	trafilatura	2,898
8	snoop	2,701
9	Scrapegraph-ai	2,388
10	Grab	2,357
11	facebook-scraper	2,192
12	shot-scraper	1,537
13	cloudproxy	1,358
14	mlscraper	1,231
15	parsel	1,085
16	DataEngineeringProject	985
17	Scweet	969
18	botasaurus	934
19	loconotion	816
20	Edu-Mail-Generator	788
21	gazpacho	731
22	google-maps-scraper	743
23	lookyloo	655
