Top 22 Python Web Crawling Projects

Scrapy

Mentions: 180 | Stars: 50,763 | Activity: 9.7 | Language: Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16
pyspider

Mentions: 0 | Stars: 16,310 | Activity: 0.0 | Language: Python

A Powerful Spider (Web Crawler) System in Python.

requests-html

Mentions: 14 | Stars: 13,574 | Activity: 0.0 | Language: Python

Pythonic HTML Parsing for Humans™

portia

Mentions: 0 | Stars: 9,159 | Activity: 0.0 | Language: Python

Visual scraping for Scrapy

MechanicalSoup

Mentions: 4 | Stars: 4,545 | Activity: 5.9 | Language: Python

A Python library for automating interaction with websites.

Project mention: How to scrape a website with Python (Beginner tutorial) | dev.to | 2024-02-22

MechanicalSoup is a Python library for web scraping that combines the simplicity of Requests with the convenience of BeautifulSoup. It's particularly useful for interacting with web forms, such as login pages.
RoboBrowser

Mentions: 0 | Stars: 3,689 | Activity: 0.0 | Language: Python

Grab

Mentions: 0 | Stars: 2,353 | Activity: 3.0 | Language: Python

Web Scraping Framework

gain

Mentions: 0 | Stars: 2,031 | Activity: 0.0 | Language: Python

Web crawling framework based on asyncio.
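To illustrate what "based on asyncio" buys a crawler, here is a minimal sketch (not gain's API): a fixed pool of worker coroutines pulls URLs from a shared `asyncio.Queue`, so slow fetches overlap instead of running one after another. The in-memory `FAKE_SITE` link graph and the `fetch_links` stand-in are hypothetical; a real crawler would perform an async HTTP request there and extract links from the returned HTML.

```python
import asyncio

# Hypothetical in-memory "site": each page URL maps to the links found on it.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


async def fetch_links(url: str) -> list[str]:
    """Stand-in for an async HTTP GET followed by link extraction."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_SITE.get(url, [])


async def crawl(start: str, max_concurrency: int = 2) -> set[str]:
    """Crawl from `start` via a FIFO queue, with at most `max_concurrency` fetches in flight."""
    seen = {start}
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait(start)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    # No await between the check and the add, so no other
                    # coroutine can enqueue the same link in between.
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()  # workers are now idle, blocked on queue.get()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("/"))))
```

The `max_concurrency` bound is the main design knob: it caps how many requests hit a site at once, which matters for politeness as much as for speed.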
feedparser

Mentions: 6 | Stars: 1,823 | Activity: 7.7 | Language: Python

Parse feeds in Python

Project mention: RSS can be used to distribute all sorts of information | news.ycombinator.com | 2023-11-20

There is JSON Feed¹ already. One of the spec writers is behind micro.blog, which is the first place I saw it (and also one of the few places I've seen it). I don't think it is a bad idea, and it doesn't take all that long to implement it.
I have long hoped it would pick up with the JSON-ify everything crowd, just so I'd never see a non-Atom feed again. We perhaps wouldn't need sooo much of the magic that is wrapped up in packages like feedparser² to deal with all the brokenness of RSS in the wild then.
¹ https://www.jsonfeed.org/
² https://github.com/kurtmckee/feedparser
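Because JSON Feed is plain JSON, the standard library's `json` module is enough to read one. A minimal sketch, assuming a JSON Feed 1.1 document: the field names `version`, `title`, `items`, `id`, `url` and `date_published` come from that spec, while the sample feed and the `parse_json_feed` helper are hypothetical. Coping with malformed RSS and Atom is the part feedparser takes on.

```python
import json

# Hypothetical sample feed using JSON Feed 1.1 field names.
FEED_TEXT = """
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Example Blog",
  "home_page_url": "https://example.org/",
  "items": [
    {"id": "2", "url": "https://example.org/second", "title": "Second post",
     "date_published": "2023-11-20T10:00:00Z"},
    {"id": "1", "url": "https://example.org/first", "title": "First post"}
  ]
}
"""


def parse_json_feed(text: str) -> tuple[str, list[dict]]:
    """Return (feed title, list of simplified entries) from a JSON Feed document."""
    feed = json.loads(text)
    # Every JSON Feed declares its version as a jsonfeed.org URL.
    if not str(feed.get("version", "")).startswith("https://jsonfeed.org/version/"):
        raise ValueError("not a JSON Feed document")
    entries = [
        {
            "id": item["id"],  # required for each item by the spec
            "title": item.get("title", ""),
            "url": item.get("url"),
            "published": item.get("date_published"),  # optional RFC 3339 timestamp
        }
        for item in feed.get("items", [])
    ]
    return feed.get("title", ""), entries


if __name__ == "__main__":
    title, entries = parse_json_feed(FEED_TEXT)
    print(title)
    for entry in entries:
        print(entry["id"], entry["title"], entry["published"])
```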
PSpider

Mentions: 0 | Stars: 1,811 | Activity: 0.0 | Language: Python

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

cola

Mentions: 0 | Stars: 1,485 | Activity: 0.0 | Language: Python

A high-level distributed crawling framework.

Sukhoi

Mentions: 0 | Stars: 878 | Activity: 0.0 | Language: Python

Minimalist and powerful Web Crawler.

google-search-results-python

Mentions: 4 | Stars: 514 | Activity: 4.5 | Language: Python

Google Search results via SerpApi, packaged as a pip-installable Python library.

Project mention: Make Direct Async Requests to SerpApi with Python | dev.to | 2023-05-24

In this blog post, we'll cover how to make direct requests to serpapi.com/search.json without using SerpApi's google-search-results Python client.
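A minimal sketch of the direct-request approach using only the standard library. It assumes SerpApi's `engine`, `q` and `api_key` query parameters; the helper names are hypothetical and `YOUR_API_KEY` is a placeholder. Since `urllib` is blocking, each request runs in a worker thread via `asyncio.to_thread` (Python 3.9+), so several searches can be in flight at once.

```python
import asyncio
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://serpapi.com/search.json"


def build_search_url(query: str, api_key: str, engine: str = "google") -> str:
    """Build a direct SerpApi request URL without the client library."""
    params = {"engine": engine, "q": query, "api_key": api_key}
    return f"{SEARCH_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Blocking GET that decodes the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


async def search_many(queries: list[str], api_key: str) -> list[dict]:
    """Run several searches concurrently; each blocking fetch gets its own thread."""
    tasks = [
        asyncio.to_thread(fetch_json, build_search_url(query, api_key))
        for query in queries
    ]
    return await asyncio.gather(*tasks)
```

With a real key, `asyncio.run(search_many(["coffee", "tea"], "YOUR_API_KEY"))` would return one parsed JSON dict per query; for Google searches the organic results are typically under the `organic_results` key.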
MSpider

Mentions: 0 | Stars: 345 | Activity: 0.0 | Language: Python

Spider

spidy Web Crawler

Mentions: 0 | Stars: 322 | Activity: 0.0 | Language: Python

The simple, easy-to-use command-line web crawler.

Crawley

Mentions: 0 | Stars: 182 | Activity: 0.0 | Language: Python

Pythonic crawling/scraping framework based on non-blocking I/O operations.

brownant

Mentions: 0 | Stars: 157 | Activity: 0.0 | Language: Python

Brownant is a web data extracting framework.

Demiurge

Mentions: 0 | Stars: 110 | Activity: 0.0 | Language: Python

PyQuery-based scraping micro-framework.

Pomp

Mentions: 0 | Stars: 60 | Activity: 0.0 | Language: Python

Screen scraping and web crawling framework

FastImage

Mentions: 0 | Stars: 28 | Activity: 0.0 | Language: Python

Python library that finds the size / type of an image given its URI by fetching as little as needed (by bmuller)
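The "fetch as little as needed" idea can be sketched for PNG files with the standard library. This is not FastImage's API, and the helper names are hypothetical. A PNG starts with an 8-byte signature followed by the IHDR chunk, whose first two fields are the image width and height as big-endian 32-bit integers, so the first 24 bytes are enough. The fetch helper requests only those bytes with an HTTP `Range` header; a server may ignore it, but reading 24 bytes and closing the connection still stops the transfer early.

```python
import struct
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG header")
    # Bytes 8-11: IHDR chunk length; bytes 12-15: chunk type, which must be IHDR.
    if header[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found")
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def fetch_header(url: str, n: int = 24) -> bytes:
    """Request only the first n bytes of a resource (servers may ignore Range)."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=0-{n - 1}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read(n)  # read at most n bytes, then close the connection


if __name__ == "__main__":
    # Synthetic 24-byte PNG header for a 640x480 image (no network needed).
    fake = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
    print(png_size(fake))
```

With network access, `png_size(fetch_header(url))` would report a remote PNG's dimensions; other formats such as GIF or JPEG need their own header parsers.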
microwler

Mentions: 4 | Stars: 13 | Activity: 1.7 | Language: Python

A micro-framework for asynchronous deep crawls and web scraping with Python

Mariner

Mentions: 0 | Stars: 2 | Activity: 0.0 | Language: Python

This is a mirror of a GitLab repository. Open your issues and pull requests there. (by radek-sprta)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates how many times the repo was mentioned in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-22.

Python Web Crawling related posts

How to scrape a website with Python (Beginner tutorial)
1 project | dev.to | 22 Feb 2024
Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
1 project | news.ycombinator.com | 16 Feb 2024
Seven Python Projects to Elevate Your Coding Skills
3 projects | dev.to | 15 Feb 2024
What is SERP? Meaning, Use Cases and Approaches
3 projects | dev.to | 11 Dec 2023
Help! trying to use scraping for my dissertation but I am clueless
1 project | /r/webscraping | 6 Jul 2023
Turning webpages into pdf
2 projects | /r/learnpython | 6 Jul 2023
Implementing case sensitive headers in Scrapy (not through `_caseMappings`)
4 projects | /r/scrapy | 3 Jul 2023

Index

What are some of the best open-source Web Crawling projects in Python? This list will help you:

Rank	Project	Stars
1	Scrapy	50,763
2	pyspider	16,310
3	requests-html	13,574
4	portia	9,159
5	MechanicalSoup	4,545
6	RoboBrowser	3,689
7	Grab	2,353
8	gain	2,031
9	feedparser	1,823
10	PSpider	1,811
11	cola	1,485
12	Sukhoi	878
13	google-search-results-python	514
14	MSpider	345
15	spidy Web Crawler	322
16	Crawley	182
17	brownant	157
18	Demiurge	110
19	Pomp	60
20	FastImage	28
21	microwler	13
22	Mariner	2