Python Scrapy

Open-source Python projects categorized as Scrapy

Top 23 Python Scrapy Projects

  • scrapy-redis

    Redis-based components for Scrapy.

    Project mention: Ask HN: What are the best tools for web scraping in 2022? | news.ycombinator.com | 2022-08-10

    With some work, you can use Scrapy for distributed projects that are scraping thousands (millions) of domains. We are using https://github.com/rmax/scrapy-redis.
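    As a sketch of how scrapy-redis distributes a crawl, the scheduler queue and dupefilter are pointed at a shared Redis instance from `settings.py` (component paths per the scrapy-redis README; the Redis URL is a placeholder):

```python
# settings.py -- scrapy-redis wiring (Redis URL is a placeholder)
SCHEDULER = "scrapy_redis.scheduler.Scheduler"              # Redis-backed request queue
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # shared seen-request set
SCHEDULER_PERSIST = True   # keep the queue between runs so crawls can resume
REDIS_URL = "redis://localhost:6379"
```

    With this in place, multiple spider processes on different machines pull from the same queue, which is what makes the thousands-of-domains setup above workable.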

  • Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  • scrapy-splash

    Scrapy+Splash for JavaScript integration

    Project mention: Scrape with Splash Requests returns empty | reddit.com/r/scrapy | 2022-06-17

    I have also modified settings.py according to steps 1-5 from https://github.com/scrapy-plugins/scrapy-splash
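    For reference, those README steps boil down to a `settings.py` block like this (Splash is assumed to be running locally on its default port):

```python
# settings.py -- scrapy-splash setup, following the project README
SPLASH_URL = "http://localhost:8050"  # address of the running Splash instance

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

    A common cause of empty responses even after this setup is that the spider is still yielding plain `scrapy.Request` objects instead of `scrapy_splash.SplashRequest`, so pages are fetched without JavaScript rendering.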

  • scrapydweb

    Web app for Scrapyd cluster management, Scrapy log analysis & visualization, auto packaging, timer tasks, monitoring & alerts, and a mobile UI.

    Project mention: Best scrapydweb fork | reddit.com/r/scrapy | 2022-11-16

    It seems like there are a lot of more recently updated forks: https://github.com/my8100/scrapydweb/network

  • SpiderKeeper

    Admin UI for Scrapy; an open-source alternative to Scrapinghub

  • Webscraping Open Project

    The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

    Project mention: What are your thoughts on scrapy | reddit.com/r/webscraping | 2022-08-30
  • advertools

    advertools - online marketing productivity and analysis tools

    Project mention: Using Python to create large scale SEM campaigns in minutes. | reddit.com/r/Python | 2022-04-13
  • scrapyrt

    HTTP API for Scrapy spiders

    Project mention: New to python and scrapy stuff but need this project to work so that I can do my data research and stuff easily in the future. | reddit.com/r/scrapy | 2022-04-19
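    scrapyrt exposes each spider in a project behind a small HTTP API, so a crawl can be triggered with a single GET request instead of running Scrapy by hand. A minimal sketch of building such a request URL (the spider name and target URL here are hypothetical; scrapyrt listens on port 9080 by default):

```python
from urllib.parse import urlencode

# Hypothetical spider name and start URL
params = {"spider_name": "quotes", "url": "http://quotes.toscrape.com"}
request_url = "http://localhost:9080/crawl.json?" + urlencode(params)
# Fetching this URL (e.g. with requests.get) runs the spider and
# returns the scraped items as JSON in the response body.
```

    This is what makes scrapyrt convenient for "run my research scrape on demand" workflows: any HTTP client can kick off a crawl.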
  • scrapy-rotating-proxies

    use multiple proxies with Scrapy

    Project mention: Scrapy rotating proxies | reddit.com/r/webscraping | 2022-08-01

    Hi, I've been using the scrapy-rotating-proxies (https://github.com/TeamHG-Memex/scrapy-rotating-proxies) library with Scrapy, and I'm getting log entries in my crawl like "[rotating_proxies.expire] DEBUG: Proxy is DEAD". When I check and test the proxies (I'm using Webshare proxies) and the URLs mentioned in the logs individually, they work fine, so I assume it's a problem with the library. Has anyone had the same or a similar issue? (I looked for tickets reported on GitHub but didn't find any referring to this.)
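    For context, the library is enabled through two downloader middlewares plus a proxy list in `settings.py` (per its README; the proxy addresses below are placeholders). The "Proxy is DEAD" messages come from its ban-detection middleware, which can misclassify unusual but valid responses as bans:

```python
# settings.py -- scrapy-rotating-proxies setup (placeholder proxy addresses)
ROTATING_PROXY_LIST = [
    "proxy1.example.com:8000",
    "proxy2.example.com:8031",
]
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```

    When working proxies are flagged as dead, the ban-detection policy is usually the place to look, since it decides what counts as a banned response.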

  • scrapy-fake-useragent

    Random User-Agent middleware based on fake-useragent

    Project mention: Looking for suggestions for a web scraper | reddit.com/r/learnpython | 2022-09-01

    User-Agents: Your user-agent list is pretty small, and you aren't adding the other headers that real browsers typically send. For a bigger list of user agents you could use the scrapy-fake-useragent middleware.
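    A minimal sketch of wiring the middleware in, assuming the setting names from the scrapy-fake-useragent README: Scrapy's built-in user-agent middleware is disabled so the randomizing one takes over:

```python
# settings.py -- swap Scrapy's static User-Agent for a randomized one
DOWNLOADER_MIDDLEWARES = {
    # Disable the default middleware that sets a single fixed User-Agent
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Pick a fresh realistic User-Agent for each request
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
}
```

    Note this only randomizes the User-Agent header; matching the rest of a real browser's header set, as the comment above suggests, still has to be done separately.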

  • scrapy-playwright

    🎭 Playwright integration for Scrapy

    Project mention: Scrapy & splash guide | reddit.com/r/learnpython | 2023-02-18
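    As a sketch of the basic setup from the scrapy-playwright README: its download handlers and the asyncio reactor are registered in `settings.py`, and then individual requests opt into browser rendering via request meta:

```python
# settings.py -- route downloads through Playwright (per the project README)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# In a spider, a request opts in per-request:
#   yield scrapy.Request(url, meta={"playwright": True})
```

    Because only flagged requests go through the browser, static pages can still be fetched cheaply while JavaScript-heavy ones are rendered.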
  • alltheplaces

    A set of spiders and scrapers to extract location information from places that post their location on the internet.

    Project mention: An aggressive, stealthy web spider operating from Microsoft IP space | news.ycombinator.com | 2023-01-18

    I recently made some contributions to https://github.com/alltheplaces/alltheplaces which is a set of Scrapy spiders for extracting locations of primarily franchise business stores, for use with projects such as OpenStreetMap. After these contributions I am now in shock at how hostile a lot of businesses are with allowing people the privilege of accessing their websites to find store locations, store information or to purchase an item.

    Some examples of what I found you can't do at perhaps 20% of business websites:

    * Use the website near a country border where people often commute freely across borders on a daily or regular basis (example: Schengen region).

    * Plan a holiday figuring out when a shop closes for the day in another country.

    * Order something from an Internet connection in another country to arrive when you return home.

    * Look up product information whilst on a work trip overseas.

    * Access the website via a workplace proxy server that is mandated to be used and happens to be hosted in a data centre (the likes of AWS and Azure included), which is blocked simply because it's not a residential ISP. A more complex variant has the website check that the time a request from client-side JavaScript takes to reach the server matches what the server expects it should take (measured by an ICMP ping or similar to the origin IP address).

    * Use the website simply because the geo IP database the company uses hasn't been updated to reflect block reassignments by the regional registry, and thinks that your French residential IP address is a defunct Latvian business IP address.

    * Find the business on a price comparison website.

    * Find the business on a search engine that isn't Google.

    * Access the website without first allowing obfuscated Javascript to execute (example: [1]).

    * Use the website if you had certain disabilities.

    * Access the website with IPv6 or using 464XLAT (shared origin IPv4 address with potentially a large pool of other users).

    The answer to me appears obvious: Store and product information is for the most part static and can be stored in static HTML and JSON files that are rewritten on a regular cycle, or at least cached until their next update. JSON files can be advertised freely to the public so if anyone was trying to obtain information off the website, they could do so with a single request for a small file causing as minimal impact as possible.

    Of course, none of the anti-bot measures implemented by websites actually stop the bots they most want to block. There are services specifically designed for scraping websites that have hundreds of thousands of residential IP addresses in their pools around the world (fixed and mobile ISPs), and it's trivial with software such as selenium-stealth to make each request look legitimate from a JavaScript perspective, as if it came from a Chrome browser running on Windows 10 with a single 2160p screen, and so on. If you force bots down the path of working around anti-bot measures, it's a never-ending battle the website will ultimately lose, because the website will end up blocking legitimate customers and suffering extremely poor performance, worsened by forcing bots to make 10x the number of requests just to look legitimate or pass client tests.

    [1] https://www.imperva.com/products/web-application-firewall-wa...

  • estela

    estela, an elastic web scraping cluster 🕸

    Project mention: How to run webs scraping script every 15 minutes | reddit.com/r/webscraping | 2023-02-13

    You may want to check out [estela](https://estela.bitmaker.la/docs/), which is a spider management solution, developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders.

  • scrapy-cloudflare-middleware

    A Scrapy middleware to bypass Cloudflare's anti-bot protection

  • GoodreadsScraper

    Scrape data from Goodreads using Scrapy and Selenium :books:

    Project mention: [OC] Top 10 Fantasy Books of the 21st Century According to GoodReads | reddit.com/r/dataisbeautiful | 2022-05-20

    Used this code to get the data: https://github.com/havanagrawal/GoodreadsScraper

  • scrapy-crawl-once

    Scrapy middleware which allows crawling only new content
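    A sketch of its setup, following the scrapy-crawl-once README: the middleware is registered on both the spider and downloader sides, and requests opt in with a meta flag; fingerprints of completed requests are stored locally so later runs skip them:

```python
# settings.py -- scrapy-crawl-once registration (per the project README)
SPIDER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 100,
}
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 50,
}

# In a spider, mark requests that should only ever be crawled once:
#   yield scrapy.Request(url, meta={"crawl_once": True})
```

    This makes incremental crawls cheap: listing pages can be revisited every run while already-seen detail pages are skipped.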

  • open-gov-crawlers

    Parse government documents into well-formed JSON

    Project mention: What are the best repos that are a display of clean code and good programming practices that I can learn from? | reddit.com/r/learnprogramming | 2022-09-06

    I get feedback occasionally that this is the cleanest web scraping code someone’s seen: https://github.com/public-law/open-gov-crawlers

  • scrapy-mysql-pipeline

    Scrapy MySQL pipeline

    Project mention: Scraping data from one page into two separate database tables | reddit.com/r/scrapy | 2022-12-14

    Ah, sorry. I'm using this MySQL pipeline https://github.com/IaroslavR/scrapy-mysql-pipeline/ with item loaders
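    For reference, a Scrapy pipeline, this one included, is switched on through the standard `ITEM_PIPELINES` setting (the pipeline path below follows the scrapy-mysql-pipeline README; connection details live in separate project settings and are omitted here):

```python
# settings.py -- enable the MySQL pipeline for all yielded items
ITEM_PIPELINES = {
    "scrapy_mysql_pipeline.MySQLPipeline": 300,
}
```

    For the two-tables question above, the usual pattern is to yield two distinct item classes from the same parse callback and have the pipeline route each class to its own table.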

  • jarchive-clues

    Web crawler to collect Jeopardy! clues from https://j-archive.com

  • scrapingant-client-python

    ScrapingAnt API client for Python.

  • scrapeops-scrapy-sdk

    Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.

    Project mention: Free Python Scrapy 5-Part Mini Course | reddit.com/r/scrapy | 2022-10-20

    Part 5: Deployment, Scheduling & Running Jobs - Deploying our spider on a server, and monitoring and scheduling jobs via ScrapeOps. Article

  • burplist

    Web crawler for Burplist, a search engine for craft beers in Singapore

    Project mention: Say Goodbye to Heroku Free Tier: Here Are 4 Alternatives | dev.to | 2023-02-12

    Heroku Dynos apps: Burplist was migrated to Koyeb. Having tried other notable Heroku alternatives like Fly.io, Northflank, and Railway, I can safely say that the migration from Heroku to Koyeb required the least effort and had the fewest kinks. It just works in my case.

  • scrapy-proxycrawl-middleware

    Scrapy middleware interface to scrape using ProxyCrawl proxy service

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-18.

Index

What are some of the best open-source Scrapy projects in Python? This list will help you:

Project Stars
1 scrapy-redis 5,248
2 Gerapy 2,918
3 scrapy-splash 2,863
4 scrapydweb 2,679
5 SpiderKeeper 2,632
6 Webscraping Open Project 1,199
7 advertools 784
8 scrapyrt 758
9 scrapy-rotating-proxies 638
10 scrapy-fake-useragent 628
11 scrapy-playwright 453
12 alltheplaces 381
13 estela 110
14 scrapy-cloudflare-middleware 90
15 GoodreadsScraper 81
16 scrapy-crawl-once 72
17 open-gov-crawlers 50
18 scrapy-mysql-pipeline 47
19 jarchive-clues 29
20 scrapingant-client-python 18
21 scrapeops-scrapy-sdk 18
22 burplist 11
23 scrapy-proxycrawl-middleware 10