Top 23 Scrapy Open-Source Projects

crawlab

4 10,803 6.0 Go

Distributed web crawler admin platform for managing spiders, regardless of language or framework.
scrapy-redis

4 5,454 5.0 Python

Redis-based components for Scrapy.

Project mention: How to make scrapy run multiple times on the same URLs? | /r/scrapy | 2023-06-26

Gerapy

1 3,215 6.8 Python

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
scrapy-splash

3 3,051 0.0 Python

Scrapy+Splash for JavaScript integration
scrapydweb

6 3,004 3.6 Python

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.
SpiderKeeper

1 2,705 0.0 Python

Admin UI for Scrapy / open-source Scrapinghub
webscraping-from-0-to-hero

1 1,457 5.8

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

Project mention: Web Scraping from 0 to hero – Sharing knowledge about web scraping on GH | news.ycombinator.com | 2023-07-06

advertools

1 1,058 9.2 Python

advertools - online marketing productivity and analysis tools
fakebrowser

1 1,048 0.0 JavaScript

🤖 Fake fingerprints to bypass anti-bot systems. Simulates mouse and keyboard operations to behave like a real person.
kimuraframework

5 999 0.0 Ruby

Kimurai is a modern web scraping framework written in Ruby which works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and lets you scrape and interact with JavaScript-rendered websites
scrapy-playwright

11 837 7.8 Python

🎭 Playwright integration for Scrapy

Project mention: Web Scraping Dynamic Websites With Scrapy Playwright | dev.to | 2024-03-06

scrapy-playwright is an integration between Scrapy and Playwright. It enables scraping dynamic web pages with Scrapy by processing the web scraping requests using a Playwright instance.
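
As a rough illustration of the description above, the following sketch shows the kind of configuration scrapy-playwright's README describes: routing downloads through its handler, switching Twisted to the asyncio reactor, and opting individual requests in via the `playwright` meta key. The names `SETTINGS`, `PLAYWRIGHT_HANDLER`, and `playwright_meta` are hypothetical helpers introduced here for illustration; they are not part of either library.

```python
# Minimal sketch, assuming a standard Scrapy project layout.
# PLAYWRIGHT_HANDLER, SETTINGS and playwright_meta are illustrative names only.

# Dotted path of the download handler that scrapy-playwright provides.
PLAYWRIGHT_HANDLER = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"

# Values that would normally live in the project's settings.py.
SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": PLAYWRIGHT_HANDLER,
        "https": PLAYWRIGHT_HANDLER,
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
}


def playwright_meta(extra=None):
    """Build a Request.meta dict that asks for the page to be fetched via Playwright."""
    meta = {"playwright": True}
    if extra:
        meta.update(extra)
    return meta
```

In a spider, a request would then be created along the lines of `scrapy.Request(url, meta=playwright_meta())`; requests without the `playwright` key continue to use Scrapy's regular downloader.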

scrapyrt

3 814 6.8 Python

HTTP API for Scrapy spiders
Data-Engineering-Projects

2 722 10.0 Jupyter Notebook

Personal Data Engineering Projects

Project mention: Pitanje za data engineering? ("A question about data engineering?") | /r/programiranje | 2023-06-30

scrapy-rotating-proxies

4 705 0.0 Python

Use multiple proxies with Scrapy
scrapy-fake-useragent

3 681 2.3 Python

Random User-Agent middleware based on fake-useragent
domains

4 640 5.3 HTML

World’s single largest Internet domains dataset

Project mention: There are only 2 .yahoo Internet domains | news.ycombinator.com | 2023-06-13

alltheplaces

6 559 10.0 Python

A set of spiders and scrapers to extract location information from places that post their location on the internet.

Project mention: Differentiating between hypermarkets and supermarkets. | /r/openstreetmap | 2023-12-09

Maybe a different approach? https://www.alltheplaces.xyz/ has stores grouped by name

PHP Scraper

1 497 5.3 PHP

A universal web-util for PHP.
Netflix-Clone

1 263 10.0 JavaScript

Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture (by yuchiu)
tanakai

3 260 6.1 Ruby

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

Project mention: Tanakai: Modern web scraping framework written in Ruby | news.ycombinator.com | 2023-10-25

awesome-web-scraper

0 237 4.3

A collection of awesome web scrapers and crawlers.
estela

10 154 8.1 Python

estela, an elastic web scraping cluster 🕸
GoodreadsScraper

1 115 0.0 Python

Scrape data from Goodreads using Scrapy and Selenium

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Scrapy related posts

Web Scraping Dynamic Websites With Scrapy Playwright

1 project | dev.to | 6 Mar 2024
Differentiating between hypermarkets and supermarkets.

1 project | /r/openstreetmap | 9 Dec 2023
Tanakai: Modern web scraping framework written in Ruby

1 project | news.ycombinator.com | 25 Oct 2023
Meta, Microsoft and Amazon team up on maps project

1 project | news.ycombinator.com | 26 Jul 2023
Distribution of gross and net salaries on r/BESalary [OC]

1 project | /r/BESalary | 1 Jul 2023
How to make scrapy run multiple times on the same URLs?

2 projects | /r/scrapy | 26 Jun 2023
There are only 2 .yahoo Internet domains

1 project | news.ycombinator.com | 13 Jun 2023

Index

What are some of the best open-source Scrapy projects? This list will help you:

	Project	Stars
1	crawlab	10,803
2	scrapy-redis	5,454
3	Gerapy	3,215
4	scrapy-splash	3,051
5	scrapydweb	3,004
6	SpiderKeeper	2,705
7	webscraping-from-0-to-hero	1,457
8	advertools	1,058
9	fakebrowser	1,048
10	kimuraframework	999
11	scrapy-playwright	837
12	scrapyrt	814
13	Data-Engineering-Projects	722
14	scrapy-rotating-proxies	705
15	scrapy-fake-useragent	681
16	domains	640
17	alltheplaces	559
18	PHP Scraper	497
19	Netflix-Clone	263
20	tanakai	260
21	awesome-web-scraper	237
22	estela	154
23	GoodreadsScraper	115
