Scrapy

Open-source projects categorized as Scrapy

Top 23 Scrapy Open-Source Projects

  • crawlab

    Distributed web crawler admin platform for managing spiders, regardless of language or framework.

  • scrapy-redis

    Redis-based components for Scrapy.

  • Project mention: How to make scrapy run multiple times on the same URLs? | /r/scrapy | 2023-06-26
  • Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  • scrapy-splash

    Scrapy+Splash for JavaScript integration

  • scrapydweb

    Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.

  • SpiderKeeper

    Admin UI for Scrapy; an open-source Scrapinghub.

  • webscraping-from-0-to-hero

    The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

  • Project mention: Web Scraping from 0 to hero – Sharing knowledge about web scraping on GH | news.ycombinator.com | 2023-07-06
  • advertools

    advertools - online marketing productivity and analysis tools

  • fakebrowser

    🤖 Fake fingerprints to bypass anti-bot systems. Simulates mouse and keyboard operations to behave like a real person.

  • kimuraframework

    Kimurai is a modern web scraping framework written in Ruby. It works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and lets you scrape and interact with JavaScript-rendered websites.

  • scrapy-playwright

    🎭 Playwright integration for Scrapy

  • Project mention: Web Scraping Dynamic Websites With Scrapy Playwright | dev.to | 2024-03-06

    scrapy-playwright is an integration between Scrapy and Playwright. It enables scraping dynamic web pages with Scrapy by processing the web scraping requests using a Playwright instance.

  • scrapyrt

    HTTP API for Scrapy spiders

  • Data-Engineering-Projects

    Personal Data Engineering Projects

  • Project mention: Pitanje za data engineering? (A question about data engineering?) | /r/programiranje | 2023-06-30
  • scrapy-rotating-proxies

    Use multiple proxies with Scrapy

  • scrapy-fake-useragent

    Random User-Agent middleware based on fake-useragent

  • domains

    World’s single largest Internet domains dataset

  • Project mention: There are only 2 .yahoo Internet domains | news.ycombinator.com | 2023-06-13
  • alltheplaces

    A set of spiders and scrapers to extract location information from places that post their location on the internet.

  • Project mention: Differentiating between hypermarkets and supermarkets. | /r/openstreetmap | 2023-12-09

    Maybe a different approach? https://www.alltheplaces.xyz/ has stores grouped by name

  • PHP Scraper

    A universal web-util for PHP.

  • Netflix-Clone

    Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture (by yuchiu)

  • tanakai

    Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

  • Project mention: Tanakai: Modern web scraping framework written in Ruby | news.ycombinator.com | 2023-10-25
  • awesome-web-scraper

    A collection of awesome web scrapers and crawlers.

  • estela

    estela, an elastic web scraping cluster 🕸

  • GoodreadsScraper

    Scrape data from Goodreads using Scrapy and Selenium

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Scrapy related posts

  • Web Scraping Dynamic Websites With Scrapy Playwright

    1 project | dev.to | 6 Mar 2024
  • Differentiating between hypermarkets and supermarkets.

    1 project | /r/openstreetmap | 9 Dec 2023
  • Tanakai: Modern web scraping framework written in Ruby

    1 project | news.ycombinator.com | 25 Oct 2023
  • Meta, Microsoft and Amazon team up on maps project

    1 project | news.ycombinator.com | 26 Jul 2023
  • Distribution of gross and net salaries on r/BESalary [OC]

    1 project | /r/BESalary | 1 Jul 2023
  • How to make scrapy run multiple times on the same URLs?

    2 projects | /r/scrapy | 26 Jun 2023
  • There are only 2 .yahoo Internet domains

    1 project | news.ycombinator.com | 13 Jun 2023

Index

What are some of the best open-source Scrapy projects? This list will help you:

# Project Stars
1 crawlab 10,803
2 scrapy-redis 5,454
3 Gerapy 3,215
4 scrapy-splash 3,051
5 scrapydweb 3,004
6 SpiderKeeper 2,705
7 webscraping-from-0-to-hero 1,457
8 advertools 1,058
9 fakebrowser 1,048
10 kimuraframework 999
11 scrapy-playwright 837
12 scrapyrt 814
13 Data-Engineering-Projects 722
14 scrapy-rotating-proxies 705
15 scrapy-fake-useragent 681
16 domains 640
17 alltheplaces 559
18 PHP Scraper 497
19 Netflix-Clone 263
20 tanakai 260
21 awesome-web-scraper 237
22 estela 154
23 GoodreadsScraper 115
