Python Scrapy

Open-source Python projects categorized as Scrapy

Top 23 Python Scrapy Projects

  • scrapy-redis

    Redis-based components for Scrapy.

  • Project mention: How to make scrapy run multiple times on the same URLs? | /r/scrapy | 2023-06-26
  • Gerapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

  • scrapy-splash

    Scrapy+Splash for JavaScript integration

  • scrapydweb

    Web app for Scrapyd cluster management, Scrapy log analysis and visualization, auto packaging, timer tasks, monitoring and alerts, and a mobile UI.

  • SpiderKeeper

    Admin UI for Scrapy; an open-source alternative to Scrapinghub

  • advertools

    advertools - online marketing productivity and analysis tools

  • scrapy-playwright

    🎭 Playwright integration for Scrapy

  • Project mention: Scrapy Vs. Crawlee | dev.to | 2024-05-15

    Scrapy does not support headless browsers natively, but it supports them through its plugin system. Similarly, it does not natively support scraping JavaScript-rendered websites, but the plugin system makes this possible. One of the best examples is its Playwright plugin.

  • scrapyrt

    HTTP API for Scrapy spiders

  • scrapy-rotating-proxies

    Use multiple proxies with Scrapy

  • scrapy-fake-useragent

    Random User-Agent middleware based on fake-useragent

  • alltheplaces

    A set of spiders and scrapers to extract location information from places that post their location on the internet.

  • Project mention: Differentiating between hypermarkets and supermarkets. | /r/openstreetmap | 2023-12-09

    Maybe a different approach? https://www.alltheplaces.xyz/ has stores grouped by name

  • estela

    estela, an elastic web scraping cluster 🕸

  • GoodreadsScraper

    Scrape data from Goodreads using Scrapy and Selenium

  • scrapy-cloudflare-middleware

    A Scrapy middleware to bypass Cloudflare's anti-bot protection

  • scrapy-crawl-once

    Scrapy middleware that allows crawling only new content

  • open-gov-crawlers

    Parse government documents into well formed JSON

  • scrapy-mysql-pipeline

    Scrapy MySQL pipeline

  • scrapeops-scrapy-sdk

    Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the box.

  • Project mention: Distribution of gross and net salaries on r/BESalary [OC] | /r/BESalary | 2023-07-01

    My favourite scraping tool is Scrapy. It requires some Python knowledge, but there are some very good tutorials about it on https://scrapeops.io

  • scrapingant-client-python

    ScrapingAnt API client for Python.

  • burplist

    Web crawler for Burplist, a search engine for craft beers in Singapore

  • hltv-scraping

    Scraping data from hltv.org

  • nse-stock-scraper

    A web scraper that uses the Scrapy framework, MongoDB and AfricasTalking to get stock prices for companies listed on the Nairobi Stock Exchange. It stores the ticker name and price and, once properly set up with AfricasTalking, sends notifications via SMS.

  • NSFW_Scraper

    Scraper that collects metadata for all available scenes and movies and stores it in PostgreSQL every few days.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
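What "Redis-based components" means for scrapy-redis can be sketched as a settings.py fragment. This is a minimal sketch assuming the setting names documented in the scrapy-redis README (SCHEDULER, DUPEFILTER_CLASS, SCHEDULER_PERSIST, REDIS_URL and the RedisPipeline item pipeline); the Redis address is a placeholder assumption, not a recommended deployment.

```python
# settings.py: minimal scrapy-redis sketch (setting names as documented in the
# scrapy-redis README; the Redis URL is a placeholder assumption).

# Keep the request queue in Redis so several Scrapy processes can share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Store request fingerprints in Redis instead of in each process's memory.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints after a crawl ends, so it can be resumed.
SCHEDULER_PERSIST = True

# Assumed local Redis instance; every worker must point at the same server.
REDIS_URL = "redis://localhost:6379"

# Optionally push scraped items into Redis for downstream processing.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}
```

Every Scrapy process started with these settings shares one Redis-backed request queue and one set of request fingerprints. With SCHEDULER_PERSIST enabled, fingerprints survive between runs, so deliberately re-crawling the same URLs generally means clearing them in Redis or passing dont_filter=True to scrapy.Request, which bypasses the duplicate filter.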
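scrapy-splash handles JavaScript by sending requests through a separate Splash rendering service. A minimal settings.py sketch, assuming a Splash instance reachable at localhost:8050 (Splash's usual port) and using the middleware names and orders given in the scrapy-splash README:

```python
# settings.py: minimal scrapy-splash sketch (names and orders as given in the
# scrapy-splash README; the Splash address is an assumption).

# Where the Splash rendering service is reachable.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    # Cookie handling for requests rendered through Splash.
    "scrapy_splash.SplashCookiesMiddleware": 723,
    # Turns SplashRequest objects into calls to the Splash HTTP API.
    "scrapy_splash.SplashMiddleware": 725,
    # Scrapy's built-in decompression middleware, re-ordered as the README advises.
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    # Avoids storing duplicate Splash arguments (e.g. Lua scripts) in disk queues.
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Fingerprints requests together with their Splash arguments, so the same URL
# rendered with different arguments is not dropped as a duplicate.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

Spiders then yield SplashRequest objects (imported from scrapy_splash) instead of plain requests, for example SplashRequest(url, self.parse, args={"wait": 0.5}).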
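The dev.to mention of scrapy-playwright says headless-browser support reaches Scrapy through its plugin system. Concretely, scrapy-playwright plugs in as a download handler. A minimal settings.py sketch, assuming the setting names from the scrapy-playwright README (the values are illustrative):

```python
# settings.py: minimal scrapy-playwright sketch (setting names as given in the
# scrapy-playwright README; values are illustrative assumptions).

# Route http and https downloads through the Playwright download handler.
# Requests that do not opt in are still fetched by Scrapy's regular handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Browser engine to launch ("chromium" is the default).
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Options forwarded to Playwright's browser launch; headless mode is already
# Playwright's default and is spelled out here only for clarity.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

With these settings, only requests that opt in with meta={"playwright": True}, for example scrapy.Request(url, meta={"playwright": True}), are rendered in the browser; everything else keeps Scrapy's normal, cheaper download path.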
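scrapyrt ("HTTP API for Scrapy spiders") is started with the scrapyrt command inside a Scrapy project and, by default, listens on port 9080. A crawl is triggered with a GET request to /crawl.json carrying spider_name and url query parameters. The helper below is a hypothetical convenience function that only builds that URL with the standard library; the spider name "quotes" in the test is likewise hypothetical.

```python
from urllib.parse import urlencode

# Default address of a locally started scrapyrt instance (port 9080).
SCRAPYRT_BASE = "http://localhost:9080"


def crawl_url(spider_name: str, start_url: str, base: str = SCRAPYRT_BASE) -> str:
    """Build a GET URL for scrapyrt's crawl.json endpoint (hypothetical helper).

    spider_name selects a spider from the Scrapy project; start_url is the
    page that spider should crawl. urlencode percent-escapes the nested URL.
    """
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base}/crawl.json?{query}"
```

Fetching the resulting URL (with curl or urllib.request, for example) returns a JSON document whose items field holds what the spider scraped.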
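scrapy-rotating-proxies is enabled through two downloader middlewares and a proxy list. A minimal settings.py sketch, assuming the names from its README; the proxy addresses are placeholders from the 203.0.113.0/24 documentation range, not working proxies.

```python
# settings.py: scrapy-rotating-proxies sketch (names as given in its README).
# Placeholder proxies from a documentation-only IP range.
ROTATING_PROXY_LIST = [
    "203.0.113.10:8000",
    "203.0.113.11:8031",
]

DOWNLOADER_MIDDLEWARES = {
    # Assigns a proxy to each request and rotates away from proxies marked dead.
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    # Flags responses or exceptions that look like bans, so the rotating
    # middleware can treat that proxy as dead for a while.
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```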
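scrapy-fake-useragent works by swapping Scrapy's built-in User-Agent and retry middlewares for its own. A minimal settings.py sketch, assuming the middleware paths given in the README of recent versions of the package; treat the exact orders as illustrative.

```python
# settings.py: scrapy-fake-useragent sketch (paths as given in its README).
DOWNLOADER_MIDDLEWARES = {
    # Setting a built-in component to None disables it in Scrapy; these two
    # are replaced by the package's own versions below.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
    # Sets a random User-Agent header on each outgoing request.
    "scrapy_fake_useragent.middleware.RandomUserAgentMiddleware": 400,
    # Retries failed requests with a freshly chosen User-Agent.
    "scrapy_fake_useragent.middleware.RetryUserAgentMiddleware": 401,
}
```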
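scrapy-crawl-once ("crawl only new content") registers one class as both a spider middleware and a downloader middleware. A minimal settings.py sketch, assuming the registration shown in its README:

```python
# settings.py: scrapy-crawl-once sketch (registration as given in its README).
# The same middleware class is enabled in both middleware chains.
SPIDER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 100,
}

DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawl_once.CrawlOnceMiddleware": 50,
}
```

Only requests that opt in with meta={"crawl_once": True} are tracked. On later runs, such requests that were already crawled are skipped, which is how the package restricts a crawl to new content.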

Python Scrapy related posts

  • Web Scraping Dynamic Websites With Scrapy Playwright

    1 project | dev.to | 6 Mar 2024
  • Differentiating between hypermarkets and supermarkets.

    1 project | /r/openstreetmap | 9 Dec 2023
  • Meta, Microsoft and Amazon team up on maps project

    1 project | news.ycombinator.com | 26 Jul 2023
  • Distribution of gross and net salaries on r/BESalary [OC]

    1 project | /r/BESalary | 1 Jul 2023
  • How to make scrapy run multiple times on the same URLs?

    2 projects | /r/scrapy | 26 Jun 2023
  • How do you handle CAPTCHA pages appearing in some of the rotating proxies you use?

    1 project | /r/webscraping | 13 Apr 2023
  • Scrapy & splash guide

    1 project | /r/learnpython | 18 Feb 2023

Index

What are some of the best open-source Scrapy projects in Python? This list will help you:

Rank  Project                       Stars
1     scrapy-redis                  5,480
2     Gerapy                        3,241
3     scrapy-splash                 3,086
4     scrapydweb                    3,039
5     SpiderKeeper                  2,705
6     advertools                    1,078
7     scrapy-playwright               883
8     scrapyrt                        820
9     scrapy-rotating-proxies         716
10    scrapy-fake-useragent           682
11    alltheplaces                    582
12    estela                          160
13    GoodreadsScraper                119
14    scrapy-cloudflare-middleware    103
15    scrapy-crawl-once                76
16    open-gov-crawlers                62
17    scrapy-mysql-pipeline            48
18    scrapeops-scrapy-sdk             37
19    scrapingant-client-python        32
20    burplist                         12
21    hltv-scraping                    10
22    nse-stock-scraper                10
23    NSFW_Scraper                      8
