Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python. (by scrapy)

Stats

Basic Scrapy repo stats
Mentions: 15
Stars: 40,335
Activity: 9.3
Last commit: 6 days ago

scrapy/scrapy is an open source project licensed under the BSD 3-Clause "New" or "Revised" License, which is an OSI approved license.

Scrapy Alternatives

Similar projects and alternatives to Scrapy

  • GitHub repo NumPy

    The fundamental package for scientific computing with Python.

  • GitHub repo Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • GitHub repo Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

  • GitHub repo matplotlib

    matplotlib: plotting with Python

  • GitHub repo Robot Framework

    Generic automation framework for acceptance testing and RPA

  • GitHub repo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • GitHub repo react-native

    A framework for building native apps with React.

  • GitHub repo Elasticsearch

    Free and Open, Distributed, RESTful Search Engine

  • GitHub repo Ansible

    Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. Automate everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com.

  • GitHub repo moment

    Parse, validate, manipulate, and display dates in JavaScript.

  • GitHub repo Babel (Formerly 6to5)

    🐠 Babel is a compiler for writing next generation JavaScript.

  • GitHub repo jest

    Delightful JavaScript Testing.

  • GitHub repo Selenium WebDriver

    A browser automation framework and ecosystem.

  • GitHub repo SaltStack

    Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:

  • GitHub repo Sinatra

    Classy web-development dressed in a DSL (official / canonical repo)

  • GitHub repo faker

    A library for generating fake data such as names, addresses, and phone numbers. (by faker-ruby)

  • GitHub repo pytest

    The pytest framework makes it easy to write small tests, yet scales to support complex functional testing

  • GitHub repo phpMyAdmin

    A web interface for MySQL and MariaDB

  • GitHub repo JRuby

    JRuby, an implementation of Ruby on the JVM

NOTE: An item's mention count reflects how often it is mentioned in the same posts as Scrapy. Hence, a higher number indicates a better Scrapy alternative or greater similarity.

Posts

Posts where Scrapy has been mentioned. We have used some of these posts to build our list of alternatives and similar projects; the most recent one was on 2021-04-16.
  • Why is Python popular despite being accused of being slow?
    I use it regularly for things like web scraping (Scrapy is a joy) and data manipulation. For instance just wrote some fairly complicated scripts for doing address matching to pair up a couple of UK datasets without a common identity field. Human-entered addresses are decidedly fuzzy so you end up with a lot of arbitrary rules and Python is just fast to develop against. I don't really care if the script takes a couple of hours to run on the full datasets (35 million addresses) as opposed to half that time in something else more of a pain to tweak around with.
  • Commission Free API for UK traders
    For Python see https://scrapy.org/
  • Web scraping for real estate
  • Running tests/test_pipeline_images.py
    reddit.com/r/scrapy | 2021-04-12
    I am trying to test some changes (adding a new test method) that I have made to tests/test_pipeline_images.py. The tox result is below, and the only difference between the original file (in the Scrapy repository) and the changed one is this line:
  • 5 Python Libraries You Need to Know
    dev.to | 2021-04-10
    Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  • I would like to scrape all posts in a subreddit?
    reddit.com/r/hacking | 2021-04-08
  • Scrapy 2.5.0 is out!
    reddit.com/r/scrapy | 2021-04-06
    It might be a good combo with https://github.com/scrapy/scrapy/pull/5015
  • Overview Automation Request
    Personally I'd rather use Scrapy than the raw requests library. I don't think the data analysis part of this project would be that useful in this case. It's more about fetching the needed data and "joining" it across multiple sources.
  • Skipping data in html table help request
    reddit.com/r/scrapy | 2021-04-06
    As I said, with FEEDS you can pass the option directly, see https://github.com/scrapy/scrapy/blob/099fb6ead070e622cf1a8c1be14372661cf44803/tests/test_feedexport.py#L1287
  • Amazon Price Checker
    I'm not really sure what you mean by adding to the wish list. If the page is dynamically loaded, you can check the network tab in your browser's developer tools and see if you can work something out, or you can use a web driver like Selenium or a library like requests-html. By the way, if you want to crawl a larger number of pages, a web scraping framework like Scrapy is better suited for the job than an HTML parser like BeautifulSoup.
  • [OC] Visualizing gender & ethnic disparities in the music review industry.
    Scrapy
  • Data Scraping Question
    reddit.com/r/rstats | 2021-03-25
    Sorry, this doesn't answer the question, but if your goal is to get the player data to do something with it, why not just use a package that already does this? Unless you're just trying to learn data scraping (in which case I would actually recommend Scrapy and then moving the data to R). But if you just want the data, this package is for you: https://www.rdocumentation.org/packages/nbastatR/versions/0.1.110202031
  • Top 10 Python Libraries
    dev.to | 2021-03-24
    Download the latest version of Scrapy and visit its GitHub repository to learn more.
    Scrapy is also a popular open-source Python library for large-scale web scraping by building crawling programs, also known as spiders. BeautifulSoup helps you scrape data from websites but not via CSV or API. Scrapy gathers structured data from the web (such as contact info or URLs) and can also scrape data from APIs. It can be used for machine learning, data mining, information processing, and more.
  • Verified YesStyle Deals
    Yup. I'm using Python and Scrapy.