Scraping Dynamic JavaScript Websites with Scrapy and Scrapy-playwright

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • scrapy-playwright

    🎭 Playwright integration for Scrapy

  • Now we need to modify Scrapy's settings so that it works with Playwright. Instructions can be found on the scrapy-playwright GitHub page. We need to add the DOWNLOAD_HANDLERS and TWISTED_REACTOR settings. In the settings file shown in the original post, the newly added settings are marked between ### lines.

  • murder

    Large scale server deploys using BitTorrent and the BitTornado library (by ervinb)

  • import scrapy
    from scrapy_playwright.page import PageCoroutine


    class PwspiderSpider(scrapy.Spider):
        name = 'pwspider'

        def start_requests(self):
            yield scrapy.Request(
                'https://twitter.com',
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,
                    playwright_page_coroutines=[
                        # This is where we can implement scrolling if we want
                        PageCoroutine('wait_for_selector', 'div#itemName')
                    ]
                )
            )

        async def parse(self, response):
            for item in response.css('div.card'):
                yield {
                    'name': item.css('h3::text').get(),
                    'price': item.css('div.form-group label::text').get()
                }
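The settings excerpt above names DOWNLOAD_HANDLERS and TWISTED_REACTOR but does not show their values. The following is a minimal sketch of what those two entries might look like in a Scrapy project's settings.py, assuming the handler path and reactor documented in the scrapy-playwright README. It is not the original post's settings file, and the rest of a real settings.py (BOT_NAME, SPIDER_MODULES, and so on) is omitted here.

```python
# settings.py (fragment) -- a sketch, not the original post's file.

# Route both HTTP and HTTPS downloads through the scrapy-playwright handler,
# so requests marked with meta={"playwright": True} are rendered in a browser.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright relies on asyncio, so Scrapy has to run on the
# asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Requests that do not set playwright=True in their meta should still go through Scrapy's regular download path, so enabling the handler does not force every request into a browser.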

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Web Scraping Dynamic Websites With Scrapy Playwright

    1 project | dev.to | 6 Mar 2024
  • Scrapy & splash guide

    1 project | /r/learnpython | 18 Feb 2023
  • Implementing a Selenium backend on a web app?

    1 project | /r/webscraping | 8 Oct 2022
  • Make an addition to scrapy_playwright source code

    1 project | /r/scrapy | 22 Feb 2022
  • Turning webpages into pdf

    2 projects | /r/learnpython | 6 Jul 2023