Now we need to modify Scrapy's settings so that it works with Playwright. Instructions can be found on scrapy-playwright's GitHub page. We need to add two settings, DOWNLOAD_HANDLERS and TWISTED_REACTOR. The newly added settings are placed between the ### markers. This is what the settings file should look like:
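As a rough sketch, assuming the default settings.py that `scrapy startproject` generates and the handler and reactor paths given in the scrapy-playwright README, the added lines would look something like this:

```python
# settings.py (only the added part is shown; the rest of the generated
# file is assumed to stay as Scrapy created it)

###
# Route both http and https downloads through Playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
###
```

With these in place, only requests that set playwright=True in their meta are rendered in a browser. Other requests still go through Scrapy's regular downloader.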
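    import scrapy
    # Newer scrapy-playwright releases rename PageCoroutine to PageMethod
    # (and playwright_page_coroutines to playwright_page_methods).
    from scrapy_playwright.page import PageCoroutine


    class PwspiderSpider(scrapy.Spider):
        name = 'pwspider'

        def start_requests(self):
            yield scrapy.Request(
                'https://twitter.com',
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,
                    playwright_page_coroutines=[
                        # This is where we can implement scrolling if we want
                        PageCoroutine('wait_for_selector', 'div#itemName'),
                    ],
                ),
            )

        async def parse(self, response):
            # playwright_include_page=True hands the page to the callback;
            # close it once rendering is done so pages do not pile up
            page = response.meta['playwright_page']
            await page.close()

            for item in response.css('div.card'):
                yield {
                    'name': item.css('h3::text').get(),
                    'price': item.css('div.form-group label::text').get(),
                }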