-
playwright-pool
Demonstration of how to use async Python to control multiple Playwright browsers for web scraping
But to summarize: Puppeteer and Playwright are superior to Selenium, mostly because they both have modern, async APIs. When it comes to the API itself, Playwright is a great choice, though it comes with a lot of default cruft (browser parameters, etc.) that makes scrapers easier to identify. Async support is really important too, as browser automation spends a lot of time blocked on IO. With an async API you can open multiple browser tabs and work in one while another is loading, which drastically speeds up web scraping. I published a short demo on GitHub to illustrate this, playwright-pool, if you want to learn more about async.
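To make the "do something in one tab while the other is loading" point concrete, here is a minimal sketch of the idea. It is not the code from playwright-pool; the helper names `run_pool` and `scrape_titles` and the `max_tabs` parameter are made up for illustration. It assumes Playwright for Python and a Chromium build are installed (`pip install playwright`, then `playwright install chromium`).

```python
import asyncio
from typing import Awaitable, Callable


async def run_pool(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_tabs: int = 4,
) -> list[str]:
    """Run fetch(url) for every URL with at most max_tabs calls in flight."""
    limit = asyncio.Semaphore(max_tabs)

    async def worker(url: str) -> str:
        # While one worker waits on network IO, the event loop runs the others.
        async with limit:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))


async def scrape_titles(urls: list[str], max_tabs: int = 4) -> list[str]:
    """Open each URL in its own tab of one Chromium instance; return page titles."""
    # Imported lazily so run_pool can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch()

        async def fetch(url: str) -> str:
            page = await browser.new_page()  # one tab per URL
            try:
                await page.goto(url)
                return await page.title()
            finally:
                await page.close()

        try:
            return await run_pool(urls, fetch, max_tabs)
        finally:
            await browser.close()
```

Something like `asyncio.run(scrape_titles(["https://example.com", "https://example.org"]))` would then load both pages concurrently in separate tabs. The semaphore is just one simple way to cap the number of open tabs; a real scraper would likely also want timeouts and retries.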
-
That being said, if you're a beginner, Selenium is a much more mature package, so it has significantly more resources on StackOverflow and elsewhere, while Puppeteer has a bigger community for avoiding web scraper detection (plugins like puppeteer-extra-plugin-stealth).
-
This concern should be lifted if you are a Scrapy lover. There is a Scrapy integration for Playwright that gives you a lot of freedom and lets you operate from a Scrapy spider.
Related posts
-
I built an open source Chrome/Firefox extension that generates Playwright/Puppeteer scripts straight from your browser interactions using React/Shadow DOM
-
Headless recorder is a Chrome extension that records your browser interactions and generates a Playwright or Puppeteer script.