puppeteer-extra
scraper-template
| | puppeteer-extra | scraper-template |
|---|---|---|
| Mentions | 28 | 1 |
| Stars | 6,056 | 0 |
| Growth | - | - |
| Activity | 0.0 | 0.0 |
| Last commit | 4 days ago | almost 3 years ago |
| Language | JavaScript | JavaScript |
| License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
puppeteer-extra
- What are your favorite Data Scraping tools?
  You could use https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth, a plugin for evading anti-bot detection.
- How can I bypass 403 Forbidden?
  There is a good chance that the website is using Cloudflare to block web scrapers, which will require you to use a fortified headless browser to solve the JS challenges. Your options include the Puppeteer stealth plugin and Selenium undetected-chromedriver.
- New headless Chrome has been released and has a near-perfect browser fingerprint
  There are even Puppeteer plugins that will do it for you. [1]
  The best detection I've come across so far (i.e. before this release) has just required I run headless Chrome in headed mode. Granted, I don't do a ton of scraping -- mostly just pulling data out of websites so that I can play with it in aggregate using more civilized tools.
  [1]: https://github.com/berstend/puppeteer-extra/tree/master/pack...
- Proposed solution to Twitter's ridiculous API pricing
  You didn't know? https://github.com/berstend/puppeteer-extra/wiki/Block-resources-without-request-interception
- Using Selenium with a proxy still hits bot detection
- Getting detected by Cloudflare for no apparent reason.
  As for solutions, you are on point. Running a headless browser or using a web scraping API that does that for you (I work at one: https://scrapfly.io hi) is the easiest way to do it. Note that because of JavaScript fingerprinting, you still need to fortify your headless browser with scripts such as puppeteer-stealth.
- 100s of Spam Leads but not showing up in Google Analytics (UA) or Google Ads
  Unfortunately, it's now trivial to bypass reCAPTCHA: https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-recaptcha
- Perimeter X bypass help
  Use a fortified headless browser, for example Puppeteer with the stealth plugin.
- Spam on Unbounce Landers
- Puppeteer-extra-plugin-stealth – plugin for puppeteer-extra to prevent detection
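Several of the posts above recommend pairing Puppeteer with the stealth plugin, and one reports that running Chrome in headed mode was enough for the detection it encountered. The following is a minimal sketch of that setup, assuming the `puppeteer`, `puppeteer-extra`, and `puppeteer-extra-plugin-stealth` packages are installed. The helper names `buildLaunchOptions`, `loadStealthPuppeteer`, and `fetchTitle` are hypothetical and exist only for illustration; whether this gets past any particular site's checks is not guaranteed.

```javascript
// Sketch only: puppeteer-extra with the stealth plugin registered.
// Assumes puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth are installed.

// Hypothetical helper: translate a "headed" flag into Puppeteer launch options.
// headed: true opens a visible browser window instead of running headless.
function buildLaunchOptions({ headed = false } = {}) {
  return { headless: !headed };
}

// Hypothetical helper: load puppeteer-extra lazily and register the plugin once.
let stealthPuppeteer = null;
function loadStealthPuppeteer() {
  if (!stealthPuppeteer) {
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());
    stealthPuppeteer = puppeteer;
  }
  return stealthPuppeteer;
}

// Hypothetical helper: open a page and return its title, always closing the browser.
async function fetchTitle(url, options = {}) {
  const puppeteer = loadStealthPuppeteer();
  const browser = await puppeteer.launch(buildLaunchOptions(options));
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Calling `fetchTitle('https://example.com', { headed: true })` would launch a visible browser with the plugin applied; the URL is a placeholder. The packages are required lazily so that the option helper can be used on its own.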
scraper-template
- How I met your...Scraper?
  Note: There is also a template project available on GitHub, in case it is useful and saves you some time.
What are some alternatives?
puppeteer - Node.js API for Chrome
dark-knowledge - 😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.
fakebrowser - 🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.
electron-store - Simple data persistence for your Electron app or module - Save and load user preferences, app state, cache, etc
puppeteer-instagram - Instagram automation driven by headless chrome.
headless-recorder - Chrome extension that records your browser interactions and generates a Playwright or Puppeteer script.
url-to-pdf-api - Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
YouTubeShop - Youtube autolike and autosubs script
roam-research-private-api - Private API to enable API access for Roam Research. Now you can connect Roam to your other projects.
undetected-chromedriver - Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
frida - Clone this repo to build Frida