puppeteer-extra
url-to-pdf-api
| | puppeteer-extra | url-to-pdf-api |
|---|---|---|
| Mentions | 28 | 3 |
| Stars | 6,031 | 6,969 |
| Growth | - | 0.2% |
| Activity | 0.0 | 1.4 |
| Latest commit | 15 days ago | 3 months ago |
| Language | JavaScript | HTML |
| License | MIT License | MIT License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
puppeteer-extra
-
What are your favorite Data Scraping tools?
You could use https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth, a plugin for evading anti-bot detection.
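A minimal sketch of how the stealth plugin is usually wired up, assuming the `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages are installed. The helper names `launchStealthBrowser` and `fetchPageTitle` are hypothetical, and the module and plugin factory are passed in as arguments rather than required at the top. The plugin patches common headless fingerprints, but it is not guaranteed to get past every anti-bot system.

```javascript
// Sketch: wire puppeteer-extra's stealth plugin into a headless launch.
// Assumption: callers pass require('puppeteer-extra') and
// require('puppeteer-extra-plugin-stealth'); both helper names are hypothetical.
async function launchStealthBrowser(puppeteerExtra, StealthPlugin, launchOptions = {}) {
  // Register the plugin before launching so its evasions apply to every new page.
  puppeteerExtra.use(StealthPlugin());
  return puppeteerExtra.launch({ headless: true, ...launchOptions });
}

// Opens one page with the stealth-enabled browser and returns its <title>.
async function fetchPageTitle(puppeteerExtra, StealthPlugin, url) {
  const browser = await launchStealthBrowser(puppeteerExtra, StealthPlugin);
  try {
    const page = await browser.newPage();
    // 'networkidle2' treats navigation as done once at most two network
    // connections remain open for at least 500 ms.
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.title();
  } finally {
    // Always close the browser, even if navigation throws.
    await browser.close();
  }
}
```

In real use this would be called as `fetchPageTitle(require('puppeteer-extra'), require('puppeteer-extra-plugin-stealth'), 'https://example.com')`. The `use()` call must come before `launch()`, because plugins hook into the browser as it starts.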
-
How can I bypass 403 Forbidden?
There is a good chance that the website is using Cloudflare to block web scrapers, which will require you to use a fortified headless browser to solve the JS challenges. Your options include the Puppeteer stealth plugin and Selenium undetected-chromedriver.
-
New headless Chrome has been released and has a near-perfect browser fingerprint
There are even Puppeteer plugins that will do it for you. [^1]
The best detection I've come across so far (i.e. before this release) has only required me to run Chrome in headed mode instead of headless. Granted, I don't do a ton of scraping; I mostly just pull data out of websites so that I can play with it in aggregate using more civilized tools.
[1]: https://github.com/berstend/puppeteer-extra/tree/master/pack...
-
Proposed solution to Twitter's ridiculous API pricing
You didn't know? https://github.com/berstend/puppeteer-extra/wiki/Block-resources-without-request-interception
- Using Selenium with a proxy still hits bot detection
-
Getting detected by Cloudflare for no apparent reason.
As for solutions, you are on point. Running a headless browser or using a web scraping API that does that for you (I work at one: https://scrapfly.io hi) is the easiest way to do it. Note that because of JavaScript fingerprinting, you still need to fortify your headless browsers with scripts such as puppeteer-stealth.
-
100s of Spam Leads but not showing up in Google Analytics (UA) or Google Ads
Unfortunately, it's now trivial to bypass reCAPTCHA: https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-recaptcha
-
PerimeterX bypass help
Use a fortified headless browser, such as Puppeteer with the stealth plugin.
- Spam on Unbounce Landers
- Puppeteer-extra-plugin-stealth – plugin for puppeteer-extra to prevent detection
url-to-pdf-api
-
Is it safe to put a potentially unsafe application in a docker container in a server?
I've been developing an application that uses url-to-pdf-api, which is basically an API for Puppeteer.
-
PDF generator for Vue that allows text selection and generates exactly as the render
I have a secret URL that renders the content for the PDF, and internally I point the PDF generator at that URL. You can install it separately on your internal servers and just proxy the URLs that generate the PDF.
-
Generating Pdf documents in React
Using this library, you can easily set up a microservice which will take the required URL as a query parameter, along with page size & various other customization options. You can find the library here
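A minimal sketch of how a client might build such a request, assuming a url-to-pdf-api instance at a hypothetical internal host that serves `GET /api/render` and accepts `url` plus Puppeteer-style dotted options such as `pdf.format` (page size) and `pdf.landscape` as query parameters. These parameter names are recalled from the project's README and should be checked against the version you deploy; `buildRenderUrl` is a hypothetical helper.

```javascript
// Sketch: build a GET /api/render URL for a url-to-pdf-api deployment.
// Assumption: the service is mounted at the root of serviceBase and accepts
// `url` plus dotted options like `pdf.format` and `pdf.landscape`.
function buildRenderUrl(serviceBase, targetUrl, options = {}) {
  const endpoint = new URL('/api/render', serviceBase);
  // URLSearchParams percent-encodes the target URL so it survives as one parameter.
  endpoint.searchParams.set('url', targetUrl);
  for (const [key, value] of Object.entries(options)) {
    endpoint.searchParams.set(key, String(value));
  }
  return endpoint.toString();
}

// Hypothetical internal host and target page, for illustration only.
const renderUrl = buildRenderUrl('http://pdf-service.internal:9000', 'https://example.com/invoice/42', {
  'pdf.format': 'A4',
  'pdf.landscape': true,
});
console.log(renderUrl);
```

Letting `URL` and `URLSearchParams` do the encoding avoids hand-concatenating query strings, which tends to break when the target URL carries its own `?` or `&`.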
What are some alternatives?
puppeteer - Node.js API for Chrome
pdf2htmlEX - Convert PDF to HTML without losing text or format.
dark-knowledge - 😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.
pagedjs - Display paginated content in the browser and generate print books using web technology
fakebrowser - 🤖 Fake fingerprints to bypass anti-bot systems. Simulate mouse and keyboard operations to make behavior like a real person.
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
electron-store - Simple data persistence for your Electron app or module - Save and load user preferences, app state, cache, etc
excalibur - A web interface to extract tabular data from PDFs
puppeteer-instagram - Instagram automation driven by headless chrome.
simple-html-invoice-template - A modern, clean, and very simple responsive HTML invoice template
headless-recorder - Chrome extension that records your browser interactions and generates a Playwright or Puppeteer script.
puppeteer-dart - A Dart library to automate the Chrome browser over the DevTools Protocol. This is a port of the Puppeteer API