puppeteer VS puppeteer-extra

Compare puppeteer vs puppeteer-extra and see what are their differences.

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
puppeteer puppeteer-extra
355 28
86,491 5,970
0.6% -
9.9 0.0
1 day ago 18 days ago
TypeScript JavaScript
Apache License 2.0 MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

puppeteer

Posts with mentions or reviews of puppeteer. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-26.
  • HTML to PDF renderers: A simple comparison
    4 projects | dev.to | 26 Mar 2024
    HTML to PDF conversion is a common requirement in modern web applications. It allows users to save web pages, reports, and other content in a format that is easy to share and print. There are many libraries and services available for converting HTML to PDF, each with its own strengths and weaknesses. In this article, we will compare some of the most popular HTML to PDF renderers in Node.js, including Puppeteer, Playwright, node-html-pdf, and Onedoc.
  • Let's build a screenshot API
    8 projects | dev.to | 24 Mar 2024
    Playwright seems to be a superior library for working with headless browsers than Puppeteer, but I will go with Puppeteer.
  • JS Toolbox 2024: Bundlers and Test Frameworks
    10 projects | dev.to | 3 Mar 2024
    Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium. It's primarily used for browser automation, making it a powerful tool for end-to-end testing of web applications, taking screenshots, and generating pre-rendered content from web pages.
  • Next.js 14 Booking App with Live Data Scraping using Scraping Browser
    3 projects | dev.to | 22 Feb 2024
    Puppeteer
  • Eleve o nível de suas Aplicações Javascript com Load Test
    2 projects | dev.to | 17 Feb 2024
    Website: pptr.dev Repositório: GitHub
  • Pyppeteer Tutorial: The Ultimate Guide to Using Puppeteer with Python
    5 projects | dev.to | 5 Feb 2024
    # Define variables PYTHON := python3 POETRY := poetry PYTEST := pytest PIP := pip3 PROJECT_NAME := web automation with Pyppeteer .PHONY: install install: $(POETRY) install @echo "Dependency installation complete" $(PIP) install -r requirements.txt @echo "Set env vars LT_USERNAME & LT_ACCESS_KEY" # Procure Username and AccessKey from https://accounts.lambdatest.com/security export LT_USERNAME=himansh export LT_ACCESS_KEY=Ia1MiqNfci .PHONY: install poetry-install: poetry install .PHONY: test test: export NODE_ENV = test .PHONY: test pyunit-pyppeteer: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/pyunit-pyppeteer/test_pyunit_pyppeteer.py .PHONY: test pytest-pyppeteer: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s -n 2 tests/pytest-pyppeteer/test_pytest_pyppeteer_1.py \ tests/pytest-pyppeteer/test_pytest_pyppeteer_2.py .PHONY: test pyunit-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/pyunit/test_pyppeteer_browser_session.py .PHONY: test pytest-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s \ tests/starting-browser-session/pytest/test_pyppeteer_browser_session.py .PHONY: test asyncio-run-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/asyncio_run/test_pyppeteer_browser_session.py .PHONY: test asyncio-run-complete-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/\ asyncio_run_until_complete/test_pyppeteer_browser_session.py .PHONY: test pyppeteer-button-click: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/button-click/test_page_class_click.py .PHONY: test pyppeteer-activate-tab: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/active-tab/test_page_class_bringtofront.py ###### Testing Custom Environment - https://miyakogi.github.io/pyppeteer/reference.html#environment-variables # Available versions: 113, 121, and default .PHONY: test pyppeteer-custom-chromium-version: - echo $(EXEC_PLATFORM) - echo 'Browser Version:' $(CHROMIUM_VERSION) - $(PYTEST) --verbose --capture=no -s tests/custom-configuration/test_launcher_exe_path.py ###### Testing Headless - https://miyakogi.github.io/pyppeteer/reference.html#launcher # Available values: headless and non-headless .PHONY: test pyppeteer-custom-browser-mode: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/custom-configuration/test_launcher_headless.py .PHONY: test pyppeteer-generate-pdf: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/generate-pdf/test_page_class_pdf.py .PHONY: test pyppeteer-generate-screenshot: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/generate-screenshots/test_page_class_screenshot.py .PHONY: test pyppeteer-cookies: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-cookies/test_page_class_cookies.py .PHONY: test pyppeteer-dialog-box: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-dialog-box/test_handling_dialog_box.py .PHONY: test pyppeteer-iframe: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-iframe/test_page_class_iframe.py # Like Puppeteer, Navigation operations mentioned below only work in Headless mode # goBack: https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goBack # goForward: https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goForward # Bug Link # https://github.com/puppeteer/puppeteer/issues/7739 # https://stackoverflow.com/questions/65540674/how-to-error-check-pyppeteer-page-goback .PHONY: test pyppeteer-navigate-ops: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/navigate-operations/test_page_class_navigation_ops.py .PHONY: test pyppeteer-request-response: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/request-response/test_page_class_req_resp.py .PHONY: test pyppeteer-viewport: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_mod_viewport .PHONY: test pyppeteer-non-headless-useragent: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_get_nonheadless_user_agent .PHONY: test pyppeteer-headless-useragent: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_get_headless_user_agent .PHONY: test pyppeteer-dynamic-content: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s -n 4 tests/handling-dynamic-content/\ test_page_class_lazy_loaded_content.py .PHONY: test pyppeteer-web-scraping: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/web-scraping-content/\ test_scraping_with_pyppeteer.py .PHONY: clean clean: # This helped: https://gist.github.com/hbsdev/a17deea814bc10197285 find . | grep -E "(__pycache__|\.pyc$$)" | xargs rm -rf rm -rf .pytest_cache/ @echo "Clean Succeeded" .PHONY: distclean distclean: clean rm -rf venv .PHONY: help help: @echo "" @echo "install : Install project dependencies" @echo "clean : Clean up temp files" @echo "pyunit-pyppeteer : Running Pyppeteer tests with Pyunit framework" @echo "pytest-pyppeteer : Running Pyppeteer tests with Pytest framework" @echo "pyunit-pyppeteer-browser-session : Browser session using Pyppeteer and Pyunit" @echo "pytest-pyppeteer-browser-session : Browser session using Pyppeteer and Pytest" @echo "asyncio-run-pyppeteer-browser-session : Browser session using Pyppeteer (Approach 1)" @echo "asyncio-run-complete-pyppeteer-browser-session : Browser session using Pyppeteer (Approach 2)" @echo "pyppeteer-button-click : Button click demo using Pyppeteer" @echo "pyppeteer-activate-tab : Switching browser tabs using Pyppeteer" @echo "pyppeteer-custom-chromium-version : Custom Chromium version with Pyppeteer" @echo "pyppeteer-custom-browser-mode : Headless and non-headless test execution with Pyppeteer" @echo "pyppeteer-generate-pdf : Generating pdf using Pyppeteer" @echo "pyppeteer-generate-screenshot : Generating page & element screenshots with Pyppeteer" @echo "pyppeteer-cookies : Customizing cookies with Pyppeteer" @echo "pyppeteer-dialog-box : Handling Dialog boxes with Pyppeteer" @echo "pyppeteer-iframe : Handling iFrames with Pyppeteer" @echo "pyppeteer-navigate-ops : Back & Forward browser operations with Pyppeteer" @echo "pyppeteer-request-response : Request and Response demonstration using Pyppeteer" @echo "pyppeteer-viewport : Customizing viewports using Pyppeteer" @echo "pyppeteer-non-headless-useragent : Customizing user-agent (with browser in headed mode) using Pyppeteer" @echo "pyppeteer-headless-useragent : Customizing user-agent (with browser in headless mode) using Pyppeteer" @echo "pyppeteer-dynamic-content : Handling dynamic web content using Pyppeteer" @echo "pyppeteer-web-scraping : Dynamic web scraping using Pyppeteer"
  • How to build a WhatsApp AI assistant
    7 projects | dev.to | 26 Jan 2024
    This library works by creating an instance of WhatsApp web running inside an instance of headless chrome automated by puppeteer. In my testing, I ran into tons of compatibility issues when trying to use these dependencies inside anything other than a bare-bones Node.js + express server. Also, we can’t spin up a new instance of chrome and WhatsApp web each time a user sends a message, this will exhaust our allowed WhatsApp connections (4 max), not to mention that doing this will make the response times painfully slow.
  • Show HN: Quetta – A privacy-first web browser with enhanced ad blocker inside
    2 projects | news.ycombinator.com | 18 Jan 2024
    It's a tricky balance to strike. Obviously, krono, you're right: ff we're being generous the app launch GA tracker they use, they probably consider it shares nothing of value or identifiable, but strictly speaking it's not true that no data shared.

    I think the audit you do is actually valuable. In my company's product, BrowserBox^0 (also a browser, funnily enough! focused on remote isolation, privacy & security, and including Tor support), we don't collect anything but we want to add some kind of telemetry regarding usage so we can even basically know how many daily users we have (outside of licensed channels where it's tracked in the contract).

    Even tho we don't collect anything, we don't advertise "Zero data collection" anywhere, because of how sensitive I think this topic is. I think we really need to be solid on it, if we're going to say that. And to cover us, in the privacy policy, we say "we may collect some data for operational purposes to ensure the continued running of the service" (paraphrasing), even tho we don't.

    One niggle is that we use Chrome in headless mode (can also use Edge / Chromium), and while we appreciate the auto updating quality that Chrome has for security patches, we're cautious that maybe Chrome still collects telemetry and sends it to Google even with headless and correct flags (see for instance: https://github.com/puppeteer/puppeteer/blob/afb7d9eb5854e851...)

    I know it's not your job...but what's your advice on how to proceed specifically?

    0: https://github.com/BrowserBox/BrowserBox

  • How to track anything on the internet or use Playwright for fun and profit
    5 projects | dev.to | 16 Jan 2024
    If you've read my previous blog posts or ever experimented with Secutils.dev, you might be familiar with the web resources tracking utility. This utility allows you to monitor changes in web page resources, specifically JavaScript and CSS. While it has a somewhat narrow security-focused purpose — detecting broken or tampered web application deployments — it may not be the type of tool you use daily. Nevertheless, it serves as a good example of what you can build with modern browser automation tools like Playwright and Puppeteer. If you're interested in digging deeper into this specific utility, refer to the following blog post series:
  • How to solve reCAPTCHA in Puppeteer using extension
    3 projects | dev.to | 9 Nov 2023
    Installing Puppeteer and other required packages: npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

puppeteer-extra

Posts with mentions or reviews of puppeteer-extra. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-09.
  • how can i bypasd 403 forbidden?
    3 projects | /r/webscraping | 9 Mar 2023
    There is a good chance that the website is using Cloudflare to block web scrapers, which will require you to use a fortified headless browser to solve the JS challenges. Your options include the Puppeteer stealth plugin and Selenium undetected-chromedriver.
  • New headless Chrome has been released and has a near-perfect browser fingerprint
    4 projects | news.ycombinator.com | 19 Feb 2023
    There are even Puppeteer plugins that will do it for you. [^1]

    The best detection I've come across so far (i.e. before this release) has just required I run headless Chrome in headed mode. Granted, I don't do a ton of scraping -- mostly just pulling data out of websites so that I can play with it in aggregate using more civilized tools.

    [1]: https://github.com/berstend/puppeteer-extra/tree/master/pack...

  • Using selenium with proxy still hit bot detection
    2 projects | /r/webscraping | 16 Jan 2023
  • Is there an easy way to tell if a website will allow scrapers or not?
    2 projects | /r/webscraping | 5 Oct 2022
    Fortified Headless Browser: Depending on the anti-bot protection the website is using you make need to use a fortified headless browser that can solve its JS challenges without giving its identity away. Your options include the Puppeteer stealth plugin and Selenium undetected-chromedriver.
  • Is Selenium still a good choice?
    3 projects | /r/webscraping | 13 Jul 2022
    That being said, if you're a beginner Selenium is a much more mature package so it has significantly more resources on StackOverflow and whatnot and Puppeteer has bigger community for avoiding web scraper detection (plugins like puppeteer-extra-plugin-stealth)
  • Show HN: Browser extension that spoofs your location data to match your VPN
    3 projects | news.ycombinator.com | 11 Jun 2022
    This extension takes an interesting approach to spoofing data, which is nice!

    In my case, I’m interested in doing the same thing inside of Puppeteer for web scraping, unfortunately it seems like the only possible approach is similar to content scripts (for example https://github.com/berstend/puppeteer-extra/tree/master/pack...) which leads to it being easily detected. Are there any similar approaches that can be used for Puppeteer?

  • Avoiding Bot Detection
    2 projects | /r/webscraping | 31 Jan 2022
    "I'm a noob and using python with selenium to do some basic scraping on StockX" and scraping protected website like stockx with perimeterx is not possible. It's all about reverse engineering, browser introspection, fingerprint (from hardware to software canvas), then you still need tons of ips to rotate and cooldown, finally protection evolve with time and you have to redo most of the things to pass again. A company like Scrapfly exists because it's more expensive to do and maintain such solution internally, look at their public repositories on GitHub low level stuff, network spoofing stacks, packet manipulation, custom angle libs. It takes a long time to learn vs something like `asp=true` from their docs https://scrapfly.io/docs/scrape-api/anti-scraping-protection If you have time and are more interested in this side, you could start to read https://github.com/prescience-data/dark-knowledge and look at https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth project to see how it works. Do not attempt stealth project helping you to bypass at scale, it's public, anti bot companies are aware and spot it easily - most of the time they don't block directly and use bad fp generated to recognize bots and map proxies ips to collect it and deducted the subnet or residential > My main question is, would it be better to try and make my script act "more human" It's a legend that anti bot use or detect "human" behavior, this signal is not very important, you can randomly move the mouse or things, like is fine, having 0 input events, is suspect but not that much in fact - tactile systems do not trigger any events until you touch so it can't be a strong signal due to false-positive - and doing "behavioral detection" is a big lie in the industry, you can experiment by doing dumb things, it's still passing and at scale ... and when they say "machine learning" it's just basic stats like a throttle do but based on browser fingerprints rather than IP. If you hit some path, like login, registration and payment - they can use some very heavy system with GPU canvas and stuff like but not used for scraping yet > are other methods like switching drivers and using proxies the way to go? Using proxies yes, but with wrong fingerprints (chrome headless, a browser running on server hardware, browser in docker and so on) In fact, there is no magic, mixing driver change nothing, they still manipulate a spotted browser - some are just more flexible than other to spoof correctly some part - like js worker interception to inject scripts and hook correctly but that's all.
  • How I met your...Scraper?
    7 projects | dev.to | 18 Jul 2021
    The page scraped for this post behaves "interesting", sometimes the reCaptcha is ignored, some others appear right after submitting the login, so randomly fails; I opened an issue in puppeteer-extra, an npm lib extension for puppeteer which works hand-to-hand with 2captcha, I'm watching the issue closely, in case of getting a fix for the random issue I'll edit the post.
  • Why mimicking a device is becoming almost impossible
    4 projects | news.ycombinator.com | 28 Jun 2021
    The stealth plugin for Puppeteer Extra gives a pretty good idea of what you need to cover today. Maybe it's not rocket science, but it's not trivial either.

    https://github.com/berstend/puppeteer-extra/tree/master/pack...

What are some alternatives?

When comparing puppeteer and puppeteer-extra you can also consider the following projects:

axios - Promise based HTTP client for the browser and node.js

Nightmare - A high-level browser automation library.

WKHTMLToPDF - Convert HTML to PDF using Webkit (QtWebKit)

Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

karma - Spectacular Test Runner for JavaScript

pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)

cheerio - The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

Pdfkit - A Ruby gem to transform HTML + CSS into PDFs using the command-line utility wkhtmltopdf

phantomjs - Scriptable Headless Browser

Electron - :electron: Build cross-platform desktop apps with JavaScript, HTML, and CSS

request - 🏊🏾 Simplified HTTP request client.

dark-knowledge - 😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.