puppeteer VS cheerio

Compare puppeteer vs cheerio and see what are their differences.

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
puppeteer cheerio
356 50
86,628 27,715
0.4% 0.7%
9.9 9.7
6 days ago 5 days ago
TypeScript TypeScript
Apache License 2.0 MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

puppeteer

Posts with mentions or reviews of puppeteer. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-04.
  • Learn Automated Testing At Home: A Beginner's Guide
    4 projects | dev.to | 4 Apr 2024
    1.Puppeteer: Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium using the DevTools Protocol. Key Features: More control over Chrome. Enables web scraping. Allows taking screenshots and generating PDFs for UI testing. Measures load times through the Chrome Performance Analysis tool
  • HTML to PDF renderers: A simple comparison
    4 projects | dev.to | 26 Mar 2024
    HTML to PDF conversion is a common requirement in modern web applications. It allows users to save web pages, reports, and other content in a format that is easy to share and print. There are many libraries and services available for converting HTML to PDF, each with its own strengths and weaknesses. In this article, we will compare some of the most popular HTML to PDF renderers in Node.js, including Puppeteer, Playwright, node-html-pdf, and Onedoc.
  • Let's build a screenshot API
    8 projects | dev.to | 24 Mar 2024
    Playwright seems to be a superior library for working with headless browsers than Puppeteer, but I will go with Puppeteer.
  • JS Toolbox 2024: Bundlers and Test Frameworks
    10 projects | dev.to | 3 Mar 2024
    Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium. It's primarily used for browser automation, making it a powerful tool for end-to-end testing of web applications, taking screenshots, and generating pre-rendered content from web pages.
  • Next.js 14 Booking App with Live Data Scraping using Scraping Browser
    3 projects | dev.to | 22 Feb 2024
    Puppeteer
  • Eleve o nível de suas Aplicações Javascript com Load Test
    2 projects | dev.to | 17 Feb 2024
    Website: pptr.dev Repositório: GitHub
  • Pyppeteer Tutorial: The Ultimate Guide to Using Puppeteer with Python
    5 projects | dev.to | 5 Feb 2024
    # Define variables PYTHON := python3 POETRY := poetry PYTEST := pytest PIP := pip3 PROJECT_NAME := web automation with Pyppeteer .PHONY: install install: $(POETRY) install @echo "Dependency installation complete" $(PIP) install -r requirements.txt @echo "Set env vars LT_USERNAME & LT_ACCESS_KEY" # Procure Username and AccessKey from https://accounts.lambdatest.com/security export LT_USERNAME=himansh export LT_ACCESS_KEY=Ia1MiqNfci .PHONY: install poetry-install: poetry install .PHONY: test test: export NODE_ENV = test .PHONY: test pyunit-pyppeteer: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/pyunit-pyppeteer/test_pyunit_pyppeteer.py .PHONY: test pytest-pyppeteer: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s -n 2 tests/pytest-pyppeteer/test_pytest_pyppeteer_1.py \ tests/pytest-pyppeteer/test_pytest_pyppeteer_2.py .PHONY: test pyunit-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/pyunit/test_pyppeteer_browser_session.py .PHONY: test pytest-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s \ tests/starting-browser-session/pytest/test_pyppeteer_browser_session.py .PHONY: test asyncio-run-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/asyncio_run/test_pyppeteer_browser_session.py .PHONY: test asyncio-run-complete-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/\ asyncio_run_until_complete/test_pyppeteer_browser_session.py .PHONY: test pyppeteer-button-click: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/button-click/test_page_class_click.py .PHONY: test pyppeteer-activate-tab: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/active-tab/test_page_class_bringtofront.py ###### Testing Custom Environment - https://miyakogi.github.io/pyppeteer/reference.html#environment-variables # Available versions: 113, 121, and default .PHONY: test pyppeteer-custom-chromium-version: - echo $(EXEC_PLATFORM) - echo 'Browser Version:' $(CHROMIUM_VERSION) - $(PYTEST) --verbose --capture=no -s tests/custom-configuration/test_launcher_exe_path.py ###### Testing Headless - https://miyakogi.github.io/pyppeteer/reference.html#launcher # Available values: headless and non-headless .PHONY: test pyppeteer-custom-browser-mode: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/custom-configuration/test_launcher_headless.py .PHONY: test pyppeteer-generate-pdf: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/generate-pdf/test_page_class_pdf.py .PHONY: test pyppeteer-generate-screenshot: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/generate-screenshots/test_page_class_screenshot.py .PHONY: test pyppeteer-cookies: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-cookies/test_page_class_cookies.py .PHONY: test pyppeteer-dialog-box: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-dialog-box/test_handling_dialog_box.py .PHONY: test pyppeteer-iframe: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-iframe/test_page_class_iframe.py # Like Puppeteer, Navigation operations mentioned below only work in Headless mode # goBack: https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goBack # goForward: https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goForward # Bug Link # https://github.com/puppeteer/puppeteer/issues/7739 # https://stackoverflow.com/questions/65540674/how-to-error-check-pyppeteer-page-goback .PHONY: test pyppeteer-navigate-ops: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/navigate-operations/test_page_class_navigation_ops.py .PHONY: test pyppeteer-request-response: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/request-response/test_page_class_req_resp.py .PHONY: test pyppeteer-viewport: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_mod_viewport .PHONY: test pyppeteer-non-headless-useragent: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_get_nonheadless_user_agent .PHONY: test pyppeteer-headless-useragent: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_get_headless_user_agent .PHONY: test pyppeteer-dynamic-content: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s -n 4 tests/handling-dynamic-content/\ test_page_class_lazy_loaded_content.py .PHONY: test pyppeteer-web-scraping: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/web-scraping-content/\ test_scraping_with_pyppeteer.py .PHONY: clean clean: # This helped: https://gist.github.com/hbsdev/a17deea814bc10197285 find . | grep -E "(__pycache__|\.pyc$$)" | xargs rm -rf rm -rf .pytest_cache/ @echo "Clean Succeeded" .PHONY: distclean distclean: clean rm -rf venv .PHONY: help help: @echo "" @echo "install : Install project dependencies" @echo "clean : Clean up temp files" @echo "pyunit-pyppeteer : Running Pyppeteer tests with Pyunit framework" @echo "pytest-pyppeteer : Running Pyppeteer tests with Pytest framework" @echo "pyunit-pyppeteer-browser-session : Browser session using Pyppeteer and Pyunit" @echo "pytest-pyppeteer-browser-session : Browser session using Pyppeteer and Pytest" @echo "asyncio-run-pyppeteer-browser-session : Browser session using Pyppeteer (Approach 1)" @echo "asyncio-run-complete-pyppeteer-browser-session : Browser session using Pyppeteer (Approach 2)" @echo "pyppeteer-button-click : Button click demo using Pyppeteer" @echo "pyppeteer-activate-tab : Switching browser tabs using Pyppeteer" @echo "pyppeteer-custom-chromium-version : Custom Chromium version with Pyppeteer" @echo "pyppeteer-custom-browser-mode : Headless and non-headless test execution with Pyppeteer" @echo "pyppeteer-generate-pdf : Generating pdf using Pyppeteer" @echo "pyppeteer-generate-screenshot : Generating page & element screenshots with Pyppeteer" @echo "pyppeteer-cookies : Customizing cookies with Pyppeteer" @echo "pyppeteer-dialog-box : Handling Dialog boxes with Pyppeteer" @echo "pyppeteer-iframe : Handling iFrames with Pyppeteer" @echo "pyppeteer-navigate-ops : Back & Forward browser operations with Pyppeteer" @echo "pyppeteer-request-response : Request and Response demonstration using Pyppeteer" @echo "pyppeteer-viewport : Customizing viewports using Pyppeteer" @echo "pyppeteer-non-headless-useragent : Customizing user-agent (with browser in headed mode) using Pyppeteer" @echo "pyppeteer-headless-useragent : Customizing user-agent (with browser in headless mode) using Pyppeteer" @echo "pyppeteer-dynamic-content : Handling dynamic web content using Pyppeteer" @echo "pyppeteer-web-scraping : Dynamic web scraping using Pyppeteer"
  • How to build a WhatsApp AI assistant
    7 projects | dev.to | 26 Jan 2024
    This library works by creating an instance of WhatsApp web running inside an instance of headless chrome automated by puppeteer. In my testing, I ran into tons of compatibility issues when trying to use these dependencies inside anything other than a bare-bones Node.js + express server. Also, we can’t spin up a new instance of chrome and WhatsApp web each time a user sends a message, this will exhaust our allowed WhatsApp connections (4 max), not to mention that doing this will make the response times painfully slow.
  • Show HN: Quetta – A privacy-first web browser with enhanced ad blocker inside
    2 projects | news.ycombinator.com | 18 Jan 2024
    It's a tricky balance to strike. Obviously, krono, you're right: ff we're being generous the app launch GA tracker they use, they probably consider it shares nothing of value or identifiable, but strictly speaking it's not true that no data shared.

    I think the audit you do is actually valuable. In my company's product, BrowserBox^0 (also a browser, funnily enough! focused on remote isolation, privacy & security, and including Tor support), we don't collect anything but we want to add some kind of telemetry regarding usage so we can even basically know how many daily users we have (outside of licensed channels where it's tracked in the contract).

    Even tho we don't collect anything, we don't advertise "Zero data collection" anywhere, because of how sensitive I think this topic is. I think we really need to be solid on it, if we're going to say that. And to cover us, in the privacy policy, we say "we may collect some data for operational purposes to ensure the continued running of the service" (paraphrasing), even tho we don't.

    One niggle is that we use Chrome in headless mode (can also use Edge / Chromium), and while we appreciate the auto updating quality that Chrome has for security patches, we're cautious that maybe Chrome still collects telemetry and sends it to Google even with headless and correct flags (see for instance: https://github.com/puppeteer/puppeteer/blob/afb7d9eb5854e851...)

    I know it's not your job...but what's your advice on how to proceed specifically?

    0: https://github.com/BrowserBox/BrowserBox

  • How to track anything on the internet or use Playwright for fun and profit
    5 projects | dev.to | 16 Jan 2024
    If you've read my previous blog posts or ever experimented with Secutils.dev, you might be familiar with the web resources tracking utility. This utility allows you to monitor changes in web page resources, specifically JavaScript and CSS. While it has a somewhat narrow security-focused purpose — detecting broken or tampered web application deployments — it may not be the type of tool you use daily. Nevertheless, it serves as a good example of what you can build with modern browser automation tools like Playwright and Puppeteer. If you're interested in digging deeper into this specific utility, refer to the following blog post series:

cheerio

Posts with mentions or reviews of cheerio. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-02.
  • 8 NPM Packages for JavaScript Beginners [2024][+tutorials]
    6 projects | dev.to | 2 Apr 2024
    Cheerio is your ticket to the world of server-side magic, allowing you to manipulate HTML and XML documents with jQuery-like syntax. It’s perfect for web scraping, data extraction, or just making sense of the mess that is web content. With Cheerio, you get to play around with the DOM, use CSS selectors, and basically do all the cool things you'd do in the browser, but server-side.
  • How to scrape Amazon products
    4 projects | dev.to | 1 Apr 2024
    In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.
  • Htmlq: Like Jq, but for HTML
    2 projects | news.ycombinator.com | 19 Mar 2024
    Nice. I've used Cheerio for this in the past: https://github.com/cheeriojs/cheerio?tab=readme-ov-file#sele...
  • Automating Data Collection with Apify: From Script to Deployment
    4 projects | dev.to | 17 Mar 2024
    For this article, I will be using the TypeScript Starter template as shown in the screenshot above. This comes with Nodejs, Cheerio, Axios
  • Web Scraping in Python – The Complete Guide
    11 projects | news.ycombinator.com | 20 Feb 2024
    > I'm not sure why Python web scraping is so popular compared to Node.js web scraping

    Take this with a grain of salt, since I am fully cognizant that I'm the outlier in most of these conversations, but Scrapy is A++ the no-kidding best framework for this activity that has been created thus far. So, if there was scrapyjs maybe I'd look into it, but there's not (that I'm aware of) so here we are. This conversation often comes up in any such "well, I just use requests & ..." conversation and if one is happy with main.py and a bunch of requests invocations, I'm glad for you, but I don't want to try and cobble together all the side-band stuff that Scrapy and its ecosystem provide for me in a reusable and predictable way

    Also, often those conversations conflate the server side language with the "scrape using headed browser" language which happens to be the same one. So, if one is using cheerio <https://github.com/cheeriojs/cheerio> then sure node can be a fine thing - if the blog post is all "fire up puppeteer, what can go wrong?!" then there is the road to ruin of doing battle with all kinds of detection problems since it's kind of a browser but kind of not

    I, under no circumstances, want the target site running their JS during my crawl runs. I fully accept responsibility for reproducing any XHR or auth or whatever to find the 3 URLs that I care about, without downloading every thumbnail and marketing JS and beacon and and and. I'm also cognizant that my traffic will thus stand out since it uniquely does not make the beacon and marketing calls, but my experience has been that I get the ban hammer less often with my target fetches than trying to pretend to be a browser with a human on the keyboard/mouse but is not

  • Web Scraping in Node.js Using Axios,Cheerio and Json2csv
    3 projects | dev.to | 20 Nov 2023
    Web scraping is a powerful technique used to extract data from websites. In this tutorial, we'll explore how to perform web scraping using Node.js, Axios for making HTTP requests,Cheerio for parsing HTML content and also json2csv for converting json data to csv. We'll scrape product data from a sample website, "https://scrapeme.live/shop/".
  • Portadom: A Unified Interface for DOM Manipulation
    4 projects | dev.to | 30 Aug 2023
    Web scraping, while immensely useful, often requires developers to navigate a sea of tools and libraries, each with its own quirks and intricacies. Whether it's JSDOM, Cheerio, Playwright, or even just plain old vanilla JS in the DevTools console, moving between these platforms can be a challenge.
  • Querying parsed HTML in BigQuery
    4 projects | dev.to | 26 May 2023
    While looking for a way to implement capo.js in BigQuery to understand how pages in HTTP Archive are ordered, I came across the Cheerio library, which is a jQuery-like interface over an HTML parser.
  • JavaScript Web Crawler with Node.js: A Step-By-Step Tutorial
    3 projects | dev.to | 17 Apr 2023
    Cheerio is a JavaScript tool for parsing HTML and XML in Node.js. It provides APIs for traversing and manipulating the DOM of a webpage.
  • I have an idea for a project and I wanna know which resources are available for me
    2 projects | /r/rust | 16 Mar 2023
    [Cheerio](https://cheerio.js.org/) + Deno, feel like I am in browser...

What are some alternatives?

When comparing puppeteer and cheerio you can also consider the following projects:

jsdom - A JavaScript implementation of various web standards, for use with Node.js

axios - Promise based HTTP client for the browser and node.js

Nightmare - A high-level browser automation library.

WKHTMLToPDF - Convert HTML to PDF using Webkit (QtWebKit)

Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

puppeteer-extra - 💯 Teach puppeteer new tricks through plugins.

karma - Spectacular Test Runner for JavaScript

pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)

Pdfkit - A Ruby gem to transform HTML + CSS into PDFs using the command-line utility wkhtmltopdf

phantomjs - Scriptable Headless Browser

Electron - :electron: Build cross-platform desktop apps with JavaScript, HTML, and CSS

request - 🏊🏾 Simplified HTTP request client.