How to scrape the web with Puppeteer in 2023

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • puppeteer

    Node.js API for Chrome

  • Puppeteer is a browser automation library for JavaScript that uses the DevTools Protocol to programmatically control Chromium or Chrome browsers. With more than 80K stars on GitHub, it is the de facto standard in headless browser automation. Puppeteer is written in TypeScript, so IDEs can offer accurate code completion, which makes it easy to get started.
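
To make the description above concrete, here is a minimal sketch of controlling a headless browser with Puppeteer. It is not taken from the original post: the `scrapePage` and `main` names and the `https://example.com` URL are illustrative assumptions, and it assumes Puppeteer has been installed with `npm install puppeteer`.

```javascript
// Minimal sketch: open a page with Puppeteer and read its title and links.
// `scrapePage`, `main`, and the URL below are hypothetical names for illustration.

// Takes an already-launched browser so the function is easy to reuse.
async function scrapePage(browser, url) {
  const page = await browser.newPage();
  try {
    // Wait until the DOM is parsed; heavier pages may need a different condition.
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    const title = await page.title();
    // $$eval runs the callback inside the browser against all matching elements.
    const links = await page.$$eval('a[href]', (anchors) =>
      anchors.map((a) => a.href)
    );
    return { title, links };
  } finally {
    // Always release the tab, even if navigation or extraction failed.
    await page.close();
  }
}

async function main() {
  // Loaded here so the helper above can be used without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(); // headless by default
  try {
    const { title, links } = await scrapePage(browser, 'https://example.com');
    console.log(`${title}: ${links.length} links found`);
  } finally {
    await browser.close();
  }
}

// To run the sketch, uncomment the next line:
// main().catch(console.error);
```

Passing the browser in, rather than launching it inside `scrapePage`, is a design choice of this sketch: one browser instance can then be shared across many pages.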

  • Selenium WebDriver

    A browser automation framework and ecosystem.

  • Other libraries with similar functionality are Selenium, which is very popular outside the JavaScript world, and Playwright, a younger step-brother of Puppeteer.

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • Other libraries with similar functionality are Selenium, which is very popular outside the JavaScript world, and Playwright, a younger step-brother of Puppeteer.

  • Vue.js

    This is the repo for Vue 2. For Vue 3, go to https://github.com/vuejs/core

  • Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web. Repository: https://github.com/vuejs/vue (201,555 stars, 3,544 commits; topics: javascript, framework, vue, frontend).

  • crawlee

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

  • Scraping and crawling with Puppeteer is more comfortable when it is paired with another library, Crawlee, which is free and open source, just like Puppeteer. Crawlee wraps Puppeteer and gives access to all of Puppeteer's functionality, and it also provides useful crawling and scraping tools such as error handling, queue management, storage, proxies, and fingerprints out of the box.
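
As a rough illustration of how Crawlee wraps Puppeteer, the sketch below uses Crawlee's `PuppeteerCrawler` to visit a start page, store each page title, and follow links. It is not from the original post: the `toRecord` helper, the start URL, and the request limit are assumptions chosen for the example, and it assumes `npm install crawlee puppeteer`.

```javascript
// Minimal sketch: a Crawlee crawler that drives Puppeteer under the hood.
// `toRecord`, the start URL, and the limit of 20 requests are illustrative assumptions.

// Hypothetical helper that shapes one scraped page into a record.
function toRecord(url, title) {
  return { url, title: title.trim() };
}

async function main() {
  // Loaded here so the helper above can be used without Crawlee installed.
  const { PuppeteerCrawler, Dataset } = require('crawlee');

  const crawler = new PuppeteerCrawler({
    // Safety cap so the sketch does not crawl indefinitely.
    maxRequestsPerCrawl: 20,
    // `page` is a regular Puppeteer Page, so the full Puppeteer API is available.
    async requestHandler({ request, page, enqueueLinks, log }) {
      const title = await page.title();
      log.info(`${request.loadedUrl}: ${title}`);
      // Save the result to Crawlee's default dataset storage.
      await Dataset.pushData(toRecord(request.loadedUrl, title));
      // Add links found on the page to the request queue.
      await enqueueLinks();
    },
    // Called after Crawlee has retried a request and given up on it.
    failedRequestHandler({ request, log }) {
      log.error(`Request ${request.url} failed too many times.`);
    },
  });

  await crawler.run(['https://crawlee.dev']);
}

// To run the sketch, uncomment the next line:
// main().catch(console.error);
```

The queue, retries, and storage in this sketch come from Crawlee; only the body of `requestHandler` uses Puppeteer directly.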

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
