TypeScript Puppeteer

Open-source TypeScript projects categorized as Puppeteer

Top 23 TypeScript Puppeteer Projects

  • crawlee

    Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers, in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Supports both headful and headless mode, with proxy rotation.

    Project mention: How to scrape Amazon products | dev.to | 2024-04-01

    In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.
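    Crawlee's description lists proxy rotation among its features, and rotating proxies is one common way to deal with the blocking issues the Amazon guide mentions. The idea can be sketched as a round-robin rotator that stops handing out a proxy after repeated failures. This is a minimal illustration under stated assumptions: `ProxyRotator`, the `maxFailures` threshold, and the example proxy URLs are hypothetical, and none of this is Crawlee's own `ProxyConfiguration` API.

```typescript
// Hypothetical round-robin proxy rotator; NOT part of Crawlee's API.
class ProxyRotator {
  private readonly proxyUrls: string[];
  private readonly maxFailures: number;
  private readonly failures = new Map<string, number>();
  private cursor = 0;

  constructor(proxyUrls: string[], maxFailures = 3) {
    if (proxyUrls.length === 0) throw new Error("at least one proxy URL is required");
    this.proxyUrls = [...proxyUrls];
    this.maxFailures = maxFailures; // assumed retirement threshold
  }

  /** Returns the next proxy below the failure threshold, or undefined if all are retired. */
  nextProxy(): string | undefined {
    for (let i = 0; i < this.proxyUrls.length; i++) {
      const url = this.proxyUrls[this.cursor];
      this.cursor = (this.cursor + 1) % this.proxyUrls.length;
      if ((this.failures.get(url) ?? 0) < this.maxFailures) return url;
    }
    return undefined;
  }

  /** Records a blocked or failed request made through `url`. */
  markFailed(url: string): void {
    this.failures.set(url, (this.failures.get(url) ?? 0) + 1);
  }

  /** Clears the failure count after a successful request through `url`. */
  markSucceeded(url: string): void {
    this.failures.delete(url);
  }
}
```

    In a real crawler, `markFailed` would be called when a response looks like a block (for example an HTTP 403 or a CAPTCHA page), so a burned proxy drops out of rotation without stopping the crawl.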

  • browserless

    Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.

    Project mention: How and why we ripped our Open Source product apart for a full rebuild | dev.to | 2024-02-28

    The core product is managed, cloud hosted browsers. We run thousands at a time using AWS and DigitalOcean, for people to use with Puppeteer and Playwright scripts. Our container is also available to self deploy under an open-source license.

  • jest-puppeteer

    Run tests using Jest & Puppeteer 🎪✨

  • qawolf

    🐺 Create browser tests 10x faster

  • chrome-aws-lambda

    Chromium Binary for AWS Lambda and Google Cloud Functions

  • puppeteer-cluster

    Puppeteer pool: run a cluster of instances in parallel
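    The core idea behind a pool like puppeteer-cluster, running many jobs while capping how many work at once, can be illustrated with a small bounded-concurrency runner. This is a sketch, not puppeteer-cluster's API: `runWithConcurrency` is a hypothetical helper, and each task stands in for work that would normally use a browser page.

```typescript
// Hypothetical helper: runs async tasks with at most `limit` in flight at once.
// Results come back in the same order as the input tasks.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<T>(tasks.length);
  let nextIndex = 0;

  // Each worker keeps claiming the next unclaimed task until none remain.
  // Claiming needs no lock because this code runs on a single JS thread.
  async function worker(): Promise<void> {
    while (nextIndex < tasks.length) {
      const index = nextIndex++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

    Workers pull tasks one at a time instead of splitting the list into fixed batches, so one slow job does not hold up a whole batch. If any task rejects, the returned promise rejects, while tasks already in flight run to completion in the background.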

  • pwa-asset-generator

    Automates PWA asset generation and image declaration. Automatically generates icon and splash screen images, favicons and mstile images. Updates manifest.json and index.html files with the generated images according to Web App Manifest specs and Apple Human Interface guidelines.

    Project mention: How To Generate Icons for a Progressive Web App from SVG File With a Single Command | dev.to | 2023-07-30

    To generate icons, we use pwa-asset-generator. The first command generates a favicon icon with a transparent background, the second one creates all the necessary icons for a progressive web app, and the third one creates images for splash screens. The last command is optional, in case you have an icon for dark mode.
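    pwa-asset-generator updates `manifest.json` with the icons it generates. Entries in the Web App Manifest `icons` member carry `src`, `sizes`, `type`, and optionally `purpose`. The sketch below builds such entries for a list of square PNG sizes; `buildManifestIcons`, the `icons/icon-<size>.png` naming, and the chosen sizes are assumptions for illustration, not the tool's actual output.

```typescript
// One entry of the Web App Manifest `icons` array.
interface ManifestIcon {
  src: string;
  sizes: string; // e.g. "192x192"
  type: string; // MIME type of the image
  purpose?: string; // e.g. "any" or "maskable"
}

// Hypothetical helper: builds icon entries for square PNG icons.
// The file naming scheme is an assumption, not pwa-asset-generator's.
function buildManifestIcons(
  sizes: number[],
  dir = "icons",
  purpose: "any" | "maskable" = "any",
): ManifestIcon[] {
  // Drop duplicate sizes, then list icons from smallest to largest.
  const unique = sizes.filter((size, index) => sizes.indexOf(size) === index);
  return unique
    .sort((a, b) => a - b)
    .map((size) => ({
      src: `${dir}/icon-${size}.png`,
      sizes: `${size}x${size}`,
      type: "image/png",
      purpose,
    }));
}

// Example: the `icons` part of a manifest for 192px and 512px icons.
const manifestPatch = { icons: buildManifestIcons([512, 192]) };
console.log(JSON.stringify(manifestPatch, null, 2));
```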

  • Spearmint

    Testing, simplified. || An inclusive, accessibility-first GUI for generating clean, semantic JavaScript tests in only a few clicks. (by open-source-labs)

  • md-to-pdf

    Hackable CLI tool for converting Markdown files to PDF using Node.js and headless Chrome.

  • BotD

    Bot detection library that runs in the browser. Detects automation tools and frameworks. No server required, runs 100% on the client. MIT license, no usage restrictions.

    Project mention: Download numbers on crates.io too high? | /r/rust | 2023-05-31

    If the crates.io team wanted to go further they could employ some invasive methods to detect bots (usually it involves a JS library that does fingerprinting on the browser - something like BotD), but I'm not advocating for it. I don't think crates.io should collect more data, they should just perform better statistics on the data they already have.
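    BotD runs its checks on the client. As a very rough illustration of the kind of signal such libraries look at, the sketch below checks two well-known hints: `navigator.webdriver`, which the WebDriver specification exposes as `true` when the browser is under automation control, and the `HeadlessChrome` token that older headless Chrome builds put in the user agent. This is a hypothetical and easily spoofed check, not BotD's API or detection logic; `checkAutomationSignals` takes a navigator-like object so it can run outside a browser.

```typescript
// Minimal subset of the browser Navigator interface used by this sketch.
interface NavigatorLike {
  webdriver?: boolean;
  userAgent: string;
}

interface AutomationSignals {
  webdriverFlag: boolean;
  headlessUserAgent: boolean;
  suspicious: boolean;
}

// Hypothetical, naive check; real detectors such as BotD combine many more signals.
function checkAutomationSignals(nav: NavigatorLike): AutomationSignals {
  const webdriverFlag = nav.webdriver === true;
  const headlessUserAgent = /HeadlessChrome/.test(nav.userAgent);
  return { webdriverFlag, headlessUserAgent, suspicious: webdriverFlag || headlessUserAgent };
}

// In a browser, the real object can be passed: checkAutomationSignals(navigator)
```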

  • replay

    Library that provides an API to replay and stringify recordings created using Chrome DevTools Recorder (by puppeteer)

  • x-crawl

    x-crawl is a flexible, multifunctional crawler library for Node.js. Its flexible usage and wide range of features help you crawl pages, interfaces, and files quickly, safely, and stably.

    Project mention: AI combined with Node.js x-crawl crawler | dev.to | 2024-04-10

    import { createXCrawlOpenAI } from 'x-crawl'

    const xCrawlOpenAIApp = createXCrawlOpenAI({
      clientOptions: { apiKey: 'Your API Key' }
    })

    xCrawlOpenAIApp.help('What is x-crawl').then((res) => {
      console.log(res)
      /*
        res:
        x-crawl is a flexible Node.js AI-assisted web crawling library. It offers powerful AI-assisted features that make web crawling more efficient, intelligent, and convenient. You can find more information and the source code on x-crawl's GitHub page: https://github.com/coder-hxl/x-crawl.
      */
    })

    xCrawlOpenAIApp
      .help('Three major things to note about crawlers')
      .then((res) => {
        console.log(res)
        /*
          res:
          There are several important aspects to consider when working with crawlers:

          1. **Robots.txt:** It's important to respect the rules set in a website's robots.txt file. This file specifies which parts of a website can be crawled by search engines and other bots. Not following these rules can lead to your crawler being blocked or even legal issues.
          2. **Crawl Delay:** It's a good practice to implement a crawl delay between your requests to a website. This helps to reduce the load on the server and also shows respect for the server resources.
          3. **User-Agent:** Always set a descriptive User-Agent header for your crawler. This helps websites identify your crawler and allows them to contact you if there are any issues. Using a generic or misleading User-Agent can also lead to your crawler being blocked.

          By keeping these points in mind, you can ensure that your crawler operates efficiently and ethically.
        */
      })
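    The first point in that answer, respecting robots.txt, can be sketched as a small rule matcher. This is a simplified illustration, assuming only the `User-agent: *` group matters and that rules are plain path prefixes (no `*` or `$` wildcards). It follows the longest-match rule of RFC 9309, where `Allow` wins a tie. `parseWildcardRules` and `isAllowed` are hypothetical names, not part of x-crawl.

```typescript
// One Allow/Disallow rule from the `User-agent: *` group.
interface RobotsRule {
  allow: boolean;
  path: string;
}

// Hypothetical parser: collects Allow/Disallow rules that apply to `User-agent: *`.
// Wildcards (`*`, `$`) and agent-specific groups are deliberately not handled.
function parseWildcardRules(robotsTxt: string): RobotsRule[] {
  const rules: RobotsRule[] = [];
  let inWildcardGroup = false;
  let previousWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = (rawLine.split("#")[0] ?? "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      if (!previousWasAgent) inWildcardGroup = false;
      if (value === "*") inWildcardGroup = true;
      previousWasAgent = true;
      continue;
    }
    previousWasAgent = false;
    // An empty Disallow means "allow everything", so it adds no rule.
    if (inWildcardGroup && (key === "allow" || key === "disallow") && value !== "") {
      rules.push({ allow: key === "allow", path: value });
    }
  }
  return rules;
}

// The longest matching rule wins; on a tie, Allow wins (RFC 9309).
function isAllowed(rules: RobotsRule[], path: string): boolean {
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = best === undefined || rule.path.length > best.path.length;
    const tieAllow = best !== undefined && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best === undefined ? true : best.allow;
}
```

    A crawler would fetch `https://<host>/robots.txt` once per host, keep the parsed rules, and call `isAllowed` with each URL's path before queueing it.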

  • adblocker

    Efficient embeddable adblocker library

  • fingerprint-suite

    Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

  • secret-agent

    The web scraper that's nearly impossible to block - now called @ulixee/hero

  • linvo-scraper

    LinkedIn automation bot with every possible kind of scraping! Valid for 2022 and used by Linvo.io

  • deno-puppeteer

    A port of puppeteer running on Deno

  • Recorder

    A browser extension that generates Cypress, Playwright and Puppeteer test scripts from your interactions 🖱 ⌨ (by DeploySentinel)

  • Twitch-Drops-Bot

    A Node.js bot that will automatically watch Twitch streams and claim drop rewards.

  • puppeteer-ide-extension

    Standalone puppeteer playground in browser's developer tools.

  • puppeteer-report

    Convert HTML to PDF with Puppeteer, with support for adding a custom header, footer, and page numbers
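    Puppeteer itself can print a header, footer, and page numbers: `page.pdf()` accepts `displayHeaderFooter`, `headerTemplate`, and `footerTemplate`, and Chromium fills elements with the classes `pageNumber` and `totalPages`. puppeteer-report's own approach may differ. The sketch below only builds a footer template string; `buildFooterTemplate` is a hypothetical helper, and the label text and styling are assumptions.

```typescript
// Escapes text so it can be embedded safely in the footer HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds a footerTemplate for Puppeteer's page.pdf().
// Chromium replaces the contents of the `pageNumber` and `totalPages`
// elements on every printed page; the inline styles are an assumption.
function buildFooterTemplate(label: string): string {
  return (
    `<div style="font-size: 10px; width: 100%; padding: 0 12mm; display: flex; justify-content: space-between;">` +
    `<span>${escapeHtml(label)}</span>` +
    `<span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>` +
    `</div>`
  );
}

// Intended use (needs puppeteer and an open page, so it is not run here).
// An empty headerTemplate replaces Chromium's default date/title header,
// and the margins leave room for the footer:
// await page.pdf({
//   path: "report.pdf",
//   displayHeaderFooter: true,
//   headerTemplate: "<span></span>",
//   footerTemplate: buildFooterTemplate("Quarterly report"),
//   margin: { top: "20mm", bottom: "20mm" },
// });
```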

  • mugshot

    Framework independent visual testing library
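    Visual testing generally means comparing a fresh screenshot with a stored baseline. The core comparison can be sketched as a per-pixel diff over raw RGBA buffers. This is not mugshot's own API or algorithm; `diffRgba`, the flat 4-bytes-per-pixel layout, and the per-channel `tolerance` are assumptions for illustration.

```typescript
// Result of comparing two equally sized RGBA images.
interface DiffResult {
  differentPixels: number;
  totalPixels: number;
  ratio: number;
}

// Hypothetical pixel diff over flat RGBA buffers (4 bytes per pixel).
// A pixel counts as different if any channel differs by more than `tolerance`.
function diffRgba(
  baseline: Uint8Array,
  actual: Uint8Array,
  width: number,
  height: number,
  tolerance = 0,
): DiffResult {
  const expectedLength = width * height * 4;
  if (baseline.length !== expectedLength || actual.length !== expectedLength) {
    throw new Error("both buffers must be width * height * 4 bytes long");
  }
  let differentPixels = 0;
  for (let offset = 0; offset < expectedLength; offset += 4) {
    for (let channel = 0; channel < 4; channel++) {
      if (Math.abs(baseline[offset + channel] - actual[offset + channel]) > tolerance) {
        differentPixels++;
        break; // one differing channel is enough for this pixel
      }
    }
  }
  const totalPixels = width * height;
  return { differentPixels, totalPixels, ratio: totalPixels === 0 ? 0 : differentPixels / totalPixels };
}
```

    A small non-zero tolerance is a common way to absorb anti-aliasing noise between runs; a test would then fail only when `ratio` exceeds some agreed threshold.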

  • phishim

    Easy red team phishing with Puppeteer

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-10.

What are some of the best open-source Puppeteer projects in TypeScript? This list will help you:

#   Project                  GitHub stars
1   crawlee                  11,948
2   browserless              7,288
3   jest-puppeteer           3,518
4   qawolf                   3,273
5   chrome-aws-lambda        3,133
6   puppeteer-cluster        3,068
7   pwa-asset-generator      2,623
8   Spearmint                1,283
9   md-to-pdf                1,064
10  BotD                     888
11  replay                   870
12  x-crawl                  815
13  adblocker                726
14  fingerprint-suite        684
15  secret-agent             633
16  linvo-scraper            586
17  deno-puppeteer           441
18  Recorder                 380
19  Twitch-Drops-Bot         279
20  puppeteer-ide-extension  184
21  puppeteer-report         137
22  mugshot                  135
23  phishim                  125