Puppeteer

Open-source projects categorized as Puppeteer

Top 23 Puppeteer Open-Source Projects

  • SingleFile

    Web Extension for saving a faithful copy of a complete web page in a single HTML file

  • Project mention: How SingleFile Transformed My Obsidian Workflow | news.ycombinator.com | 2024-01-26

    That's interesting. I have been saving articles as PDF files, which are browser-independent but useful only for search and reference; they are a nuisance to quote from or copy and paste.

    If I search only my computer, I don't get results from eBay and Amazon at the top. Keeping the knowledge base separate from the primary notes is a good idea. In my case, the knowledge base is the file system, and the primary notes are whatever I choose.

    When I was using Evernote, the inbox was the knowledge base and notebooks were the focus. I just had too many different potential projects going on to manage this well.

    Looking to focus.

    I'll revisit Firefox and SingleFile.

    Explanation of the zip file inside.

    https://github.com/gildas-lormeau/SingleFile/blob/master/faq...

  • crawlee

    Crawlee: a Node.js library for web scraping and browser automation that helps you build reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs, and download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

  • Project mention: How to scrape Amazon products | dev.to | 2024-04-01

    In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.

  • browserless

    Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial use.

  • Project mention: How and why we ripped our Open Source product apart for a full rebuild | dev.to | 2024-02-28

    The core product is managed, cloud hosted browsers. We run thousands at a time using AWS and DigitalOcean, for people to use with Puppeteer and Playwright scripts. Our container is also available to self deploy under an open-source license.

  • url-to-pdf-api

    Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.

  • gotenberg

    A developer-friendly API for converting numerous document formats into PDF files, and more!

  • Project mention: Create PDFs with Tailwind | dev.to | 2024-03-21

    Use a server-side headless browser such as Puppeteer to convert the HTML to PDF. This is the most reliable free option, but it requires a server. If you need to use it in production, we recommend Gotenberg.

  • puppeteer-extra

    💯 Teach puppeteer new tricks through plugins.

  • Project mention: What are your favorite Data Scraping tools? | /r/dataengineering | 2023-06-22

    You could use https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth (a plugin for evading anti-bot detection).

  • venom

    Venom is a high-performance system, developed in JavaScript, for creating WhatsApp bots. It supports building any kind of interaction, such as customer service, media sending, AI-based sentence recognition, and all types of design architecture for WhatsApp.

  • FlareSolverr

    Proxy server to bypass Cloudflare protection

  • Project mention: Scraping Google trends, and incomplete datasets. Help, please? | /r/datasets | 2023-12-07

    What I haven't tried yet:
    - scraping and using these (single-page) tokens
    - a headless browser
    - web test frameworks like Selenium (a programmable browser)
    - FlareSolverr (my best bet): https://github.com/FlareSolverr/FlareSolverr

    FlareSolverr is a headless browser/proxy developed to bypass Cloudflare. You can easily deploy it on-prem with Docker. I know Google has its own defense mechanisms, but I've had very good experience using it for scraping and crawling websites (at least Cloudflare-protected ones), so I guess it's very good at pretending to be a normal browser and a normal user.

  • percollate

    A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.

  • Project mention: The Case Against AI Everything, Everywhere, All at Once | news.ycombinator.com | 2023-10-19

    You can still choose automation. The easier route for me is to use wallabag to save the article. Then on my reMarkable tablet I can grab a very readable document with https://github.com/koreader/koreader.

    The other option is to use https://github.com/danburzo/percollate to convert a webpage to a nice document directly. I use both tools depending on my needs.

  • browser-fingerprinting

    Analysis of bot protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

  • Project mention: A site that tracks the price of a Big Mac in every US McDonald's | news.ycombinator.com | 2024-01-13

    Yes, there is a lot written about it. Here is one link I have saved:

    https://github.com/niespodd/browser-fingerprinting

  • unlighthouse

    Scan your entire site with Google Lighthouse in 2 minutes (on average). Open source, fully configurable with minimal setup.

  • Project mention: Audit your sites 10X faster with Unlighthouse | dev.to | 2023-09-19

    I encourage you to experiment with the Unlighthouse CLI to see how it can meet your specific needs. Here is the link to the official docs.

  • jest-puppeteer

    Run tests using Jest & Puppeteer 🎪✨

  • pyppeteer

    Headless chrome/chromium automation library (unofficial port of puppeteer)

  • Project mention: Pyppeteer Tutorial: The Ultimate Guide to Using Puppeteer with Python | dev.to | 2024-02-05

    The latest version of Pyppeteer, i.e., 1.0.2, can also be installed by running pip3 install -U git+https://github.com/pyppeteer/pyppeteer@dev in the terminal.

  • qawolf

    🐺 Create browser tests 10x faster

  • PuppeteerSharp

    Headless Chrome .NET API

  • Project mention: What do .NET devs use for web scraping these days? | /r/dotnet | 2023-06-13

    PuppeteerSharp

  • chrome-aws-lambda

    Chromium Binary for AWS Lambda and Google Cloud Functions

  • puppeteer-cluster

    Puppeteer Pool, run a cluster of instances in parallel

  • page-skeleton-webpack-plugin

    Webpack plugin to generate the skeleton page automatically

  • pwa-asset-generator

    Automates PWA asset generation and image declaration. Automatically generates icon and splash screen images, favicons and mstile images. Updates manifest.json and index.html files with the generated images according to Web App Manifest specs and Apple Human Interface guidelines.

  • Project mention: How To Generate Icons for a Progressive Web App from SVG File With a Single Command | dev.to | 2023-07-30

    To generate icons, we use pwa-asset-generator. The first command generates a favicon with a transparent background, the second creates all the necessary icons for a progressive web app, and the third creates images for splash screens. The last command is optional, in case you have an icon for dark mode.

  • penthouse

    Generate critical CSS for your web pages

  • awesome-puppeteer

    A curated list of awesome puppeteer resources.

  • free-games-claimer

    Automatically claims free games on the Epic Games Store, Amazon Prime Gaming and GOG.

  • Project mention: Is this github safe to use? | /r/antivirus | 2023-11-02

    GitHub - vogler/free-games-claimer: Automatically claims free games on the Epic Games Store, Amazon Prime Gaming and GOG.

  • Rendora

    Dynamic server-side rendering using headless Chrome to effortlessly solve the SEO problem for modern JavaScript websites

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-01.
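
The puppeteer-cluster entry above describes a pool that runs several browser instances in parallel. As a rough illustration of the pooling idea only, the sketch below assumes a fixed number of workers pulling jobs from a shared queue, with an async `work` function standing in for per-page browser work. `runPool` and its parameters are hypothetical names introduced here; this is not puppeteer-cluster's API, and it needs no browser to run.

```typescript
// Hypothetical sketch of a fixed-size worker pool (not puppeteer-cluster's API).
// Each worker stands in for one browser page; `work` is any async job.
async function runPool<T, R>(
  inputs: T[],
  maxConcurrency: number,
  work: (input: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new RangeError("maxConcurrency must be a positive integer");
  }
  const results = new Array<R>(inputs.length);
  let next = 0; // index of the next input not yet claimed by a worker

  // A worker keeps claiming the next unclaimed input until the queue is empty.
  // Node.js runs this on one thread, so `next++` cannot race between workers.
  const worker = async (): Promise<void> => {
    while (next < inputs.length) {
      const i = next++;
      results[i] = await work(inputs[i]);
    }
  };

  const workerCount = Math.min(maxConcurrency, inputs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results[i] corresponds to inputs[i], whatever the finish order
}

// Usage: three simulated page visits, at most two in flight at once.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

runPool(["https://a.example", "https://b.example", "https://c.example"], 2, async (url) => {
  await sleep(10); // stands in for opening `url` in a browser page and extracting data
  return url.toUpperCase();
}).then((pages) => console.log(pages));
```

Tying `results[i]` to `inputs[i]` keeps output in input order even though jobs finish out of order. A real pool would also have to launch, reuse, and close browser pages or contexts and deal with failed jobs; that bookkeeping is what a library such as puppeteer-cluster is meant to take care of.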

What are some of the best open-source Puppeteer projects? This list will help you:

#   Project                        Stars
1   SingleFile                    13,604
2   crawlee                       12,044
3   browserless                    7,842
4   url-to-pdf-api                 6,969
5   gotenberg                      6,693
6   puppeteer-extra                6,031
7   venom                          5,699
8   FlareSolverr                   5,608
9   percollate                     4,103
10  browser-fingerprinting         3,830
11  unlighthouse                   3,526
12  jest-puppeteer                 3,519
13  pyppeteer                      3,393
14  qawolf                         3,273
15  PuppeteerSharp                 3,149
16  chrome-aws-lambda              3,135
17  puppeteer-cluster              3,077
18  page-skeleton-webpack-plugin   2,780
19  pwa-asset-generator            2,626
20  penthouse                      2,618
21  awesome-puppeteer              2,315
22  free-games-claimer             2,038
23  Rendora                        1,992
