Show HN: Crawlee – The web scraping and browser automation library for Node.js

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • crawlee

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

  • Hey HN,

    this is Jan, founder of Apify, a web scraping and automation platform. Drawing on our team's years of experience, today we're launching Crawlee [1], the web scraping and browser automation library for Node.js that's designed for the fastest development and maximum reliability in production.

    For details, see the short video [2] or read the announcement blog post [3].

    Main features:

    - Supports headless browsers with Playwright or Puppeteer

    - Supports raw HTTP crawling with Cheerio or JSDOM

    - Automated parallelization and scaling of crawlers for best performance

    - Avoids blocking using smart sessions, proxies, and browser fingerprints

    - Simple management and persistence of queues of URLs to crawl

    - Written completely in TypeScript for type safety and code autocompletion

    - Comprehensive documentation, code examples, and tutorials

    - Actively maintained and developed by Apify—we use it ourselves!

    - Lively community on Discord

    To get started, visit https://crawlee.dev or run the following command: npx crawlee create my-crawler

    [1] https://crawlee.dev/
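
The URL queue and parallelization features listed above can be pictured with a minimal sketch. This is not Crawlee's API: `UrlQueue`, `crawl`, and the handler signature are hypothetical names made up for illustration, and the demo walks an in-memory link graph instead of fetching real pages.

```typescript
// Illustrative sketch only. None of these names come from Crawlee.

// A handler processes one URL and may enqueue more URLs it discovers.
type Handler = (url: string, enqueue: (url: string) => void) => Promise<void>;

class UrlQueue {
  private pending: string[] = [];
  private seen = new Set<string>();

  // Adds a URL unless an equivalent one was enqueued before.
  // Returns true when the URL was actually added.
  add(url: string): boolean {
    const key = new URL(url).href; // normalizes e.g. a bare origin to "origin/"
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  next(): string | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }
}

// Runs the handler over every reachable URL, keeping at most
// `concurrency` handler calls in flight at any time.
function crawl(startUrls: string[], handler: Handler, concurrency = 4): Promise<string[]> {
  const queue = new UrlQueue();
  for (const u of startUrls) queue.add(u);
  const visited: string[] = [];
  let active = 0;
  let failed = false;

  return new Promise<string[]>((resolve, reject) => {
    const pump = (): void => {
      if (failed) return;
      // Start new work while there is spare capacity and queued URLs.
      while (active < concurrency) {
        const url = queue.next();
        if (url === undefined) break;
        active++;
        handler(url, (u) => { queue.add(u); })
          .then(() => { visited.push(url); })
          .catch((err) => { failed = true; reject(err); })
          .finally(() => { active--; pump(); });
      }
      // Finished once nothing is queued and nothing is in flight.
      if (active === 0 && queue.size === 0) resolve(visited);
    };
    pump();
  });
}

// Demo: a made-up link graph stands in for real pages.
const demoLinks: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/b", "https://example.com/"],
  "https://example.com/b": [],
};

crawl(["https://example.com"], async (url, enqueue) => {
  for (const link of demoLinks[url] ?? []) enqueue(link);
}).then((visited) => console.log(`visited ${visited.length} pages`, visited));
```

The sketch stops on the first handler error to stay short. A production crawler would more likely retry failed requests and persist the queue between runs.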

  • fingerprint-suite

    Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

  • Hi there!

    We don't have any benchmarks for Crawlee just yet, but we are working on those as we speak. We care deeply about bot detection. One of Crawlee's features is generating fingerprints based on real browser data that we gather. You can read more about it in the https://github.com/apify/fingerprint-suite repository, which Crawlee uses under the hood.

    Crawlee is and always will be open source. It originated from the Apify SDK (http://sdk.apify.com), a library that supports the development of so-called Actors on the Apify Platform (http://apify.com), so you can see it as a way for us to improve the experience of our customers. But you can use it anywhere you want: we provide ready-to-use Dockerfiles for each template.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Crawlee · Build reliable crawlers. Fast. | Crawlee

    2 projects | /r/node | 23 Aug 2022
  • Browser fingerprinting tools for anonymizing your scrapers

    1 project | /r/CKsTechNews | 6 Jul 2022
  • Build and run your Python web scrapers in the cloud with Apify SDK for Python

    2 projects | /r/webscraping | 14 Mar 2023
  • Convert XML into Hiccup in Clojure and ClojureScript

    3 projects | /r/Clojure | 10 Mar 2023
  • Release of Finder 3.0: the CSS selector generator

    3 projects | /r/javascript | 7 Mar 2023