-
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Hey HN,
this is Jan, founder of Apify, a web scraping and automation platform. Drawing on our team's years of experience, today we're launching Crawlee [1], the web scraping and browser automation library for Node.js that's designed for the fastest development and maximum reliability in production.
For details, see the short video [2] or read the announcement blog post [3].
Main features:
- Supports headless browsers with Playwright or Puppeteer
- Supports raw HTTP crawling with Cheerio or JSDOM
- Automated parallelization and scaling of crawlers for best performance
- Avoids blocking using smart sessions, proxies, and browser fingerprints
- Simple management and persistence of queues of URLs to crawl
- Written completely in TypeScript for type safety and code autocompletion
- Comprehensive documentation, code examples, and tutorials
- Actively maintained and developed by Apify—we use it ourselves!
- Lively community on Discord
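To make the "queues of URLs to crawl" point concrete, here is a minimal sketch of the underlying idea: a queue that skips URLs it has already seen. This is not Crawlee's API or implementation. The `UrlQueue` class, its `uniqueKey` helper, and the deliberately simplistic normalization rule are all hypothetical names and choices made for illustration, and persistence is left out entirely.

```typescript
// Hypothetical sketch of a deduplicating URL queue.
// Not Crawlee's API: class and method names are assumptions for illustration.
class UrlQueue {
    private readonly seen = new Set<string>();
    private readonly pending: string[] = [];

    // Simplistic normalization (an assumption): drop the #fragment
    // and any trailing slashes so trivially different URLs match.
    static uniqueKey(url: string): string {
        const hashAt = url.indexOf('#');
        const withoutFragment = hashAt === -1 ? url : url.slice(0, hashAt);
        return withoutFragment.replace(/\/+$/, '');
    }

    // Enqueues the URL unless an equivalent one was added before.
    // Returns true if the URL was new.
    add(url: string): boolean {
        const key = UrlQueue.uniqueKey(url);
        if (this.seen.has(key)) return false;
        this.seen.add(key);
        this.pending.push(url);
        return true;
    }

    // Returns the next URL in insertion order, or undefined when empty.
    next(): string | undefined {
        return this.pending.shift();
    }

    get size(): number {
        return this.pending.length;
    }
}

const queue = new UrlQueue();
queue.add('https://crawlee.dev/');
queue.add('https://crawlee.dev');        // skipped: same key as the first URL
queue.add('https://crawlee.dev/#intro'); // skipped: fragment is ignored
queue.add('https://crawlee.dev/docs');
```

Only two URLs end up pending here, because three of the four inputs normalize to the same key. A real crawler would need a more careful normalization and a way to persist the queue between runs.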
To get started, visit https://crawlee.dev or run the following command: npx crawlee create my-crawler
[1] https://crawlee.dev/
Hi there!
We don't have any benchmarks for Crawlee yet, but we are working on them as we speak. We care deeply about bot detection: one of Crawlee's features is generating fingerprints based on real browser data we gather. You can read more about it in the https://github.com/apify/fingerprint-suite repository, which Crawlee uses under the hood.
Crawlee is and always will be open source. It originated from the Apify SDK (http://sdk.apify.com), a library that supports the development of so-called Actors on the Apify Platform (http://apify.com), so you can see it as a way for us to improve the experience of our customers. But you can use it anywhere you want; we provide ready-to-use Dockerfiles for each template.