ts-emoji vs crawlee
| | ts-emoji | crawlee |
|---|---|---|
| Mentions | 1 | 32 |
| Stars | 28 | 12,690 |
| Growth | - | 4.4% |
| Activity | 4.9 | 9.8 |
| Latest commit | 8 months ago | 6 days ago |
| Language | TypeScript | TypeScript |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
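To make the Growth and Activity definitions above concrete, here is a minimal TypeScript sketch. LibHunt's exact formulas are not given on this page, so everything below is an assumption: growth is the relative change from last month's star count, recency weighting uses a hypothetical exponential half-life (`halfLifeDays`), and the relative score is the share of tracked projects with a lower raw score, scaled to 0 to 10. The function names are illustrative only.

```typescript
// Hypothetical sketch: LibHunt's real formulas are not published on this page.

/** Month-over-month star growth in percent; undefined when there is no baseline to compare against. */
function monthOverMonthGrowth(starsLastMonth: number, starsNow: number): number | undefined {
  if (starsLastMonth <= 0) return undefined;
  return ((starsNow - starsLastMonth) / starsLastMonth) * 100;
}

/**
 * Assumed recency weighting: each commit contributes 0.5^(age / halfLifeDays),
 * so a commit made today counts 1 and one made halfLifeDays ago counts 0.5.
 */
function weightedCommitScore(commitAgesInDays: number[], halfLifeDays = 30): number {
  return commitAgesInDays.reduce((sum, age) => sum + Math.pow(0.5, age / halfLifeDays), 0);
}

/**
 * Relative activity on a 0 to 10 scale: the share of tracked projects whose raw
 * score is strictly lower, times 10, rounded to one decimal place.
 */
function relativeActivity(rawScore: number, allRawScores: number[]): number {
  if (allRawScores.length === 0) return 0;
  const lower = allRawScores.filter((s) => s < rawScore).length;
  return Math.round((lower / allRawScores.length) * 100) / 10;
}
```

Under these assumptions, a project whose raw score beats 90% of tracked projects gets a relative activity of 9.0, matching the example in the text. The half-life is just one common way to give recent commits more weight; any decreasing weight function would fit the description equally well.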
ts-emoji
crawlee
- Crawlee: Crawlee–build reliable crawlers. Works with Puppeteer, Playwright, Ch
-
Scrapy Vs. Crawlee
Crawlee is one of the few web scraping and automation libraries that supports JavaScript and TypeScript. Like Scrapy, Crawlee offers a CLI, but it also provides pre-built TypeScript and JavaScript templates with support for Playwright and Puppeteer. These templates help beginners quickly understand a project's file structure and how it works.
- Crawlee · Build reliable crawlers. Fast
-
How to scrape Amazon products
In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.
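The guide itself is not reproduced here, but the extraction step it describes usually ends with cleaning up the raw strings that Cheerio returns (for example from `$(selector).text()` and `.attr("src")`). The sketch below assumes those raw strings are already in hand and normalizes them into a typed record. The field names, the `normalizeProduct` and `parsePrice` helpers, and the US-style price format (`$1,299.99`) are all assumptions for illustration; no Amazon-specific selectors are used.

```typescript
// Hypothetical sketch: assumes the raw strings were already extracted with Cheerio.
// Field names, helper names, and the US-style price format are assumptions.

interface RawProductFields {
  title: string;    // raw title text, often padded with whitespace and newlines
  price: string;    // e.g. "$1,299.99"
  imageSrc: string; // the image's src attribute, possibly relative to the page URL
}

interface Product {
  title: string;
  price: number | null;    // null when the price text cannot be parsed
  currency: string | null; // the symbol in front of the number, e.g. "$" (empty if none)
  imageUrl: string;        // always absolute
}

/** Parses "$1,299.99"-style text: optional symbol, digits with comma separators, optional decimals. */
function parsePrice(text: string): { amount: number; currency: string } | null {
  const match = text.trim().match(/^([^\d\s.,]*)\s*(\d[\d,]*(?:\.\d+)?)$/);
  if (match === null) return null;
  return { amount: Number(match[2].replace(/,/g, "")), currency: match[1] };
}

/** Collapses whitespace in the title, parses the price, and resolves the image URL. */
function normalizeProduct(raw: RawProductFields, pageUrl: string): Product {
  const price = parsePrice(raw.price);
  return {
    title: raw.title.replace(/\s+/g, " ").trim(),
    price: price ? price.amount : null,
    currency: price ? price.currency : null,
    imageUrl: new URL(raw.imageSrc, pageUrl).toString(),
  };
}
```

Keeping parsing separate from fetching lets you unit-test the normalization against saved HTML snippets without sending requests to Amazon. Prices in other formats (for example `12,50 €`) would need a locale-aware parser.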
-
Automating Data Collection with Apify: From Script to Deployment
Previously, the Apify SDK offered a blend of crawling functionalities and Actor building features. However, a recent update separated these functionalities into two distinct libraries: Crawlee and Apify SDK v3. Crawlee now houses the web scraping and crawling tools, while Apify SDK v3 focuses solely on features specific to building Actors for the Apify platform. This distinction allows for a clear separation of concerns and enhances the development experience for various use cases.
-
Launching Crawlee Blog: Your Node.js resource hub for web scraping and automation.
v3.1 added an error tracker for analyzing and summarizing failed requests.
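The idea behind such an error tracker can be illustrated with a small grouping function. This is a hypothetical sketch, not Crawlee's implementation or API: `summarizeFailures`, `FailedRequest`, and the digit-masking rule are assumptions. It groups failed requests by error name and message, masks numbers so that messages differing only in a timeout value or ID land in the same group, and sorts the groups by frequency.

```typescript
// Hypothetical sketch of error grouping; not Crawlee's implementation or API.

interface FailedRequest {
  url: string;
  error: Error;
}

interface ErrorGroup {
  signature: string;     // error name plus message with numbers masked
  count: number;
  exampleUrls: string[]; // first few URLs that failed with this signature
}

function summarizeFailures(failures: FailedRequest[], maxExamples = 3): ErrorGroup[] {
  const groups: { [signature: string]: ErrorGroup } = {};
  const order: string[] = [];
  for (const { url, error } of failures) {
    // Mask digits so "timed out after 30000 ms" and "timed out after 60000 ms" share one group.
    const signature = `${error.name}: ${error.message.replace(/\d+/g, "<n>")}`;
    if (!Object.prototype.hasOwnProperty.call(groups, signature)) {
      groups[signature] = { signature, count: 0, exampleUrls: [] };
      order.push(signature);
    }
    const group = groups[signature];
    group.count += 1;
    if (group.exampleUrls.length < maxExamples) group.exampleUrls.push(url);
  }
  // Most frequent signatures first; ties keep first-seen order (stable sort).
  return order.map((s) => groups[s]).sort((a, b) => b.count - a.count);
}
```

Masking numbers is a deliberately crude heuristic: it merges messages that differ only in IDs or timeouts, but it could also merge genuinely different errors.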
-
Anything like scrapy in other languages?
The closest I found was https://crawlee.dev/ for JavaScript/TypeScript, although it still doesn't seem to be on Scrapy's level. I didn't try it.
-
What is Playwright?
Also, you can go even further and develop your own web scraper with Crawlee, a Node.js library that helps you overcome those challenges automatically using Puppeteer or Playwright. Crawlee helps you build reliable scrapers fast. Quickly scrape data, store it, and avoid getting blocked with headless browsers, smart proxy rotation, and auto-generated human-like headers and fingerprints.
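Proxy rotation, one of the features mentioned above, can be sketched in a few lines. Crawlee exposes proxy handling through its `ProxyConfiguration` class; the `RoundRobinProxyRotator` below is a hypothetical stand-alone illustration, not Crawlee's code. It assumes a fixed list of proxy URLs, hands them out in round-robin order, and pins each session ID to the proxy it first received so that a session keeps a consistent IP address.

```typescript
// Hypothetical sketch of round-robin proxy rotation with sticky sessions.
// Crawlee provides proxy handling via ProxyConfiguration; this class only illustrates the idea.

class RoundRobinProxyRotator {
  private readonly proxyUrls: string[];
  private nextIndex = 0;
  // Session ID -> proxy URL, so a session keeps the same IP address.
  private readonly bySession: { [sessionId: string]: string } = {};

  constructor(proxyUrls: string[]) {
    if (proxyUrls.length === 0) throw new Error("At least one proxy URL is required");
    this.proxyUrls = proxyUrls.slice();
  }

  /** Returns the next proxy URL, wrapping around at the end of the list. */
  nextUrl(): string {
    const url = this.proxyUrls[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
    return url;
  }

  /** Returns the proxy pinned to this session, assigning the next one on first use. */
  urlForSession(sessionId: string): string {
    if (!Object.prototype.hasOwnProperty.call(this.bySession, sessionId)) {
      this.bySession[sessionId] = this.nextUrl();
    }
    return this.bySession[sessionId];
  }
}
```

For example, with two proxies, three calls to `nextUrl()` return the first, second, and first proxy again, while repeated `urlForSession("a")` calls keep returning whichever proxy that session received first.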
-
Best web scraping framework to learn
https://crawlee.dev/ is very good. You can easily run your spiders in the cloud with Apify, and Node.js/Puppeteer has many advantages over Python/Selenium.
-
Deep diving into Apify world
Apify is a web scraping platform that supports developers from the coding stage onward: it has developed Crawlee, its own open-source Node.js library for web scraping. On the platform, you can then run and monitor your scrapers and, finally, sell them in Apify's store.
What are some alternatives?
NectarJS - 🔱 JavaScript's God Mode. No VM. No Bytecode. No GC. Just native binaries.
awesome-puppeteer - A curated list of awesome puppeteer resources.
rdflib.js - Linked Data API for JavaScript
pwa-asset-generator - Automates PWA asset generation and image declaration. Automatically generates icon and splash screen images, favicons and mstile images. Updates manifest.json and index.html files with the generated images according to Web App Manifest specs and Apple Human Interface guidelines.
jirax - :sunglasses: :computer: Simple and flexible CLI Tool for your daily JIRA activity (supported on all OSes)
teachcode - A tool to develop and improve a student’s programming skills by introducing the earliest lessons of coding.
undetected-chromedriver - Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
firebase-signups-to-google-chat - Be notified of new signups in your app directly in Google Chat
zeit - Clock and task scheduler for node.js applications, providing extensive control of time and callback scheduling in prod and test code
vulcan-next - The Next starter for GraphQL developers
PrivMX JS Crypto Lib - JavaScript crypto library ...
cheerio - The fast, flexible, and elegant library for parsing and manipulating HTML and XML.