Top 23 TypeScript Scraper Projects
-
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
-
scraper
Node.js web scraper. Includes a command-line interface, a Docker container, a Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSDOM. (by get-set-fetch)
-
freenom-auto-renew-domains
A scraper built with Puppeteer that automatically renews free domains on Freenom and sends a Discord message through a bot.
-
passport-appointment-bot
Bot that automatically finds and books an appointment for the renewal or creation of a Swedish passport or national identity card.
-
EZAI-Web-Scraper
An API that allows you to scrape blog posts and articles and get a list of notes or a summary back.
-
scraper
Declarative web scraper in JavaScript primarily designed to extract linguistics data (by sergeyt)
-
linguabook.github.io
Just-in-time scraper of linguistic information from sources such as Cambridge and Merriam-Webster. It also has a companion Chrome extension that shows linguistic information for a selected word while you browse the internet.
Cheerio is your ticket to the world of server-side magic, allowing you to manipulate HTML and XML documents with jQuery-like syntax. It’s perfect for web scraping, data extraction, or just making sense of the mess that is web content. With Cheerio, you get to play around with the DOM, use CSS selectors, and basically do all the cool things you'd do in the browser, but server-side.
In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.
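One common way to cope with the blocking issues mentioned above is to retry a blocked request after an increasing delay. The sketch below is a minimal, dependency-free illustration of that idea and is not taken from the guide itself. The helper names (`backoffDelay`, `looksBlocked`, `fetchWithRetry`), the `PageResult` shape, and the choice of status codes 403, 429, and 503 as "blocked" are all assumptions made for this example; Crawlee ships its own retry and session handling, which this does not reproduce.

```typescript
// Hypothetical helpers for illustration only; not part of Crawlee or Cheerio.

// Minimal shape of a fetched page (an assumption for this sketch).
interface PageResult {
  status: number;
  body: string;
}

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Status codes this sketch treats as "blocked or throttled" (an assumption).
function looksBlocked(status: number): boolean {
  return [403, 429, 503].indexOf(status) !== -1;
}

// Calls `fetchPage` until the response no longer looks blocked,
// sleeping with exponential backoff between attempts.
// `fetchPage` and `sleep` are injected so the logic can be exercised without a network.
async function fetchWithRetry(
  fetchPage: (url: string) => Promise<PageResult>,
  url: string,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<PageResult> {
  for (let attempt = 0; ; attempt++) {
    const result = await fetchPage(url);
    if (!looksBlocked(result.status)) {
      return result;
    }
    if (attempt >= maxRetries) {
      throw new Error(
        `Still blocked after ${maxRetries} retries (status ${result.status})`,
      );
    }
    await sleep(backoffDelay(attempt));
  }
}
```

In a real scraper, `fetchPage` would wrap whatever HTTP client or crawler is in use; a backoff like this is usually combined with proxy rotation rather than used on its own.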
Project mention: Tutorial: Extracting structured data from websites using Groq and Firecrawl | news.ycombinator.com | 2024-04-22
You can also directly open a ticket at https://github.com/openzim/mwoffliner/issues with as much info as possible so we can look into it (zim name, language, date, article name, etc.)
Here's the repo if anyone wants to take a look: https://github.com/MaximumOverflow/Philia-GUI/releases
Hi guys, I've created an open-source, low-code Node.js web scraping tool on top of Puppeteer - https://github.com/miroshnikov/scrapyteer. It offers a small set of functions that are combined in pipelines to define a crawling workflow and the shape of the output data. Maybe somebody will find it useful.
Project mention: How to track anything on the internet or use Playwright for fun and profit | dev.to | 2024-01-16
To begin, all functionality related to browser automation and web scraping lives in a dedicated service — Web Scraper. The primary rationale is that dealing with browsers and arbitrary user scripts is tricky from a security standpoint, and it's always a good idea to isolate such functionality as much as possible. You can read more about the security aspects of web scraping in the "Running web scraping service securely" post.
Project mention: Show HN: LLM Scraper – turn any webpage into structured data | news.ycombinator.com | 2024-04-20
TypeScript Scraper related posts
- Tutorial: Extracting structured data from websites using Groq and Firecrawl
- How to scrape Amazon products
- Help regarding workflow
- What's the most advanced, best maintained, most fully featured web scraper for node.js
- Web Search and Scrape
- Linvo-Scraper: LinkedIn Automation Bot
Index
What are some of the best open-source Scraper projects in TypeScript? This list will help you:
# | Project | Stars |
---|---|---|
1 | cheerio | 27,780 |
2 | crawlee | 12,129 |
3 | firecrawl | 1,659 |
4 | linvo-scraper | 589 |
5 | HLTV | 374 |
6 | mwoffliner | 253 |
7 | scraper | 98 |
8 | extension | 58 |
9 | freenom-auto-renew-domains | 48 |
10 | vercel-metafy | 30 |
11 | passport-appointment-bot | 24 |
12 | Philia | 23 |
13 | scrapyteer | 16 |
14 | webscraper-bot | 13 |
15 | wallace-apple-dictionary | 10 |
16 | EZAI-Web-Scraper | 10 |
17 | scraper | 3 |
18 | YourArch | 2 |
19 | secutils-web-scraper | 1 |
20 | linguabook.github.io | 1 |
21 | llm-scraper | 1 |
22 | spinney | 0 |
23 | Favifetch | 0 |