Top 23 TypeScript Scraper Projects
-
firecrawl
The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥
Just a few days ago, Eric, CEO of Firecrawl, announced in this article that they were closing down their previous startup, Mendable, and, as described in this post, Hassan was promoted to Director of Developer Relations. Both of them post sample applications they build on a daily basis, and these recent posts are a testament to the strong impact of sample applications on the adoption of Firecrawl and Together.ai.
-
cheerio
Project mention: JavaScript package manager - How to fix Cannot find module 'cheerio' error with Enzyme in Yarn 1 projects | dev.to | 2025-06-11
Cheerio 1.0.0 is incompatible with enzyme 3.11.0. #3987
-
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
If you're a fan of Playwright, check out Crawlee [0]. I've used it for a few small projects, and it's been faster for me to get what I needed done.
[0] https://crawlee.dev/
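Crawlee's description lists proxy rotation among its features. The general idea is to cycle through a pool of proxy URLs so that successive requests leave from different addresses. The sketch below illustrates only that idea with a plain round-robin rotator; it is a hypothetical helper, not Crawlee's API (Crawlee has its own `ProxyConfiguration` class), and the class name and proxy URLs are assumptions.

```typescript
// Hypothetical round-robin proxy rotator, for illustration only.
// This is NOT Crawlee's API; Crawlee provides its own ProxyConfiguration class.
class RoundRobinProxyRotator {
  private readonly proxies: readonly string[];
  private index = 0;

  constructor(proxies: readonly string[]) {
    if (proxies.length === 0) {
      throw new Error("At least one proxy URL is required");
    }
    // Copy the list so later changes by the caller do not affect rotation.
    this.proxies = [...proxies];
  }

  // Returns the next proxy URL, wrapping back to the first after the last one.
  next(): string {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

// Usage with placeholder proxy URLs (assumptions, not real endpoints).
const rotator = new RoundRobinProxyRotator([
  "http://proxy-a.example:8000",
  "http://proxy-b.example:8000",
]);
console.log(rotator.next()); // http://proxy-a.example:8000
console.log(rotator.next()); // http://proxy-b.example:8000
console.log(rotator.next()); // http://proxy-a.example:8000
```

Round-robin spreads requests evenly but ignores proxy health; a crawler running at scale would typically also retire proxies that keep failing.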
-
maxun
The easiest no-code web data extraction platform. Instantly turn any website into an API or spreadsheet.
Project mention: 👽 Extract Thousands of Rows of Data Without Writing Code (Open Source) | dev.to | 2025-07-17
Explore the project on GitHub: https://github.com/getmaxun/maxun
-
llm-scraper
llm-scraper [1] does a decent job, but it's still a bit fragile. The biggest problem I have is with React CSS-in-JS libraries that use hashes in their class names, which the LLM isn't smart enough to ignore.
[1] https://github.com/mishushakov/llm-scraper
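One possible workaround for the hashed class names described above is to strip them from the HTML before it reaches the LLM. The sketch below is a hypothetical preprocessing step, not part of llm-scraper. It assumes Emotion-style `css-<hash>` and styled-components-style `sc-<id>` class names and uses a regular expression instead of a real HTML parser, so it will miss other hashing schemes and unusual markup.

```typescript
// Hypothetical preprocessing step, not part of llm-scraper.
// Assumed patterns for generated CSS-in-JS class names; extend as needed.
const HASHED_CLASS_PATTERNS: RegExp[] = [
  /^css-[a-z0-9]+(?:-[\w-]+)?$/, // Emotion-style, e.g. "css-1x2y3z" or "css-1x2y3z-Button"
  /^sc-[A-Za-z0-9]+$/, // styled-components-style, e.g. "sc-bdVaJa"
];

function isHashedClass(token: string): boolean {
  return HASHED_CLASS_PATTERNS.some((re) => re.test(token));
}

// Removes hashed tokens from every class="..." or class='...' attribute.
// A regex is a simplification; a real HTML parser is more robust.
function stripHashedClasses(html: string): string {
  return html.replace(
    /\sclass=(["'])(.*?)\1/g,
    (_match: string, quote: string, value: string): string => {
      const kept = value
        .split(/\s+/)
        .filter((token) => token.length > 0 && !isHashedClass(token));
      // Drop the attribute entirely when no stable class names remain.
      return kept.length > 0 ? ` class=${quote}${kept.join(" ")}${quote}` : "";
    },
  );
}

const cleaned = stripHashedClasses(
  '<div class="card css-1x2y3z"><span class=\'sc-bdVaJa\'>Hi</span></div>',
);
console.log(cleaned); // <div class="card"><span>Hi</span></div>
```

Keeping human-chosen names such as `card` while dropping generated ones gives the model fewer unstable selectors to latch onto; the trade-off is that any hashing scheme not covered by the patterns passes through unchanged.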
-
DevDocs
Completely free, private, UI-based tech documentation MCP server. Designed with coders and software developers in mind. Easily integrates into Cursor, Windsurf, Cline, Roo Code, and the Claude Desktop App (by cyberagiinc)
Project mention: Show HN: We made an MCP Server so that Cursor can build anything from API Docs | news.ycombinator.com | 2025-03-24
Looks cool. The only similar project I've seen so far is https://github.com/cyberagiinc/DevDocs
But every time I've tried to run DevDocs, I've had issues: either the scraper or the MCP server fails to run.
-
api.consumet.org
A Modern Search Engine API for Anime, Movies/TVShows, Books, Light Novels, Manga, etc.
-
epublifier
Turn websites into EPUB
-
mwoffliner
Volunteer for Kiwix here (https://kiwix.org); we do a lot of offline Wikipedia stuff. I've personally worked on MWOffliner (https://github.com/openzim/mwoffliner), which scrapes MediaWikis, primarily Wikipedia.
We have apps for basically every platform. Our PWA even supports IE 11!
You can use the WP1 tool, of which I'm the primary maintainer (https://wp1.openzim.org/#/selections/user), to create "selections" that let you build your own custom version of Wikipedia from categories you define, WikiProjects, or even custom SPARQL queries.
-
mkfd
RSS feed builder created with Bun 🥖 and Hono 🔥. Builds feeds from webpages, email folders, and REST API calls.
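Whatever the source (webpages, email folders, or REST API calls), a feed builder like mkfd ultimately has to serialize the collected items as RSS XML. The sketch below shows only that final step under stated assumptions: it is not mkfd's code, and `FeedChannel`, `FeedItem`, and `buildRss` are hypothetical names chosen for illustration. Escaping matters because scraped text routinely contains characters such as `&` and `<` that would otherwise produce invalid XML.

```typescript
// Hypothetical types and function names, for illustration only (not mkfd's code).
interface FeedChannel {
  title: string;
  link: string;
  description: string;
}

interface FeedItem {
  title: string;
  link: string;
  description?: string;
}

// Escapes the five XML special characters so scraped text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serializes a channel and its items as an RSS 2.0 document.
function buildRss(channel: FeedChannel, items: FeedItem[]): string {
  const itemXml = items
    .map((item) => {
      const fields = [
        `<title>${escapeXml(item.title)}</title>`,
        `<link>${escapeXml(item.link)}</link>`,
      ];
      if (item.description !== undefined) {
        fields.push(`<description>${escapeXml(item.description)}</description>`);
      }
      return `<item>${fields.join("")}</item>`;
    })
    .join("");

  // title, link and description are the required <channel> elements in RSS 2.0.
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<rss version="2.0"><channel>' +
    `<title>${escapeXml(channel.title)}</title>` +
    `<link>${escapeXml(channel.link)}</link>` +
    `<description>${escapeXml(channel.description)}</description>` +
    itemXml +
    "</channel></rss>"
  );
}

// Usage with placeholder data.
const xml = buildRss(
  { title: "Example feed", link: "https://example.com", description: "Scraped items" },
  [{ title: "Tom & Jerry", link: "https://example.com/a", description: "Scraped <b>text</b>" }],
);
console.log(xml);
```

Building the string by hand keeps the sketch dependency-free; a real project would more likely use an XML or feed library and add fields such as `pubDate` and `guid`.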
Project mention: Mkfd – RSS feed builder API created with Bun and Hono | news.ycombinator.com | 2024-11-17
-
scraper
Node.js web scraper. Includes a command-line interface, a Docker container, a Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSDOM. (by get-set-fetch)
-
freenom-auto-renew-domains
A scraper built with Puppeteer that automatically renews free domains on Freenom and sends a Discord message via a bot
-
passport-appointment-bot
An automated bot designed to seamlessly book appointments for the renewal or creation of Swedish passports or national ID cards.
-
TypeScript Scraper related posts
-
👽 Extract Thousands of Rows of Data Without Writing Code (Open Source)
-
Why we started sampleapp.ai
-
Scraperr – A Self Hosted Webscraper
-
Show HN: Get structured website data with just a prompt
-
Show HN: Llms.txt Generator – Turn websites into a text file to feed to any LLM
-
Maxun: Open-Source No-Code Web Data Extraction Platform
-
Index
What are some of the best open-source Scraper projects in TypeScript? This list will help you:
# | Project | Stars |
---|---|---|
1 | firecrawl | 52,769 |
2 | cheerio | 29,714 |
3 | crawlee | 19,240 |
4 | maxun | 13,540 |
5 | llm-scraper | 5,974 |
6 | DevDocs | 1,868 |
7 | api.consumet.org | 1,470 |
8 | epublifier | 798 |
9 | linvo-scraper | 622 |
10 | HLTV | 453 |
11 | mwoffliner | 390 |
12 | mkfd | 186 |
13 | scraper | 114 |
14 | extension | 84 |
15 | vercel-metafy | 52 |
16 | freenom-auto-renew-domains | 50 |
17 | webscraper-bot | 29 |
18 | botasaurus-starter | 29 |
19 | Philia | 24 |
20 | passport-appointment-bot | 24 |
21 | scrapyteer | 19 |
22 | forward-proxy-manager | 13 |
23 | wallace-apple-dictionary | 11 |