TypeScript Scraper

Open-source TypeScript projects categorized as Scraper

Top 23 TypeScript Scraper Projects

  1. firecrawl

    The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥

    Project mention: Why we started sampleapp.ai | dev.to | 2025-06-23

    Just a few days ago, Eric, CEO of Firecrawl, announced in this article that they were closing down their previous startup, Mendable, and Hassan was promoted to Director of Developer Relations in this post. Both of them post sample applications they build on a daily basis. These recent posts are a testament to the impact of sample applications on the adoption of Firecrawl and Together.ai.

  3. cheerio

    The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

    Project mention: JavaScript package manager - How to fix Cannot find module 'cheerio' error with Enzyme in Yarn 1 projects | dev.to | 2025-06-11

    Cheerio 1.0.0 is incompatible with enzyme 3.11.0. #3987

  4. crawlee

    Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Project mention: Scraperr – A Self Hosted Webscraper | news.ycombinator.com | 2025-05-11

    If you're a fan of Playwright check out Crawlee [0]. I've used it for a few small projects and it's been faster for me to get what I've needed done.

    [0] https://crawlee.dev/

  5. maxun

    Easiest no code web data extraction platform. Instantly turn any website into API or spreadsheet.

    Project mention: 👽 Extract Thousands of Rows of Data Without Writing Code (Open Source) | dev.to | 2025-07-17

    Explore the project on GitHub: https://github.com/getmaxun/maxun

  6. llm-scraper

    Turn any webpage into structured data using LLMs

    Project mention: Scraperr – A Self Hosted Webscraper | news.ycombinator.com | 2025-05-11

    llm-scraper [1] does a decent job but it's still a bit fragile. The biggest problem I have is all the React CSS-in-JS libraries that use hashes in their class names, which the LLM isn't smart enough to ignore.

    [1] https://github.com/mishushakov/llm-scraper

  7. DevDocs

    Completely free, private, UI-based tech documentation MCP server. Designed with coders and software developers in mind. Easily integrates into Cursor, Windsurf, Cline, Roo Code, Claude Desktop App (by cyberagiinc)

    Project mention: Show HN: We made an MCP Server so that Cursor can build anything from API Docs | news.ycombinator.com | 2025-03-24

    Looks cool; the only similar one I've seen so far is: https://github.com/cyberagiinc/DevDocs

    But every time I've tried to run DevDocs, I've had issues running it. Either the scraper or the MCP server fails to run.

  8. api.consumet.org

    A Modern Search Engine API for Anime, Movies/TVShows, Books, Light Novels, Manga, etc.

  10. epublifier

    Converts some webnovels to epub format

    Project mention: Talking About Open Source - FAV0 Weekly #019 | dev.to | 2024-10-27

    Converts websites to EPUB

  11. linvo-scraper

    LinkedIn Automation Bot with every possible scraping! Valid for 2022; used by Linvo.io

  12. HLTV

    The unofficial HLTV Node.js API

  13. mwoffliner

    MediaWiki scraper: all your wiki articles in one highly compressed ZIM file

    Project mention: Internet in a Box | news.ycombinator.com | 2025-04-27

    Volunteer for Kiwix here (https://kiwix.org), we do a lot of offline Wikipedia stuff. I've personally worked on MWOffliner (https://github.com/openzim/mwoffliner) which scrapes MediaWikis, primarily Wikipedia.

    We have apps for basically every platform. Our PWA even supports IE 11!

    You can use the WP1 tool which I'm the primary maintainer of (https://wp1.openzim.org/#/selections/user) to create "selections" which let you have your own custom version of Wikipedia, using categories that you define, WikiProjects, or even custom SPARQL queries.

  14. mkfd

    RSS feed builder created with Bun🥖 and Hono🔥- builds from webpages, email folders, and REST API calls.

    Project mention: Mkfd – RSS feed builder API created with Bun and Hono | news.ycombinator.com | 2024-11-17

  15. scraper

    Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom. (by get-set-fetch)

  16. extension

    web scraping extension (by get-set-fetch)

  17. vercel-metafy

    Easily scrape metadata from websites as a service using Vercel.

  18. freenom-auto-renew-domains

    A scraper built with Puppeteer that auto-renews free domains on Freenom and sends a Discord message using a bot

  19. webscraper-bot

    Web scraping Discord bot that notifies you when a new item appears

  20. botasaurus-starter

    🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

  21. Philia

    An easy to use imageboard scraper.

  22. passport-appointment-bot

    An automated bot designed to seamlessly book appointments for the renewal or creation of Swedish passports or national ID cards.

  23. scrapyteer

    Web crawling & scraping framework for Node.js on top of headless Chrome browser

  24. forward-proxy-manager

    Request distributor for web scraping

  25. wallace-apple-dictionary

    :book: macOS Dictionary for the readers of "Infinite Jest"

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
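
The llm-scraper mention in the list above reports that hashed CSS-in-JS class names confuse the LLM. One possible workaround, sketched here as an assumption and not as anything llm-scraper itself does, is to strip class tokens that look machine-generated before passing the HTML to a model. The function names `looksHashed` and `stripHashedClasses` and both regular expressions are hypothetical heuristics chosen for illustration; they will miss some hashed names and may drop a few hand-written ones.

```typescript
// Heuristic patterns for machine-generated class names (assumptions, not a standard):
//  - "css-<hash>" / "sc-<hash>" prefixes, as commonly emitted by CSS-in-JS libraries
//  - a trailing segment of 5+ alphanumerics that mixes letters and digits,
//    e.g. "Button_root__3xYz1"
const HASHED_PATTERNS: RegExp[] = [
  /^(?:css|sc)-[a-zA-Z0-9]{5,}$/,
  /[_-](?=[a-zA-Z0-9]*\d)(?=[a-zA-Z0-9]*[a-zA-Z])[a-zA-Z0-9]{5,}$/,
];

// Returns true when a single class token matches one of the heuristics.
function looksHashed(token: string): boolean {
  return HASHED_PATTERNS.some((pattern) => pattern.test(token));
}

// Removes hashed-looking tokens from double-quoted class attributes.
// If no tokens remain, the whole attribute is dropped.
// Note: single-quoted or unquoted attributes are left untouched in this sketch.
function stripHashedClasses(html: string): string {
  return html.replace(/\sclass="([^"]*)"/g, (_match: string, value: string) => {
    const kept = value
      .split(/\s+/)
      .filter((token) => token !== "" && !looksHashed(token));
    return kept.length > 0 ? ` class="${kept.join(" ")}"` : "";
  });
}

const sample = '<div class="css-1q2w3e4 card"><span class="sc-bdVaJa">Hi</span></div>';
console.log(stripHashedClasses(sample));
// prints: <div class="card"><span>Hi</span></div>
```

A regex pass like this is only suitable for pre-cleaning text that is already headed to an LLM; for anything that needs a faithful DOM, a real HTML parser would be the safer choice.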

TypeScript Scraper related posts

  • 👽 Extract Thousands of Rows of Data Without Writing Code (Open Source)

    1 project | dev.to | 17 Jul 2025
  • Why we started sampleapp.ai

    1 project | dev.to | 23 Jun 2025
  • Scraperr – A Self Hosted Webscraper

    6 projects | news.ycombinator.com | 11 May 2025
  • Show HN: Get structured website data with just a prompt

    1 project | news.ycombinator.com | 20 Jan 2025
  • Show HN: Llms.txt Generator – Turn websites into a text file to feed to any LLM

    2 projects | news.ycombinator.com | 21 Nov 2024
  • Maxun: Open-Source No-Code Web Data Extraction Platform

    1 project | news.ycombinator.com | 8 Nov 2024
  • Maxun: Open-Source No-Code Web Data Extraction Platform

    1 project | news.ycombinator.com | 3 Nov 2024

Index

What are some of the best open-source Scraper projects in TypeScript? This list will help you:

#   Project                      Stars
1   firecrawl                    52,769
2   cheerio                      29,714
3   crawlee                      19,240
4   maxun                        13,540
5   llm-scraper                   5,974
6   DevDocs                       1,868
7   api.consumet.org              1,470
8   epublifier                      798
9   linvo-scraper                   622
10  HLTV                            453
11  mwoffliner                      390
12  mkfd                            186
13  scraper                         114
14  extension                        84
15  vercel-metafy                    52
16  freenom-auto-renew-domains       50
17  webscraper-bot                   29
18  botasaurus-starter               29
19  Philia                           24
20  passport-appointment-bot         24
21  scrapyteer                       19
22  forward-proxy-manager            13
23  wallace-apple-dictionary         11

Did you know that TypeScript is the most popular programming language based on number of references?