TypeScript Scraper

Open-source TypeScript projects categorized as Scraper

Top 23 TypeScript Scraper Projects

  • cheerio

    The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

  • Project mention: 8 NPM Packages for JavaScript Beginners [2024][+tutorials] | dev.to | 2024-04-02

    Cheerio is your ticket to the world of server-side magic, allowing you to manipulate HTML and XML documents with jQuery-like syntax. It’s perfect for web scraping, data extraction, or just making sense of the mess that is web content. With Cheerio, you get to play around with the DOM, use CSS selectors, and basically do all the cool things you'd do in the browser, but server-side.

  • crawlee

    Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

  • Project mention: How to scrape Amazon products | dev.to | 2024-04-01

    In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.

  • firecrawl

    🔥 Turn entire websites into LLM-ready markdown

  • Project mention: Tutorial: Extracting structured data from websites using Groq and Firecrawl | news.ycombinator.com | 2024-04-22
  • linvo-scraper

    LinkedIn automation bot with every possible kind of scraping. Valid for 2022; used by Linvo.io.

  • HLTV

    The unofficial HLTV Node.js API

  • mwoffliner

    Mediawiki scraper: all your wiki articles in one highly compressed ZIM file

  • Project mention: Wiktionary doesn’t support tables | /r/Kiwix | 2023-07-15

    You can also directly open a ticket at https://github.com/openzim/mwoffliner/issues with as much info as possible so we can look into it (zim name, language, date, article name, etc.)

  • scraper

    Node.js web scraper. Contains a command line, Docker container, Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, jsdom. (by get-set-fetch)

  • extension

    web scraping extension (by get-set-fetch)

  • freenom-auto-renew-domains

    A scraper built with Puppeteer that auto-renews free domains on Freenom and sends a Discord message using a bot.

  • vercel-metafy

    Easily scrape metadata from websites as a service using Vercel.

  • passport-appointment-bot

    Bot to automatically find and book an appointment for renewal/creation of a Swedish passport or national identity card.

  • Philia

    An easy to use imageboard scraper.

  • Project mention: Philia - Imageboard scraper and dataset manager | /r/StableDiffusion | 2023-06-07

    Here's the repo if anyone wants to take a look: https://github.com/MaximumOverflow/Philia-GUI/releases

  • scrapyteer

    Web crawling & scraping framework for Node.js on top of headless Chrome browser

  • Project mention: Low-code Node.js web scraping tool | /r/webscraping | 2023-07-07

    Hi guys, I've created an open-source low-code Node.js web scraping tool on top of Puppeteer - https://github.com/miroshnikov/scrapyteer. It offers a small set of functions that are combined in pipelines to define a crawling workflow and the shape of the output data. Maybe somebody will find it useful.

  • webscraper-bot

    Web scraping Discord bot that notifies you when a new item appears

  • wallace-apple-dictionary

    macOS Dictionary for the readers of "Infinite Jest"

  • EZAI-Web-Scraper

    An API that allows you to scrape blog posts and articles and get a list of notes or a summary back.

  • scraper

    Declarative web scraper in JavaScript primarily designed to extract linguistics data (by sergeyt)

  • YourArch

    YouTube subtitles scraper/indexer

  • secutils-web-scraper

    The web scraper component of Secutils.dev

  • Project mention: How to track anything on the internet or use Playwright for fun and profit | dev.to | 2024-01-16

    To begin, all functionality related to browser automation and web scraping lives in a dedicated service — Web Scraper. The primary rationale is that dealing with browsers and arbitrary user scripts is tricky from a security standpoint, and it's always a good idea to isolate such functionality as much as possible. You can read more about the security aspects of web scraping in the "Running web scraping service securely" post.

  • linguabook.github.io

    Just-in-time scraper of linguistic information from different sources such as Cambridge and Merriam-Webster. It also has a companion Chrome extension that shows linguistic information for a selected word while you browse the internet.

  • llm-scraper

    Turn any webpage into structured data using LLMs

  • Project mention: Show HN: LLM Scraper – turn any webpage into structured data | news.ycombinator.com | 2024-04-20
  • spinney

    An efficient and flexible web scraper.

  • Favifetch

    A website favicon & icon API, powered by Cloudflare Workers

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
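
The scrapyteer mention above describes combining a small set of functions into pipelines that define a crawling workflow and the shape of the output data. As a rough illustration of that general idea only, here is a minimal TypeScript sketch. It does not use scrapyteer's API: `Step`, `pipe`, `openPage`, `extractLinks`, `toItem`, and the in-memory `pages` map are all hypothetical names invented for this example, and the regex-based extraction is a stand-in for a real HTML parser such as cheerio.

```typescript
// Hypothetical sketch: none of these names come from scrapyteer or any other library.

// A step takes one input value and yields zero or more outputs.
type Step<In, Out> = (input: In) => Out[];

// Compose two steps: every output of `first` is fed into `second`.
function pipe<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => {
    const out: C[] = [];
    for (const mid of first(input)) out.push(...second(mid));
    return out;
  };
}

// Stand-in for fetched pages so the example runs without network access.
const pages: Record<string, string> = {
  "/list": '<a href="/item/1">One</a><a href="/item/2">Two</a>',
  "/item/1": "<h1>First item</h1>",
  "/item/2": "<h1>Second item</h1>",
};

// Step: "open" a page by looking up its HTML; unknown URLs yield nothing.
const openPage: Step<string, string> = (url) => {
  const html = pages[url];
  return html === undefined ? [] : [html];
};

// Step: extract link targets with a deliberately naive regex.
const extractLinks: Step<string, string> = (html) =>
  (html.match(/href="[^"]+"/g) ?? []).map((attr) => attr.slice(6, -1));

// Step: shape the output record from the first <h1> on the page.
interface Item {
  title: string;
}
const toItem: Step<string, Item> = (html) => {
  const m = html.match(/<h1>(.*?)<\/h1>/);
  return m ? [{ title: m[1] }] : [];
};

// Workflow: open the list page, follow each link, open it, extract an item.
const crawl = pipe(pipe(pipe(openPage, extractLinks), openPage), toItem);
const items = crawl("/list");
console.log(items);
```

Because each step returns an array, a step can fan out (one list page to many links) or filter (an unknown URL to nothing) without the composition logic changing. A real crawler would make the steps asynchronous and add error handling, which this sketch leaves out.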

Index

What are some of the best open-source Scraper projects in TypeScript? This list will help you:

Rank Project Stars
1 cheerio 27,780
2 crawlee 12,129
3 firecrawl 1,659
4 linvo-scraper 589
5 HLTV 374
6 mwoffliner 253
7 scraper 98
8 extension 58
9 freenom-auto-renew-domains 48
10 vercel-metafy 30
11 passport-appointment-bot 24
12 Philia 23
13 scrapyteer 16
14 webscraper-bot 13
15 wallace-apple-dictionary 10
16 EZAI-Web-Scraper 10
17 scraper 3
18 YourArch 2
19 secutils-web-scraper 1
20 linguabook.github.io 1
21 llm-scraper 1
22 spinney 0
23 Favifetch 0
