Top 17 TypeScript Scraping Projects

crawlee

Mentions: 29 | Stars: 12,044 | Activity: 9.8 | Language: TypeScript

Crawlee is a web scraping and browser automation library for Node.js that helps you build reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
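Proxy rotation means spreading requests across a pool of proxy servers so that no single IP address carries all of a crawler's traffic. Crawlee handles this through its own `ProxyConfiguration` class, so in practice you would configure that instead. The dependency-free sketch below only illustrates the round-robin idea; the `RoundRobinProxyPool` class and the proxy URLs are hypothetical placeholders, not Crawlee's API.

```typescript
// Hypothetical round-robin proxy pool: a sketch of the idea, not Crawlee's ProxyConfiguration.
class RoundRobinProxyPool {
  private readonly proxyUrls: readonly string[];
  private index = 0;

  constructor(proxyUrls: readonly string[]) {
    if (proxyUrls.length === 0) {
      throw new Error("proxy pool must contain at least one URL");
    }
    this.proxyUrls = proxyUrls;
  }

  // Return the next proxy URL, wrapping around to the start of the list.
  next(): string {
    const url = this.proxyUrls[this.index];
    this.index = (this.index + 1) % this.proxyUrls.length;
    return url;
  }
}

// Placeholder proxy addresses for illustration only.
const pool = new RoundRobinProxyPool([
  "http://proxy-1.example.com:8000",
  "http://proxy-2.example.com:8000",
  "http://proxy-3.example.com:8000",
]);

// Assign a proxy to each of four hypothetical page requests.
const assigned = ["a", "b", "c", "d"].map((page) => `${page} -> ${pool.next()}`);
console.log(assigned.join("\n"));
```

With three proxies and four requests, the fourth request wraps back to the first proxy.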

Project mention: How to scrape Amazon products | dev.to | 2024-04-01

In this guide, we'll be extracting information from Amazon product pages using the power of TypeScript in combination with the Cheerio and Crawlee libraries. We'll explore how to retrieve and extract detailed product data such as titles, prices, image URLs, and more from Amazon's vast marketplace. We'll also discuss handling potential blocking issues that may arise during the scraping process.
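Prices scraped from product pages usually arrive as display strings such as `$1,299.99` rather than numbers. A minimal normalization step might look like the sketch below, assuming US-style formatting (comma as thousands separator, dot as decimal separator). The `Product` shape and the `parsePrice` and `toProduct` helpers are hypothetical and are not taken from the guide.

```typescript
// Hypothetical shape of one scraped product record.
interface Product {
  title: string;
  price: number | null; // numeric price, or null when it could not be parsed
  currency: string | null;
  imageUrl: string | null;
}

// Parse a display price such as "$1,299.99" into { currency: "$", amount: 1299.99 }.
// Assumes US-style formatting: "," groups thousands and "." marks decimals.
function parsePrice(raw: string): { currency: string | null; amount: number | null } {
  const match = raw.trim().match(/^([^\d\s.,]*)\s*(\d[\d,]*(?:\.\d+)?)/);
  if (!match) return { currency: null, amount: null };
  const currency = match[1] || null;
  const amount = Number(match[2].replace(/,/g, ""));
  return { currency, amount: Number.isFinite(amount) ? amount : null };
}

// Combine raw scraped strings into a normalized record.
function toProduct(title: string, rawPrice: string, imageUrl: string | null): Product {
  const { currency, amount } = parsePrice(rawPrice);
  return { title: title.trim(), price: amount, currency, imageUrl };
}

console.log(toProduct("  Example Kettle ", "$1,299.99", null));
```

Returning `null` for unparseable prices (for example "Currently unavailable") keeps one bad field from discarding the whole record.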

firecrawl

Mentions: 2 | Stars: 1,659 | Activity: 7.5 | Language: TypeScript

🔥 Turn entire websites into LLM-ready markdown
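To give a rough idea of what "LLM-ready markdown" means, the toy converter below maps a handful of HTML tags to markdown using regular expressions. It is a sketch under heavy assumptions and is not how Firecrawl works: real pages need a proper HTML parser, HTML entities are left untouched, and only headings, links, bold text and paragraphs are handled. The `htmlToMarkdown` function is hypothetical.

```typescript
// Toy HTML-to-markdown converter: headings, links, bold text and paragraphs only.
// Regex-based for brevity; not Firecrawl's method and not safe for arbitrary HTML.
function htmlToMarkdown(html: string): string {
  return (
    html
      // Drop script and style blocks entirely.
      .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "")
      // <h1>..</h1> through <h6>..</h6> become "#" to "######" headings.
      .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_match: string, level: string, text: string) =>
        `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
      // <a href="url">text</a> becomes [text](url).
      .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
      // <strong> and <b> become **bold**.
      .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**")
      // Paragraph ends and line breaks become blank lines.
      .replace(/<\/p>|<br\s*\/?>/gi, "\n\n")
      // Strip any remaining tags.
      .replace(/<[^>]+>/g, "")
      // Collapse runs of three or more newlines into one blank line, then trim.
      .replace(/\n{3,}/g, "\n\n")
      .trim()
  );
}

const sample = `<html><head><style>p{}</style></head><body>
<h1>Docs</h1><p>Read the <a href="https://example.com/guide">guide</a> for <strong>details</strong>.</p>
</body></html>`;
console.log(htmlToMarkdown(sample));
```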

Project mention: Tutorial: Extracting structured data from websites using Groq and Firecrawl | news.ycombinator.com | 2024-04-22

fingerprint-suite

Mentions: 5 | Stars: 692 | Activity: 9.0 | Language: TypeScript

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
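One idea behind fingerprint tooling is that the headers a scraper sends must agree with each other: a Chrome User-Agent paired with Firefox-style headers is an easy giveaway. fingerprint-suite generates realistic data of this kind for you; the dependency-free sketch below only shows the consistency principle by choosing a whole profile at a time instead of mixing individual headers. The profiles, their header values and the `pickHeaders` function are illustrative assumptions, not fingerprint-suite's dataset or API.

```typescript
// Hypothetical header profiles: every header inside one profile belongs to the same browser.
// Values are illustrative only, not taken from fingerprint-suite.
type HeaderProfile = Readonly<Record<string, string>>;

const PROFILES: readonly HeaderProfile[] = [
  {
    "user-agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "sec-ch-ua-platform": '"Windows"',
    "accept-language": "en-US,en;q=0.9",
  },
  {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    // Firefox does not send client-hint headers, so this profile omits them.
    "accept-language": "en-US,en;q=0.5",
  },
];

// Pick one complete profile; `random` is injectable so the choice can be deterministic in tests.
function pickHeaders(random: () => number = Math.random): Record<string, string> {
  const index = Math.floor(random() * PROFILES.length) % PROFILES.length;
  // Return a copy so callers can add request-specific headers without mutating the shared profile.
  return { ...PROFILES[index] };
}

console.log(pickHeaders());
```
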
secret-agent

Mentions: 1 | Stars: 633 | Activity: 0.0 | Language: TypeScript

The web scraper that's nearly impossible to block (now called @ulixee/hero).

libremdb

Mentions: 3 | Stars: 257 | Activity: 7.5 | Language: TypeScript

A free & open source IMDb front-end.

Project mention: Do you have companies that you boycott? | /r/nederlands | 2023-05-08

facebook-group-members-scraper

Mentions: 1 | Stars: 157 | Activity: 5.4 | Language: TypeScript

Facebook Group Members Extractor. Download Facebook group members in CSV.

Project mention: How to quickly grab all members of an AWDTSG Group - Database w/ full names, pictures, and details coming soon to know who to avoid | /r/AWDTSGisToxic | 2023-07-01

For those who want to do it for themselves: https://github.com/floriandiud/facebook-group-members-scraper

scraper

Mentions: 12 | Stars: 98 | Activity: 0.0 | Language: TypeScript

Node.js web scraper. Contains a command-line interface, a Docker container, a Terraform module and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSDOM. (by get-set-fetch)
freenom-auto-renew-domains

Mentions: 1 | Stars: 48 | Activity: 4.6 | Language: TypeScript

A scraper built with Puppeteer that auto-renews free domains on Freenom and sends a Discord message using a bot.

serpapi-javascript

Mentions: 2 | Stars: 40 | Activity: 6.6 | Language: TypeScript

Scrape and parse search engine results using SerpApi.

Project mention: Connect OpenAI with external APIs with Function calling | dev.to | 2023-12-28

Install the SerpApi package. Don't forget to install the API package you want to use. In this sample, we need to install the serpapi package: https://github.com/serpapi/serpapi-javascript
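The serpapi package wraps SerpApi's HTTP search endpoint. To show what a search request looks like on the wire, the sketch below builds the URL by hand, assuming the `https://serpapi.com/search.json` endpoint with its `engine`, `q` and `api_key` query parameters. The `buildSearchUrl` helper is hypothetical, and `YOUR_API_KEY` is a placeholder.

```typescript
// Hypothetical helper: build a SerpApi search URL without the client package.
// Assumes the search.json endpoint and its engine/q/api_key query parameters.
function buildSearchUrl(query: string, apiKey: string, engine = "google"): string {
  const url = new URL("https://serpapi.com/search.json");
  url.searchParams.set("engine", engine);
  url.searchParams.set("q", query);
  url.searchParams.set("api_key", apiKey);
  return url.toString(); // URLSearchParams percent-encodes values (spaces become "+")
}

const requestUrl = buildSearchUrl("coffee shops", "YOUR_API_KEY");
console.log(requestUrl);
// The JSON results could then be fetched with: await (await fetch(requestUrl)).json()
```

Using `URLSearchParams` rather than string concatenation keeps queries containing `&` or spaces from corrupting the request.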

Philia

Mentions: 1 | Stars: 23 | Activity: 6.2 | Language: TypeScript

An easy-to-use imageboard scraper.

Project mention: Philia - Imageboard scraper and dataset manager | /r/StableDiffusion | 2023-06-07

Here's the repo if anyone wants to take a look: https://github.com/MaximumOverflow/Philia-GUI/releases

headless-task-server

Mentions: 1 | Stars: 21 | Activity: 4.8 | Language: TypeScript

A headless browser task/job queue & runner based on Hero (Chrome)
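Much of the work in a headless-browser task queue comes down to capping how many jobs run at once, because every browser page is expensive. The sketch below is a generic concurrency-limited queue, not headless-task-server's implementation. The `TaskQueue` class is hypothetical, and the jobs are plain async functions standing in for Hero browser sessions.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical queue that runs at most `concurrency` tasks at the same time.
class TaskQueue {
  private readonly concurrency: number;
  private running = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(concurrency: number) {
    if (!Number.isInteger(concurrency) || concurrency < 1) {
      throw new Error("concurrency must be a positive integer");
    }
    this.concurrency = concurrency;
  }

  // Take a free slot, or wait until a finishing task hands its slot over.
  private async acquire(): Promise<void> {
    if (this.running < this.concurrency) {
      this.running++;
      return;
    }
    await new Promise<void>((resolve) => this.waiting.push(resolve));
  }

  // Pass the slot straight to the next waiter so a newly added task cannot overtake it.
  private release(): void {
    const next = this.waiting.shift();
    if (next) {
      next();
    } else {
      this.running--;
    }
  }

  async add<T>(task: Task<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release(); // runs on success and on failure
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run five fake "browser jobs" through a queue limited to two at a time and record the peak.
async function demo(): Promise<{ results: string[]; peak: number }> {
  const queue = new TaskQueue(2);
  let active = 0;
  let peak = 0;
  const job = async (id: number): Promise<string> => {
    active++;
    peak = Math.max(peak, active);
    await sleep(10); // stands in for page navigation and scraping
    active--;
    return `job ${id} done`;
  };
  const results = await Promise.all([1, 2, 3, 4, 5].map((id) => queue.add(() => job(id))));
  return { results, peak };
}

demo().then(({ results, peak }) => console.log(results, "peak concurrency:", peak));
```

Handing a freed slot directly to the next waiter, instead of decrementing the counter and letting waiters race, is what keeps the limit strict.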
timetable-grabber-sit

Mentions: 2 | Stars: 19 | Activity: 4.9 | Language: TypeScript

Timetable Grabber - SIT is a tool that lets you grab your trimester's timetable and export it to the .ics format, which you can then import into your favourite calendar.
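The .ics format is iCalendar (RFC 5545): a plain-text file of `BEGIN:`/`END:` blocks with CRLF line endings, where each event is a `VEVENT` that needs at least a `UID` and a `DTSTAMP`. The sketch below writes timetable entries as events. The `ClassSession` shape, the `toIcs` function and the UID scheme are assumptions for illustration, not the tool's actual code. Times are written in UTC for simplicity, and long lines are not folded at 75 octets as the RFC requires.

```typescript
// Hypothetical shape of one timetable entry.
interface ClassSession {
  module: string;
  location: string;
  start: Date;
  end: Date;
}

// Format a Date as an iCalendar UTC date-time, e.g. 20240115T010000Z.
function icsDate(date: Date): string {
  return date.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");
}

// Escape characters that are special in iCalendar TEXT values.
function icsText(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/;/g, "\\;").replace(/,/g, "\\,").replace(/\n/g, "\\n");
}

function toIcs(sessions: readonly ClassSession[], stamp: Date = new Date()): string {
  const lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//timetable sketch//EN"];
  sessions.forEach((s, i) => {
    lines.push(
      "BEGIN:VEVENT",
      `UID:${icsDate(s.start)}-${i}@timetable.example`, // hypothetical UID scheme
      `DTSTAMP:${icsDate(stamp)}`,
      `DTSTART:${icsDate(s.start)}`,
      `DTEND:${icsDate(s.end)}`,
      `SUMMARY:${icsText(s.module)}`,
      `LOCATION:${icsText(s.location)}`,
      "END:VEVENT",
    );
  });
  lines.push("END:VCALENDAR");
  // iCalendar uses CRLF line endings.
  return lines.join("\r\n") + "\r\n";
}

const ics = toIcs(
  [
    {
      module: "Data Structures, Lecture",
      location: "Room 1; Level 2",
      start: new Date(Date.UTC(2024, 0, 15, 1, 0)),
      end: new Date(Date.UTC(2024, 0, 15, 3, 0)),
    },
  ],
  new Date(Date.UTC(2024, 0, 1)),
);
console.log(ics);
```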
scrapyteer

Mentions: 1 | Stars: 16 | Activity: 4.0 | Language: TypeScript

Web crawling & scraping framework for Node.js on top of a headless Chrome browser

Project mention: Low-code Node.js web scraping tool | /r/webscraping | 2023-07-07

Hi guys, I've created an open-source low-code Node.js web scraping tool on top of Puppeteer: https://github.com/miroshnikov/scrapyteer. It offers a small set of functions that are combined in pipelines to define a crawling workflow and the shape of the output data. Maybe somebody will find it useful.
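The post describes small functions combined into pipelines that define both the crawl and the shape of the output. The snippet below illustrates that general composition idea in plain TypeScript; `pipe`, the step functions and the in-memory `site` object are hypothetical and are not Scrapyteer's API. The `site` map stands in for pages a headless browser would actually load.

```typescript
type Step<I, O> = (input: I) => O | Promise<O>;

// Compose two steps: run `first`, then feed its (awaited) result into `second`.
// The result is itself a Step, so longer pipelines can be built by nesting pipe calls.
function pipe<A, B, C>(first: Step<A, B>, second: Step<B, C>): (input: A) => Promise<C> {
  return async (input) => second(await first(input));
}

// In-memory stand-in for a website: a listing page and the detail pages it links to.
const site: Record<string, { links?: string[]; title?: string; price?: string }> = {
  "/catalog": { links: ["/item/1", "/item/2"] },
  "/item/1": { title: "Lamp", price: "19.90" },
  "/item/2": { title: "Desk", price: "149.00" },
};

// Step 1: "open" a listing page and return the URLs it links to.
const followLinks: Step<string, string[]> = async (url) => site[url]?.links ?? [];

// Step 2: "open" every linked page and shape each one into an output record.
const extractItems: Step<string[], { title: string; price: number }[]> = async (urls) =>
  urls.map((url) => ({ title: site[url]?.title ?? "", price: Number(site[url]?.price ?? NaN) }));

const crawlCatalog = pipe(followLinks, extractItems);
crawlCatalog("/catalog").then((items) => console.log(items));
```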

game-watch

Mentions: 1 | Stars: 12 | Activity: 6.9 | Language: TypeScript

Overview of game release dates, prices and news

dofus-scraper

Mentions: 1 | Stars: 7 | Activity: 0.0 | Language: TypeScript

An open-source Dofus encyclopedia scraper.

pattern-grab

Mentions: 2 | Stars: 7 | Activity: 0.0 | Language: TypeScript

🤛🏻 Regular Expression Data Grabber
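A regular-expression data grabber pulls structured fields out of unstructured text. A minimal sketch of the idea uses named capture groups with `String.prototype.matchAll`, so each match becomes a record keyed by group name. The `grab` function and the e-mail pattern are hypothetical and are not pattern-grab's API.

```typescript
// Hypothetical grabber: return the named groups of every match as a plain record.
function grab(text: string, pattern: RegExp): Record<string, string>[] {
  // matchAll requires the global flag; add it if the caller's pattern lacks it.
  const globalPattern = pattern.flags.includes("g") ? pattern : new RegExp(pattern.source, pattern.flags + "g");
  return Array.from(text.matchAll(globalPattern), (m) => ({ ...(m.groups ?? {}) }));
}

const text = "Contact alice@example.com (Alice) or bob@example.org (Bob) for access.";
const contacts = grab(text, /(?<email>[\w.+-]+@[\w-]+\.[\w.]+)\s+\((?<name>[^)]+)\)/);
console.log(contacts);
```

Named groups make the output self-describing, so changing the pattern does not require updating positional indexes elsewhere.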
linguabook.github.io

Mentions: 1 | Stars: 1 | Activity: 0.0 | Language: TypeScript

Just-in-time scraper of linguistic information from sources such as Cambridge and Merriam-Webster. It also has a companion Chrome extension that shows linguistic information for a selected word while you surf the internet.
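"Just-in-time" is read here as: a word's data is scraped only when the word is first requested, rather than collected ahead of time. Caching each result so later requests reuse it is an added assumption, not something the project states. The sketch below wraps each dictionary source as an async function; `JitLookup` and the fake sources are hypothetical stand-ins for real Cambridge or Merriam-Webster scrapers.

```typescript
type Source = (word: string) => Promise<string>;

// Hypothetical on-demand lookup: sources are queried only when a word is first requested.
// The in-flight promise is cached, so concurrent requests for the same word share one fetch.
class JitLookup {
  private readonly cache = new Map<string, Promise<string[]>>();
  private readonly sources: readonly Source[];

  constructor(sources: readonly Source[]) {
    this.sources = sources;
  }

  lookup(word: string): Promise<string[]> {
    const key = word.trim().toLowerCase();
    let pending = this.cache.get(key);
    if (!pending) {
      const fresh = Promise.all(this.sources.map((source) => source(key)));
      // Forget failed lookups so a later request can retry.
      fresh.catch(() => this.cache.delete(key));
      this.cache.set(key, fresh);
      pending = fresh;
    }
    return pending;
  }
}

// Fake sources that count how often they are actually called.
let calls = 0;
const fakeCambridge: Source = async (word) => {
  calls++;
  return `cambridge:${word}`;
};
const fakeMerriamWebster: Source = async (word) => {
  calls++;
  return `merriam-webster:${word}`;
};

const dictionary = new JitLookup([fakeCambridge, fakeMerriamWebster]);
Promise.all([dictionary.lookup("Scrape"), dictionary.lookup("scrape ")]).then(([first, second]) => {
  console.log(first, first === second ? "(second request served from cache)" : "", "source calls:", calls);
});
```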

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

TypeScript Scraping related posts

Connect OpenAI with external APIs with Function calling
2 projects | dev.to | 28 Dec 2023
Build and run your Python web scrapers in the cloud with Apify SDK for Python
2 projects | /r/webscraping | 14 Mar 2023
GitHub - le-quentin/free-stock-tickers: Freely fetch live stock data by scraping web pages
1 project | /r/AaScienceRejects | 2 Mar 2023
Free backend app to integrate live stock prices in your spreadsheets
1 project | /r/opensource | 2 Mar 2023
For developers: free open source backend app to fetch stock prices directly in your spreadsheet
1 project | /r/finance | 2 Mar 2023
Web Search and Scrape
3 projects | /r/javahelp | 14 Oct 2022
Crawlee · Build reliable crawlers. Fast. | Crawlee
2 projects | /r/node | 23 Aug 2022

Index

What are some of the best open-source scraping projects in TypeScript? This list will help you:

#	Project	Stars
1	crawlee	12,044
2	firecrawl	1,659
3	fingerprint-suite	692
4	secret-agent	633
5	libremdb	257
6	facebook-group-members-scraper	157
7	scraper	98
8	freenom-auto-renew-domains	48
9	serpapi-javascript	40
10	Philia	23
11	headless-task-server	21
12	timetable-grabber-sit	19
13	scrapyteer	16
14	game-watch	12
15	dofus-scraper	7
16	pattern-grab	7
17	linguabook.github.io	1