TypeScript Htmlparser2

Open-source TypeScript projects categorized as Htmlparser2

TypeScript Htmlparser2 Projects

  • cheerio

    The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

    Project mention: Web Scraping in Python – The Complete Guide | news.ycombinator.com | 2024-02-20

    > I'm not sure why Python web scraping is so popular compared to Node.js web scraping

    Take this with a grain of salt, since I am fully cognizant that I'm the outlier in most of these conversations, but Scrapy is A++ the no-kidding best framework for this activity that has been created thus far. So, if there were a scrapyjs maybe I'd look into it, but there's not (that I'm aware of), so here we are. This conversation often comes up in any such "well, I just use requests & ..." discussion, and if one is happy with main.py and a bunch of requests invocations, I'm glad for you, but I don't want to try to cobble together all the side-band stuff that Scrapy and its ecosystem provide for me in a reusable and predictable way.

    Also, those conversations often conflate the server-side language with the "scrape using a headed browser" language, which happens to be the same one. So, if one is using cheerio <https://github.com/cheeriojs/cheerio>, then sure, Node can be a fine thing; if the blog post is all "fire up puppeteer, what can go wrong?!" then that is the road to ruin of doing battle with all kinds of detection problems, since it's kind of a browser but kind of not.

    I, under no circumstances, want the target site running their JS during my crawl runs. I fully accept responsibility for reproducing any XHR or auth or whatever to find the 3 URLs that I care about, without downloading every thumbnail and marketing JS and beacon and and and. I'm also cognizant that my traffic will thus stand out, since it uniquely does not make the beacon and marketing calls, but my experience has been that I get the ban hammer less often with my target fetches than when trying to pretend to be a browser with a human at the keyboard and mouse when there is none.

  • htmlparser2

    The fast & forgiving HTML and XML parser

    Project mention: Nue: A React/Vue/Vite/Astro Alternative | news.ycombinator.com | 2023-09-14

    I hear you! I went all-in on the jQuery scene. I even wrote a semi-famous library called "jQuery Tools" (oldies know). Then came React, and I wrote Riot to simplify the syntax. Then I sidetracked into the startup world for (too) many years and watched from the sidelines as the frontend ecosystem grew to its current dimensions.

    Nue uses a single dependency, htmlparser2 [1], in its package.json [2]. The HTML parser is used to traverse the HTML written in Nue files. I briefly _thought_ of writing my own parser, but right now I'm eyeing Bun's native HTML parsing capabilities. Instead of Node, I'm using Bun to develop everything. I need fewer dependencies with it, because things like JS minification and .env file parsing are built in.

    [1]: https://github.com/fb55/htmlparser2

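The cheerio comment above describes fetching raw HTML and pulling out only the few URLs that matter, without ever running the site's JavaScript. In cheerio itself that would typically start with `cheerio.load(html)` and a selector such as `$("a[href]")`. The dependency-free sketch below only illustrates the idea: `extractLinks` and its `keep` filter are hypothetical names, and the regex is a rough stand-in for a real parser (it does not decode entities and will miss edge cases).

```typescript
// Hypothetical helper: resolve an href against the page URL, or return null
// if it is not a valid URL.
function tryResolve(raw: string, base: string): URL | null {
  try {
    return new URL(raw, base);
  } catch {
    return null;
  }
}

// Sketch: collect absolute http(s) link targets from static HTML. Nothing on
// the page is executed; a regex stands in for a parser such as cheerio.
function extractLinks(
  html: string,
  pageUrl: string,
  keep: (u: URL) => boolean = () => true, // hypothetical filter hook
): string[] {
  // <a ...href=...> with double-quoted, single-quoted, or unquoted values.
  const anchorRe = /<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = anchorRe.exec(html)) !== null) {
    const raw = m[1] ?? m[2] ?? m[3];
    const url = tryResolve(raw, pageUrl);
    if (url === null) continue; // skip unparseable hrefs
    if (url.protocol !== "http:" && url.protocol !== "https:") continue; // drop javascript:, mailto:, etc.
    if (!keep(url)) continue;
    if (out.indexOf(url.href) === -1) out.push(url.href); // de-duplicate, keep first-seen order
  }
  return out;
}
```

Passing a `keep` predicate such as `(u) => u.pathname.indexOf("/api/") === 0` narrows the result to the handful of URLs a crawler actually needs, which is the "3 URLs that I care about" workflow in miniature.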
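htmlparser2 is an event-driven (SAX-style) parser: its `Parser` class takes a handler object with callbacks such as `onopentag`, `ontext` and `onclosetag`, is fed markup with `write()`, and is finished with `end()`. The sketch below imitates that callback style to show how a tool could walk the HTML in its own source files without building a tree. It is not htmlparser2: `tokenize` and `Handler` are hypothetical names, and this regex tokenizer ignores comments, `<script>` contents, entities, and `>` inside attribute values, all of which htmlparser2 handles.

```typescript
// Callback names mirror htmlparser2's handler; the implementation is a toy.
interface Handler {
  onopentag?(name: string, attribs: Record<string, string>): void;
  ontext?(text: string): void;
  onclosetag?(name: string): void;
}

function parseAttribs(src: string): Record<string, string> {
  const attrRe = /([\w:-]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  const attribs: Record<string, string> = {};
  let m: RegExpExecArray | null;
  while ((m = attrRe.exec(src)) !== null) {
    // Valueless attributes (e.g. "disabled") get an empty string.
    attribs[m[1].toLowerCase()] = m[2] ?? m[3] ?? m[4] ?? "";
  }
  return attribs;
}

function tokenize(html: string, handler: Handler): void {
  // Groups: 1 = "/" for closing tags, 2 = tag name, 3 = attributes, 4 = "/" if self-closing.
  const tagRe = /<(\/?)([a-zA-Z][\w-]*)([^>]*?)(\/?)>/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    // Emit any text between the previous tag and this one.
    if (m.index > last) handler.ontext?.(html.slice(last, m.index));
    const name = m[2].toLowerCase();
    if (m[1] === "/") {
      handler.onclosetag?.(name);
    } else {
      handler.onopentag?.(name, parseAttribs(m[3]));
      if (m[4] === "/") handler.onclosetag?.(name);
    }
    last = tagRe.lastIndex;
  }
  if (last < html.length) handler.ontext?.(html.slice(last));
}
```

Because events fire as the input is scanned, a caller can react to each tag (for example, collecting attributes) without holding a whole DOM in memory; when a tree is wanted, htmlparser2 also exports `parseDocument`.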

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-20.


Index

#  Project      Stars
1  cheerio      27,471
2  htmlparser2   4,211