TypeScript Htmlparser2 Projects
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
Project mention: Web Scraping in Python – The Complete Guide | news.ycombinator.com | 2024-02-20
> I'm not sure why Python web scraping is so popular compared to Node.js web scraping
Take this with a grain of salt, since I am fully cognizant that I'm the outlier in most of these conversations, but Scrapy is A++, the no-kidding best framework for this activity that has been created thus far. So, if there was a scrapyjs maybe I'd look into it, but there's not (that I'm aware of), so here we are. This often comes up in any such "well, I just use requests & ..." conversation, and if one is happy with main.py and a bunch of requests invocations, I'm glad for you, but I don't want to try and cobble together all the side-band stuff that Scrapy and its ecosystem provide for me in a reusable and predictable way.
Also, those conversations often conflate the server-side language with the "scrape using headed browser" language, which happens to be the same one. So, if one is using cheerio <https://github.com/cheeriojs/cheerio> then sure, Node can be a fine thing. If the blog post is all "fire up puppeteer, what can go wrong?!" then that is the road to ruin of doing battle with all kinds of detection problems, since it's kind of a browser but kind of not.
I, under no circumstances, want the target site running their JS during my crawl runs. I fully accept responsibility for reproducing any XHR or auth or whatever to find the 3 URLs that I care about, without downloading every thumbnail and marketing JS and beacon and and and. I'm also cognizant that my traffic will thus stand out, since it uniquely does not make the beacon and marketing calls, but my experience has been that I get the ban hammer less often with my targeted fetches than with traffic that pretends to be a browser with a human at the keyboard/mouse but is not.
The fast & forgiving HTML and XML parser
Project mention: Nue: A React/Vue/Vite/Astro Alternative | news.ycombinator.com | 2023-09-14
I hear you! I went all-in on the jQuery scene. Even wrote a semi-famous library called "jQuery Tools" (oldies know). Then came React and I wrote Riot to simplify the syntax. Then I got sidetracked into the startup world for (too) many years and watched from the sidelines how the frontend ecosystem grew to its current dimensions.
Node uses a single dependency, htmlparser2, in the package.json. The HTML parser is used to traverse the HTML that is written in the Nue files. I briefly _thought_ of writing my own parser, but right now I have my eyes on Bun's native HTML parsing capabilities. Instead of Node, I'm using Bun to develop everything. I need fewer dependencies with it, because things like JS minification or .env file parsing are built in.
TypeScript Htmlparser2 related posts
I have an idea for a project and I wanna know which resources are available for me
2 projects | /r/rust | 16 Mar 2023
Why is it so much easier for people/clients to update their socials as opposed to their website? What’s the solution?
2 projects | /r/webdev | 1 Jan 2023
Scraping the web for information. Is this the right approach
1 project | /r/webdev | 10 Dec 2022
How does a Fb, youtube, yt shorts etc downloaders work?
1 project | /r/webdev | 28 Nov 2022
Publish to DokuWiki programmatically without any API
1 project | dev.to | 20 Sep 2022
1 project | dev.to | 18 Sep 2022
Best way to parse and manipulate XML server-side
1 project | /r/sveltejs | 16 Sep 2022