Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →
Lol-html Alternatives
Similar projects and alternatives to lol-html
-
actor-rust-scraper
Experimental scraper in Rust suited for running locally or on the Apify platform. Inspired by Apify SDK.
-
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
yq
Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents (by kislyuk)
-
xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
-
-
tools
all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff (by bAndie91)
-
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
-
-
-
-
bbob
⚡️Blazing fast js bbcode parser, that transforms and parses bbcode to AST with plugin support in pure javascript, no dependencies
-
-
-
-
-
sccache
Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
-
-
-
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
lol-html reviews and mentions
-
Ask HN: A fast, Rust HTML parser that works?
So I'm doing some web scraping in Rust, and so I will need to parse HTML. [scraper](https://docs.rs/scraper/latest/scraper/) (which uses [html5ever](https://github.com/servo/html5ever)) is doing fine except that it's the bottleneck of my application.
So I need a faster parser. I've tried [tl](https://docs.rs/tl/latest/tl/) which would've been perfect except that it doesn't actually work on the HTML I have. When I try to `query_selector` the elements I need, it returns nothing.
[Kuchiki](https://docs.rs/kuchiki/latest/kuchiki/) is abandonded.
I couldn't figure out how to get [lol-html](https://github.com/cloudflare/lol-html) to work for me (it's designed for re-writing HTML, whatever that means). It doesn't seem to have an API to extract the inner text of an element.
[html5gum](https://github.com/untitaker/html5gum) seems to be just an HTML tokenizer, or otherwise just too low-level. I have not yet tried [quick-xml](https://github.com/tafia/quick-xml/) but judging from the README, it's pretty low-level too. I mean, if these are the only options left then I will try them. Otherwise, I would love to use a parser that's faster but as ergonomic as `scraper` or `tl`.
At this point, I would be happy with an Lxml bridge/port of some sort. I don't need to mutate HTML, just parse and read data from it.
-
How much Rust work is actually going on at Cloudflare?
I'm also in the Workers org but I have had a bit of interaction with Rust. There's some Rust in the Workers runtime using lol-html for HTMLRewriter as well as some tooling and there's the full blown workers-rs framework that I work on, but that's about it for the Rust I work on regularly.
- Is there a library for manipulating HTML?
- pup: Parsing HTML at the Command Line
-
Texting Robots: Taming robots.txt with Rust and 34 million tests
Thanks again and happy to answer any questions! My current unreleased Rust projects include a web crawler that uses Tokio + Tokio Console + Reqwest with this crate for robots.txt and a fast text extraction library using lol-html that I am planning to sprinkle with some minimal ML to get Readability.js style intelligent extraction (with training in Python). See Fathom for an example of the ML approach I'll likely take.
-
Like JQ, but for HTML
I’d like to see a tool using lol-html [0] and their CSS selector API as a streaming HTML editor.
- Things you can’t do in Rust (and what to do instead)
-
Problems with building a backend app in Rust in 2020
Cloudflare has open sourced lol-html, a "Low output latency streaming HTML parser/rewriter with CSS selector-based API". Is that what you are looking for?
-
A note from our sponsor - InfluxDB
www.influxdata.com | 28 Mar 2024
Stats
cloudflare/lol-html is an open source project licensed under BSD 3-clause "New" or "Revised" License which is an OSI approved license.
The primary programming language of lol-html is Rust.