xmltodict vs lol-html

xmltodict

Python module that makes working with XML feel like you are working with JSON (by martinblech)

HTML Manipulation


lol-html

Low output latency streaming HTML parser/rewriter with CSS selector-based API (by cloudflare)

HTML css-selectors Parser Rewriting Streaming Stream Rust



Metric	xmltodict	lol-html
Mentions	7	8
Stars	5,370	1,388
Growth	-	1.7%
Activity	0.6	5.7
Latest Commit	3 months ago	about 1 month ago
Language	Python	Rust
License	MIT License	BSD 3-clause "New" or "Revised" License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

xmltodict

Posts with mentions or reviews of xmltodict. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-03-28.

XML to CSV or JSON using Cloud Function
1 project | /r/googlecloud | 14 Dec 2022

Your Cloud Function would be written in Node.js, Python, Go, Java, C#, Ruby, or PHP; pick the one you're most comfortable with. It would get the name and bucket of the newly uploaded XML file as an input parameter. It would then load the file and call a library that makes the conversion. Example libraries: xml-js (for Node), xmltodict (for Python).
Did I reinvent a wheel?
1 project | /r/learnpython | 14 May 2022

Go with xmltodict. It works pretty well, and you just have to drop any key beginning with @ or # (if there is not already an option for that).
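
The key-dropping step suggested above can be sketched in plain Python. This is a minimal sketch, assuming xmltodict's default output shape, where attribute keys start with "@" and text next to attributes sits under "#text". The helper name `strip_markers` and the sample data are hypothetical, and the sample dictionary is written by hand rather than produced by the library.

```python
def strip_markers(node):
    """Recursively drop dict keys that start with '@' or '#'."""
    if isinstance(node, dict):
        return {
            key: strip_markers(value)
            for key, value in node.items()
            if not key.startswith(("@", "#"))
        }
    if isinstance(node, list):
        # Repeated XML elements show up as lists; clean each item.
        return [strip_markers(item) for item in node]
    return node


# Hand-written sample in the shape xmltodict produces by default (assumed data).
parsed = {
    "track": {
        "@id": "7",
        "name": "Flume",
        "point": [
            {"@lat": "44.1", "ele": "1319"},
            {"@lat": "44.2", "ele": "1359"},
        ],
    }
}

cleaned = strip_markers(parsed)
print(cleaned)
```

One caveat of following the advice literally: an element that has both attributes and text keeps its text under "#text", so dropping "#" keys discards that text as well.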
Top python libraries/ frameworks that you suggest every one
15 projects | /r/Python | 28 Mar 2022

Nope, sorry, it's just an XML generator. The Python stdlib offers https://docs.python.org/3/library/xml.etree.elementtree.html and PyPI offers https://github.com/martinblech/xmltodict for parsing, and you could write CSV with csvwriter or pandas.
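
The stdlib route mentioned above might look like the following. It is a minimal sketch that parses with `xml.etree.ElementTree` and writes with `csv.writer`; the `<catalog>`/`<book>` layout, the column names, and the function name `xml_to_csv` are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample document; a real file would have its own element names.
XML_DATA = """<catalog>
  <book id="b1"><title>Dune</title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>4.50</price></book>
</catalog>"""


def xml_to_csv(xml_text):
    """Turn each <book> element into one CSV row and return the CSV text."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "title", "price"])
    for book in root.iter("book"):
        writer.writerow(
            [book.get("id"), book.findtext("title"), book.findtext("price")]
        )
    return buffer.getvalue()


csv_text = xml_to_csv(XML_DATA)
print(csv_text)
```

Writing into an `io.StringIO` keeps the sketch free of file handling; passing an open file object to `csv.writer` works the same way.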
Dict or List to store table like data
1 project | /r/learnpython | 26 Nov 2021
Like JQ, but for HTML
21 projects | news.ycombinator.com | 7 Sep 2021

xmlstarlet is really nothing like jq, as a language. But yes, I use it because it is the best commandline xml processor I'd found. That's the only similarity to jq.
Is this the yq? https://kislyuk.github.io/yq/ It does contain an 'xq', as a literal wrapper for jq, piping output into it after transcoding XML to JSON using xmltodict https://github.com/martinblech/xmltodict (which explodes xml into separate JSON data structures).
This is a bash one-liner! But TBF it really is a 'jq for xml'. I think it would be horrible for some things, but you could also do a lot of useful things painlessly.
Parsing unknown XML file with Python?
1 project | /r/learnpython | 6 Feb 2021
I used raw data from my watch (and Python) to make a map of all the NH48 hikes from this year. I hiked Liberty and Flume before I got the watch in June, so I need to do those again! Color-coded by altitude.
1 project | /r/wmnf | 5 Jan 2021

Super easy: take a look at xmltodict (https://github.com/martinblech/xmltodict). xmltodict.parse(xml_str) gets you a dictionary.
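
As a rough illustration of that call, the sketch below parses a small XML string and dumps the result as JSON. It assumes the third-party xmltodict package is installed; the `<hike>` sample is invented for the example and is not real watch data.

```python
import json

import xmltodict  # third-party: pip install xmltodict

# Invented sample input; attribute and element names are assumptions.
xml_str = '<hike name="Liberty"><ele>1359</ele></hike>'

# Attributes come back under keys prefixed with "@"; values stay strings.
data = xmltodict.parse(xml_str)

# The result is dict-like, so it serializes directly to JSON.
as_json = json.dumps(data)
print(as_json)
```

Values are not converted to numbers, so the elevation here stays the string "1359" until the caller casts it.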

lol-html

Posts with mentions or reviews of lol-html. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-23.

Ask HN: A fast, Rust HTML parser that works?
4 projects | news.ycombinator.com | 23 Feb 2023

So I'm doing some web scraping in Rust, and so I will need to parse HTML. [scraper](https://docs.rs/scraper/latest/scraper/) (which uses [html5ever](https://github.com/servo/html5ever)) is doing fine except that it's the bottleneck of my application.
So I need a faster parser. I've tried [tl](https://docs.rs/tl/latest/tl/) which would've been perfect except that it doesn't actually work on the HTML I have. When I try to `query_selector` the elements I need, it returns nothing.
[Kuchiki](https://docs.rs/kuchiki/latest/kuchiki/) is abandoned.
I couldn't figure out how to get [lol-html](https://github.com/cloudflare/lol-html) to work for me (it's designed for re-writing HTML, whatever that means). It doesn't seem to have an API to extract the inner text of an element.
[html5gum](https://github.com/untitaker/html5gum) seems to be just an HTML tokenizer, or otherwise just too low-level. I have not yet tried [quick-xml](https://github.com/tafia/quick-xml/) but judging from the README, it's pretty low-level too. I mean, if these are the only options left then I will try them. Otherwise, I would love to use a parser that's faster but as ergonomic as `scraper` or `tl`.
At this point, I would be happy with an Lxml bridge/port of some sort. I don't need to mutate HTML, just parse and read data from it.
How much Rust work is actually going on at Cloudflare?
2 projects | /r/rust | 15 Jan 2023

I'm also in the Workers org but I have had a bit of interaction with Rust. There's some Rust in the Workers runtime using lol-html for HTMLRewriter as well as some tooling and there's the full blown workers-rs framework that I work on, but that's about it for the Rust I work on regularly.
Is there a library for manipulating HTML?
3 projects | /r/rust | 17 Dec 2022
pup: Parsing HTML at the Command Line
7 projects | news.ycombinator.com | 30 Nov 2022
Texting Robots: Taming robots.txt with Rust and 34 million tests
4 projects | /r/rust | 28 Mar 2022

Thanks again and happy to answer any questions! My current unreleased Rust projects include a web crawler that uses Tokio + Tokio Console + Reqwest with this crate for robots.txt and a fast text extraction library using lol-html that I am planning to sprinkle with some minimal ML to get Readability.js style intelligent extraction (with training in Python). See Fathom for an example of the ML approach I'll likely take.
Like JQ, but for HTML
21 projects | news.ycombinator.com | 7 Sep 2021

I’d like to see a tool using lol-html [0] and their CSS selector API as a streaming HTML editor.
[0] https://github.com/cloudflare/lol-html
Things you can’t do in Rust (and what to do instead)
6 projects | news.ycombinator.com | 15 May 2021
Problems with building a backend app in Rust in 2020
2 projects | /r/rust | 21 Dec 2020

Cloudflare has open sourced lol-html, a "Low output latency streaming HTML parser/rewriter with CSS selector-based API". Is that what you are looking for?

What are some alternatives?

When comparing xmltodict and lol-html you can also consider the following projects:

lxml - The lxml XML toolkit for Python

actor-rust-scraper - Experimental scraper in Rust suited for running locally or on the Apify platform. Inspired by Apify SDK.

untangle - Converts XML to Python objects

tq - Perform a lookup by CSS selector on an HTML input

MarkupSafe - Safely add untrusted strings to HTML/XML markup.

yq - Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents

pyquery - A jquery-like library for python

tools - all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff

xhtml2pdf - A library for converting HTML into PDFs using ReportLab

hq - lightweight command line HTML processor using CSS and XPath selectors

xmldataset - xmldataset: xml parsing made easy 🗃️

cargo-expand - Subcommand to show result of macro expansion
