xmltodict
selectolax
Metric | xmltodict | selectolax
---|---|---
Mentions | 7 | 4
Stars | 5,022 | 729
Growth | - | -
Activity | 0.0 | 6.9
Last commit | 4 months ago | 29 days ago
Language | Python | Cython
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
xmltodict
- Top Python libraries/frameworks that you suggest everyone
Nope, sorry, it's just an XML generator. For parsing, the Python stdlib offers xml.etree.ElementTree (https://docs.python.org/3/library/xml.etree.elementtree.html) and PyPI offers xmltodict (https://github.com/martinblech/xmltodict); you could then write the CSV with csv.writer or pandas.
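The parse-then-write-CSV route suggested in that comment might look roughly like the sketch below, which uses only the standard library (`xml.etree.ElementTree` and `csv.writer`). The sample document, its element names (`catalog`, `book`, `title`, `price`) and the helper `xml_to_csv` are hypothetical and exist only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element and attribute names are
# illustrative assumptions, not taken from any real feed.
XML_TEXT = """
<catalog>
  <book id="b1"><title>Dune</title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>4.50</price></book>
</catalog>
"""


def xml_to_csv(xml_text: str) -> str:
    """Parse XML with ElementTree and return one CSV row per <book>."""
    root = ET.fromstring(xml_text.strip())
    buffer = io.StringIO()
    # Use "\n" line endings so the output is the same on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "title", "price"])
    for book in root.findall("book"):
        writer.writerow(
            [book.get("id"), book.findtext("title"), book.findtext("price")]
        )
    return buffer.getvalue()


csv_text = xml_to_csv(XML_TEXT)
print(csv_text)
```

Writing to an in-memory `io.StringIO` keeps the sketch free of file paths; passing an open file object to `csv.writer` instead would work the same way.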
- Like JQ, but for HTML
xmlstarlet is really nothing like jq, as a language. But yes, I use it because it is the best commandline xml processor I'd found. That's the only similarity to jq.
Is this the yq? https://kislyuk.github.io/yq/ It does contain an 'xq', as a literal wrapper for jq, piping output into it after transcoding XML to JSON using xmltodict https://github.com/martinblech/xmltodict (which explodes xml into separate JSON data structures).
This is a bash one-liner! But TBF it really is a 'jq for xml'. I think it would be horrible for some things, but you could also do a lot of useful things painlessly.
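The XML-to-JSON transcoding step that the comments above attribute to xq can be approximated in a few lines with xmltodict. This is only a sketch of the idea, not xq's actual implementation; the sample XML and the helper name `xml_to_json` are hypothetical. It assumes xmltodict is installed (`pip install xmltodict`) and relies on its default conventions of prefixing attributes with `@` and storing element text under `#text`.

```python
import json

import xmltodict  # third-party: pip install xmltodict

# Hypothetical input; the element and attribute names are illustrative only.
XML_TEXT = '<feed><entry id="1">first</entry><entry id="2">second</entry></feed>'


def xml_to_json(xml_text: str) -> str:
    """Convert an XML string into a JSON string via xmltodict."""
    # Repeated sibling elements (<entry> here) become a list;
    # attributes get an "@" prefix and element text is stored under "#text".
    doc = xmltodict.parse(xml_text)
    return json.dumps(doc)


json_text = xml_to_json(XML_TEXT)
print(json_text)
```

The resulting JSON text could then be piped into jq or filtered directly in Python after `json.loads`.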
selectolax
- The State of Web Scraping in 2021
Lazyweb link: https://github.com/rushter/selectolax
although I don't follow the need to have what appears to be two completely separate HTML parsing C libraries as dependencies; seeing this in the readme for Modest gives me the shivers because lxml has _seen some shit_
> Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
although its other dep seems much more cognizant about the HTML5 standard, for whatever that's worth: https://github.com/lexbor/lexbor#lexbor
---
> It looks like the author of the article just googled some libraries for each language and didn't research the topic
Heh, oh, new to the Internet, are you? :-D
What are some alternatives?
lxml - The lxml XML toolkit for Python
untangle - Converts XML to Python objects
MarkupSafe - Safely add untrusted strings to HTML/XML markup.
pyquery - A jquery-like library for python
html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python
lexbor - Lexbor is development of an open source HTML Renderer library. http://lexbor.com
gazpacho - 🥫 The simple, fast, and modern web scraping library
bleach - Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
xmldataset - xmldataset: xml parsing made easy 🗃️
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)