Metric | dosage | xidel
---|---|---
Mentions | 1 | 18
Stars | 0 | 652
Growth | - | -
Activity | 0.0 | 5.6
Last commit | almost 5 years ago | 27 days ago
Language | Python | Pascal
License | MIT License | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dosage
- Search web freely from command line?
> What fundamentally is an HTML parser? I read Regex is not sufficient to parse HTML. What basic design feature does an HTML parser have that allows this? I mean, what procedures does it use to recognize content between tags, for example?

For one thing, tags in HTML can nest. That's enough to make regular expressions insufficient to deal with it. Now, most regex engines can do more than what regular expressions are formally supposed to do, but the regex syntax itself is unfriendly to this sort of thing, and complicated regexes can really slow down the engine.

For another thing, the HTML you can find in the wild is exceptionally far from being well-behaved. Leaving aside the mess that has accumulated over the decades, just HTML5, for example, allows some tags to stand on their own. That is, a closing tag is not required, as it's implicitly added by the parser when encountering the next block-level element. If you are dealing with anything of the sort, a regex-based solution will get exceptionally messy, exceptionally quickly.

What you can use regexes for is chunking the input - as a lexer - or extracting snippets of text out of the page. Even then, I'd rather write b.select(".search-results > div > a:nth-of-type(1)") than something like s/]*class="[^"]*search-results[^"]*"[^>]*>/; s/]*>]*href="([^"]+)"[^>]*>([^<]+).*/ (not an exaggeration): the latter is not only much more complex, but also frail.

> You say bash is by its nature inadequate for this. How come? Does it not have good string handling and processing capabilities, for example?

Nope. Bash is first and foremost a shell. It being a general-purpose programming language comes after, and it shows: it doesn't just make simple things complicated, it's also full of pitfalls (e.g., " vs ', how they behave with variable and tilde expansion, splitting of parameters, $@ vs $*...).

Even just "bash scripting" is ambiguous as hell, because there are subtle and less-than-subtle differences between bash, sh and zsh, and between different versions and implementations of those binaries. I'm no slouch when it comes to shell-wrangling - I like a challenge - but I still shellcheck my work, and even that doesn't stop me from running into issues when I use scripts I wrote on a different computer.

> If I decided to use a real HTML parser, say one in Python, I want to find a way to pass the output directly, after executing curl, without having to save a file.

Pass the output to what? You can just do whatever - print the search results? - from inside Python itself, and you can always write a shell wrapper if you want to do something with the output.

> I'm currently researching how to access the last output of a bash command, after it's been executed.

I don't believe it's saved anywhere. A lot of commands can fill up the scrollback of your terminal, or just never end. Trying to grab the output of the last command would mean trying to grab the output of those commands, which would just pointlessly fill your disk.
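To make the nesting argument concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It keeps a stack of open elements, which is the piece a plain regex lacks, and collects links found inside a container. The class name `LinkCollector`, the helper `extract_links`, the `SAMPLE` markup and the example.com URLs are all made up for illustration; only the `search-results` class name comes from the comment above. It is looser than the `.search-results > div > a:nth-of-type(1)` selector, since it takes every link anywhere inside the container.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkCollector(HTMLParser):
    """Collect (href, text) for each <a> nested inside a container class."""

    def __init__(self, container_class="search-results"):
        super().__init__()
        self.container_class = container_class
        self.stack = []     # (tag, is_container) for every open element
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # Check the ancestors before pushing the new element itself.
        inside = any(is_container for _, is_container in self.stack)
        classes = (attrs.get("class") or "").split()
        self.stack.append((tag, self.container_class in classes))
        if tag == "a" and inside and attrs.get("href"):
            self._href, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        # Pop back to the matching open tag; anything left open above it
        # (for example a <p> without </p>) is discarded along the way.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_links(html, container_class="search-results"):
    parser = LinkCollector(container_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: note the unclosed <p> and the void <br>.
SAMPLE = """
<div class="search-results">
  <div><p>First hit<br><a href="https://example.com/a">Result <b>A</b></a></div>
  <div><a href="https://example.com/b">Result B</a></div>
</div>
<p><a href="https://example.com/ignored">Outside the container</a></p>
"""

for href, text in extract_links(SAMPLE):
    print(href, text, sep="\t")
```

To feed it a live page without saving a file, one option is to replace `SAMPLE` with `sys.stdin.read()` (after `import sys`) and pipe the output of curl into the script.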
xidel
- Move over jq I found something easier: fx
You could try Xidel[1]. It supports JSON, XML and HTML using XPath/XQuery 3.1.
It has some extensions to the standard that are pretty nice (JSONiq, CSS selectors, HTML "template" matching), but you can limit it to just standard XPath/XQuery if you like.
I recommend getting the nightly 0.9.9 build if you give it a try; the stable 0.9.8 version is pretty old, and I've had no issues with 0.9.9.
1. https://www.videlibri.de/xidel.html
- Batch Win Installer - from a defined list of software, BWI installs software on 64-bit Windows 10/11 machines without prompts, checks what software is installed, offers to install and/or upgrade it, and scans each program's website to determine the latest available version
Windows binary of Xidel (https://github.com/benibela/xidel), a command-line tool to download and extract data from HTML pages
- pup: Parsing HTML at the Command Line
- Remove white spaces from last column of CSV
- What's the best tool to build pipelines from REST APIs?
Xidel for extraction and pagination
- What are your coolest tools for one-liners?
Download an entire subreddit to JSON Lines with Xidel:
- Fetch data from XML
I believe that you can't read from a URL with batch alone. You'll need a helper app like Xidel: https://www.videlibri.de/xidel.html
- Tutorial: Rapid Script Development with Bash, JC, and JQ (no grep/sed/awk)
I have not played with this, but it looks like xidel might allow you to do this in Bash. jc also has a URL string parser that could be used in such a script.
- Xidel. A tool to query data from anything on the web and extract what you want
- How to make an HTTP request with curl on a certain page after being authenticated?
I built Xidel for such authenticated requests:
What are some alternatives?
jq - Command-line JSON processor [Moved to: https://github.com/jqlang/jq]
tools - all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff
pup - Parsing HTML at the command line
gron - Make JSON greppable!
yq - Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
xmltodict - Python module that makes working with XML feel like you are working with JSON
xmlq - filter xml in the command line with xpath
blog.rust-lang.org - Home of the Rust and Inside Rust blogs
JsonPath - Java JsonPath implementation
jsoup - jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.
hn-search - Hacker News Search
jq - Command-line JSON processor