Like jq, but for HTML


xidel

18 653 5.6 Pascal

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

> Well, jq is grep as well as sed and awk, but yeah, htmlq seems to be just grep, for sake of comparison.
Exactly, and that is what I mean. If you want to compare, compare it with grep, not jq.
Someone else posted xidel[0] in this thread, which I've not used, but it seems to be the "jq but for html".
[0] https://github.com/benibela/xidel

pup

52 8,000 0.0 HTML

Parsing HTML at the command line

Once upon a time I used pup[0] for this sort of thing; later I switched to cascadia[1], which seemed much more advanced.
Comparing the two repos, it seems pup's development has somewhat died down.
These tools, including htmlq, seem to sell themselves as "jq for HTML", which is far from the truth. jq is closer to awk, in that you can do just about everything with it. Cascadia, htmlq, and pup seem closer to grep for HTML. They can essentially only select data from an HTML source.
[0] https://github.com/EricChiang/pup

rust

2,686 93,266 10.0 Rust

Empowering everyone to build reliable and efficient software.

This is very nice!
For reasoning about tree-based data such as HTML, I also highly recommend the declarative programming language Prolog. For instance, here is the sample query from the README, fetching all elements with id get-help from https://www.rust-lang.org, using Scryer Prolog and its SGML and HTTP libraries in combination with the XPath-inspired query language from library(xpath):
    ?- http_open("https://www.rust-lang.org", Stream, []),
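For comparison, the same kind of lookup (every element whose id is get-help) can be sketched with Python's standard html.parser module. This is only an illustration, not how htmlq or Scryer Prolog do it: the names IdFinder, find_by_id and SAMPLE are made up for this sketch, and SAMPLE is invented markup rather than the real rust-lang.org page.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Collect (tag, attributes) for every start tag whose id matches."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") == self.target_id:
            self.matches.append((tag, attr_map))


def find_by_id(html, target_id):
    """Return all (tag, attributes) pairs whose id equals target_id."""
    finder = IdFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical sample markup, NOT the real rust-lang.org page.
SAMPLE = """
<html><body>
  <section id="get-help" class="red"><h2>Get help!</h2></section>
  <section id="other">...</section>
</body></html>
"""

for tag, attributes in find_by_id(SAMPLE, "get-help"):
    print(tag, attributes)
```

The sketch only selects start tags; it does not capture the matched element's contents, which a real selector tool would normally print as well.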

htmlq

27 6,942 2.3 Rust

Like jq, but for HTML.

gron

64 13,550 0.0 Go

Make JSON greppable!
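The idea behind "greppable" JSON can be sketched as follows: flatten the document into one assignment per line, each carrying its full path, so that line-oriented tools such as grep can match on paths. This is a rough approximation of that output style written for illustration; the function name flatten and the sample document are assumptions of this sketch, and it is not gron's actual implementation.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node of a parsed JSON document."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            if key.isidentifier():
                child = f"{path}.{key}"
            else:
                # Keys that are not plain identifiers use bracket notation.
                child = f"{path}[{json.dumps(key)}]"
            yield from flatten(value[key], child)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from flatten(item, f"{path}[{index}]")
    else:
        # Scalars are written back out as JSON literals.
        yield f"{path} = {json.dumps(value)};"


# Hypothetical sample document for the sketch.
doc = json.loads('{"name": "htmlq", "tags": ["html", "cli"], "stars": 1}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Because every line is self-describing, a plain substring search for "tags" finds the array and both of its items without any JSON-aware tooling.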
cascadia

1 134 4.7 Go

Go cascadia package command line CSS selector

tq

1 234 0.0 Python

Perform a lookup by CSS selector on an HTML input

I did write it a few years ago.
https://github.com/plainas/tq

blog.rust-lang.org

25 332 9.5 HTML

Home of the Rust and Inside Rust blogs

['/', '/tools/install', '/learn', 'https://play.rust-lang.org/', '/tools', '/governance', '/community', 'https://blog.rust-lang.org/',...

JsonPath

10 8,668 6.3 Java

Java JsonPath implementation

Is anyone else using https://github.com/json-path/JsonPath instead of going the jq route?
I hope we standardize on some jq query language, like we have with a base set of SQL syntax.

jsoup

27 10,661 9.1 Java

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

https://jsoup.org/ has been around for a long time and seems a bit more mature & maintained than this two-code-files 2-year-old repo. Highly recommend.

lol-html

8 1,400 5.7 Rust

Low output latency streaming HTML parser/rewriter with CSS selector-based API

I’d like to see a tool using lol-html [0] and their CSS selector API as a streaming HTML editor.
[0] https://github.com/cloudflare/lol-html

xmlq

1 5 0.0 JavaScript

filter xml in the command line with xpath

xmltodict

7 5,386 3.1 Python

Python module that makes working with XML feel like you are working with JSON

xmlstarlet is really nothing like jq, as a language. But yes, I use it because it is the best commandline xml processor I'd found. That's the only similarity to jq.
Is this the yq? https://kislyuk.github.io/yq/ It does contain an 'xq', as a literal wrapper for jq, piping output into it after transcoding XML to JSON using xmltodict https://github.com/martinblech/xmltodict (which explodes xml into separate JSON data structures).
This is a bash one-liner! But TBF it really is a 'jq for xml'. I think it would be horrible for some things, but you could also do a lot of useful things painlessly.
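The transcoding step described above (turn XML into JSON, then query the JSON) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not xmltodict's or yq's actual code: element_to_data, xml_to_json and SAMPLE are names invented here, and the '@' prefix for attributes and '#text' key for text are conventions chosen for the sketch.

```python
import json
import xml.etree.ElementTree as ET


def element_to_data(element):
    """Convert an element into plain dicts, lists and strings.

    Assumed conventions: attributes get an '@' prefix, text next to
    attributes or children goes under '#text', and repeated child tags
    are collected into a list.
    """
    node = {f"@{name}": value for name, value in element.attrib.items()}
    for child in element:
        converted = element_to_data(child)
        if child.tag in node:
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(converted)
        else:
            node[child.tag] = converted
    text = (element.text or "").strip()
    if not node:
        return text or None  # leaf element: just its text, or null
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    """Parse XML text and return the equivalent JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_data(root)}, sort_keys=True)


# Hypothetical sample input for the sketch.
SAMPLE = (
    '<feed lang="en"><entry id="1">first</entry>'
    '<entry id="2">second</entry><title>News</title></feed>'
)

data = json.loads(xml_to_json(SAMPLE))
# The "query" step, done here in Python instead of jq.
entry_texts = [entry["#text"] for entry in data["feed"]["entry"]]
print(entry_texts)
```

Once the document is JSON, any JSON query tool can take over, which is why a thin wrapper around jq goes a long way. The cost is that XML details such as mixed content and element order across different tags do not survive the conversion.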

yq

24 2,475 7.7 Python

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents (by kislyuk)


hq

8 64 5.2 Shell

lightweight command line HTML processor using CSS and XPath selectors

hn-search

1,637 524 2.9 TypeScript

Hacker News Search

> Software definition through a reference to another software is somewhat confusing.
Possibly, depending on background as you note, but not all promotion is aimed at the same audience. When submitting to HN, "like jq, but for X" is short and conveys what it is to most of the people who would care, I think. jq has been submitted and talked about here many times with lively discussion over the years.[1] At this point I think most of those who are interested in what that is and what this is will understand fairly quickly from the title. Those who don't might be missed, or they might look it up like you, or they might see it through some other submission some other time with a different title which isn't based on a chain of references.
1: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

tools

2 16 9.5 Shell

all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff (by bAndie91)

parsel[0] is a Python script in front of the identically named Python library; it extracts parts of the HTML by CSS selector. Its advantage over most similar tools is that you can navigate up and down the DOM tree to find precisely what you want when the HTML is poorly marked up or the parts you are looking for are not close to each other.
[0] https://github.com/bAndie91/tools/blob/master/usr/bin/parsel
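Navigating up and down the tree, as described above, can be illustrated with the standard library: find a node by its text, step up to its parent, then step back down to a sibling. This is a sketch under assumptions, not the parsel script's interface: value_for_label and SAMPLE are invented for the example, and the markup is assumed to be well-formed, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup with no ids or classes to select on.
SAMPLE = """
<table>
  <tr><td>Name</td><td>htmlq</td></tr>
  <tr><td>Language</td><td>Rust</td></tr>
</table>
"""


def value_for_label(root, label):
    """Return the text of the cell that follows the cell containing label."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for cell in root.iter("td"):
        if (cell.text or "").strip() == label:
            row = parent_of[cell]      # step up to the enclosing row
            cells = list(row)          # step back down to its cells
            position = cells.index(cell)
            if position + 1 < len(cells):
                return cells[position + 1].text
    return None


root = ET.fromstring(SAMPLE.strip())
print(value_for_label(root, "Language"))
```

A pure selector match cannot express "the cell next to the one that says Language" when the markup carries no distinguishing attributes; moving relative to a matched node can.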


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Measuring startup and shutdown overhead of several code interpreters

2 projects | dev.to | 17 Apr 2024
Faster tetranucleotide (k-mer) frequencies!

4 projects | dev.to | 15 Mar 2024
Argc: Easily create feature-rich CLIs in bash

1 project | news.ycombinator.com | 4 Mar 2024
Hyperfine: A command-line benchmarking tool

2 projects | news.ycombinator.com | 6 Feb 2024
Show HN: Muse, a CLI background music player

5 projects | news.ycombinator.com | 17 Jan 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Command-line Rust HTML css-selector CLI
Post date: 7 Sep 2021
