Like JQ, but for HTML

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • xidel

    Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

  • > Well, jq is grep as well as sed and awk, but yeah, htmlq seems to be just grep, for sake of comparison.

    Exactly, and that is what I mean. If you want to compare, compare it with grep, not jq.

    Someone else posted xidel[0] in this thread, which I've not used, but it seems to be the "jq but for html".

    [0] https://github.com/benibela/xidel

  • pup

    Parsing HTML at the command line

  • Once upon a time I used pup[0] for this kind of thing; later I switched to cascadia[1], which seemed much more advanced.

    Comparing the two repos, it seems pup's development has somewhat died down.

    These tools, including htmlq, seem to sell themselves as "jq for HTML", which is far from the truth. jq is closer to awk, in that you can do just about everything with it. Cascadia, htmlq, and pup seem closer to grep for HTML: they can essentially only select data from an HTML source.

    [0] https://github.com/EricChiang/pup
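
The "grep versus jq" distinction drawn in the comment above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how pup, cascadia, or htmlq work internally; the sample markup and the names `select_links` and `summarize_links` are made up for the example. The first function only selects data (the grep-like part), while the second reshapes the selection into a new structure (the part a jq-like tool adds).

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element (the 'select' step)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def select_links(html):
    """Grep-like step: return matching values in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def summarize_links(hrefs):
    """jq-like step: deduplicate, sort, and regroup the selection."""
    external = sorted({h for h in hrefs if h.startswith(("http://", "https://"))})
    internal = sorted({h for h in hrefs if h.startswith("/")})
    return {"internal": internal, "external": external}


# Hypothetical sample document, made up for illustration.
SAMPLE = (
    '<nav><a href="/learn">Learn</a>'
    '<a href="https://blog.rust-lang.org/">Blog</a>'
    '<a href="/learn">Learn again</a></nav>'
)

if __name__ == "__main__":
    links = select_links(SAMPLE)
    print(links)
    print(json.dumps(summarize_links(links)))
```

A selector-only tool stops after the first step; anything like the regrouping in `summarize_links` has to happen in a later pipeline stage.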

  • rust

    Empowering everyone to build reliable and efficient software.

  • This is very nice!

    For reasoning about tree-based data such as HTML, I also highly recommend the declarative programming language Prolog. For instance, here is the sample query from the README, fetching all elements with id get-help from https://www.rust-lang.org, using Scryer Prolog and its SGML and HTTP libraries in combination with the XPath-inspired query language from library(xpath):

        ?- http_open("https://www.rust-lang.org", Stream, []),

  • htmlq

    Like jq, but for HTML.

  • gron

    Make JSON greppable!

  • cascadia

    Go cascadia package command line CSS selector

  • tq

    Perform a lookup by CSS selector on an HTML input

  • I did write it a few years ago.

    https://github.com/plainas/tq

  • blog.rust-lang.org

    Home of the Rust and Inside Rust blogs

  • ['/', '/tools/install', '/learn', 'https://play.rust-lang.org/', '/tools', '/governance', '/community', 'https://blog.rust-lang.org/',...

  • JsonPath

    Java JsonPath implementation

  • Is anyone else using https://github.com/json-path/JsonPath rather than going the jq route?

    I hope we standardize on some jq-like query language, the way we have on a base set of SQL syntax.

  • jsoup

    jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

  • https://jsoup.org/ has been around for a long time and seems a bit more mature & maintained than this two-code-files 2-year-old repo. Highly recommend.

  • lol-html

    Low output latency streaming HTML parser/rewriter with CSS selector-based API

  • I’d like to see a tool using lol-html [0] and their CSS selector API as a streaming HTML editor.

    [0] https://github.com/cloudflare/lol-html

  • xmlq

    filter xml in the command line with xpath

  • xmltodict

    Python module that makes working with XML feel like you are working with JSON

  • xmlstarlet is really nothing like jq, as a language. But yes, I use it because it is the best command-line XML processor I've found. That's the only similarity to jq.

    Is this the yq? https://kislyuk.github.io/yq/ It does contain an 'xq', a literal wrapper for jq: it transcodes XML to JSON using xmltodict https://github.com/martinblech/xmltodict (which explodes XML into separate JSON data structures) and pipes the output into jq.

    This is a bash one-liner! But TBF it really is a 'jq for xml'. I think it would be horrible for some things, but you could also do a lot of useful things painlessly.
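
The transcoding step described in the comment above (XML in, JSON out, then an ordinary JSON query) might look roughly like the following. This is a simplified stand-in written with the Python standard library, not xmltodict itself and not the code of xq; it only borrows the "@" prefix for attributes and the "#text" key for mixed text as an assumption about the output shape. The sample feed and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def element_to_data(elem):
    """Convert one XML element into plain dicts, lists, and strings."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        # Leaf element without attributes: just its text (or None if empty).
        return text or None

    # Attributes are stored under "@name" keys.
    data = {"@" + name: value for name, value in elem.attrib.items()}
    for child in children:
        value = element_to_data(child)
        if child.tag in data:
            # A repeated tag turns the entry into a list of values.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    if text:
        data["#text"] = text
    return data


def xml_to_json(xml_text):
    """Parse XML text and return the equivalent JSON document as a string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_data(root)})


# Hypothetical sample input, made up for illustration.
SAMPLE = (
    '<feed><entry id="1"><title>First</title></entry>'
    '<entry id="2"><title>Second</title></entry></feed>'
)

if __name__ == "__main__":
    doc = json.loads(xml_to_json(SAMPLE))
    # The query below plays the role of a jq filter such as .feed.entry[].title
    print([entry["title"] for entry in doc["feed"]["entry"]])
```

Once the document is JSON, any JSON tool can take over, which is why the wrapper approach gets so much for so little code. The cost is that element order and mixed content are only approximated.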

  • yq

    Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents (by kislyuk)

  • hq

    lightweight command line HTML processor using CSS and XPath selectors

  • > Software definition through a reference to another software is somewhat confusing.

    Possibly, depending on background as you note, but not all promotion is aimed at the same audience. When submitting to HN, "like jq, but for X" is short and conveys what it is to most of the people who would care, I think. jq has been submitted and talked about here many times with lively discussion over the years.[1] At this point I think most of those who are interested in what that is and what this is will understand fairly quickly from the title. Those who don't might be missed, or they might look it up like you, or they might see it through some other submission some other time with a different title which isn't based on a chain of references.

    1: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

  • tools

    all-in collection of productivity scripts, CLI tools, utility libraries, fuse filesystems, and also some stuff (by bAndie91)

  • parsel[0] is a Python script in front of the identically named Python library; it extracts parts of the HTML by CSS selector. Its advantage over most similar tools is that you can navigate up and down the DOM tree to find precisely what you want when the HTML is poorly marked up or the parts you are looking for are not close to each other.

    [0] https://github.com/bAndie91/tools/blob/master/usr/bin/parsel
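
To illustrate what "navigating up and down the DOM tree" buys you, here is a small standard-library sketch. It does not use the parsel library or the script linked above, and it makes no claim about their interfaces; the `Node` class, the helper names, and the sample table are all hypothetical. The idea is to find a cell by its label text, step up to the enclosing row, then step down again to the value that has no useful id or class of its own.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Build a minimal tree with parent links from well-nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, predicate):
    """Yield every descendant of node that satisfies predicate, depth first."""
    for child in node.children:
        if predicate(child):
            yield child
        yield from find_all(child, predicate)


def value_for_label(root, label):
    """Find the <td> holding label, go up to its row, then down to a <span>."""
    for cell in find_all(root, lambda n: n.tag == "td" and n.text == label):
        row = cell.parent  # step up
        for span in find_all(row, lambda n: n.tag == "span"):  # step down
            return span.text
    return None


# Hypothetical, poorly marked-up sample: no ids or classes to select on.
SAMPLE = (
    "<table><tr><td>Price</td><td><span>42 EUR</span></td></tr>"
    "<tr><td>Weight</td><td><span>3 kg</span></td></tr></table>"
)

if __name__ == "__main__":
    print(value_for_label(parse(SAMPLE), "Price"))
```

A plain descendant selector cannot express "the span in the same row as the cell that says Price", which is the kind of case the comment has in mind.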

