Like jq, but for HTML

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • xidel

    Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

    > Well, jq is grep as well as sed and awk, but yeah, htmlq seems to be just grep, for sake of comparison.

    Exactly, and that is what I mean. If you want to compare, compare it with grep, not jq.

    Someone else posted xidel[0] in this thread, which I've not used, but it seems to be the "jq but for html".

    [0] https://github.com/benibela/xidel

  • pup

    Parsing HTML at the command line

    Once upon a time I was using pup[0] for this kind of thing; later I switched to cascadia[1], which seemed much more advanced.

    Comparing the two repos, it seems pup's development has somewhat died down.

    These tools, including htmlq, seem to sell themselves as "jq for HTML", which is far from the truth. jq is closer to awk, where you can do just about everything. Cascadia, htmlq, and pup seem closer to grep for HTML: they can essentially only select data from an HTML source.

    [0] https://github.com/EricChiang/pup
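
    The grep-versus-awk distinction drawn above can be made concrete with a small sketch. It uses plain Python on JSON rather than any of the tools named here; the sample data and both function names are hypothetical, chosen only for illustration. The first function only selects values that already exist in the input, while the second computes a new structure, which is the kind of step a select-only tool cannot express.

```python
import json

# Hypothetical sample document, invented for this illustration.
doc = json.loads("""
{"posts": [
  {"title": "htmlq", "points": 120, "tags": ["html", "cli"]},
  {"title": "gron",  "points": 80,  "tags": ["json", "cli"]},
  {"title": "xidel", "points": 45,  "tags": ["html", "xml"]}
]}
""")


def select_titles(document):
    """grep-like: pull out matching values exactly as they appear."""
    return [post["title"] for post in document["posts"]]


def points_by_tag(document):
    """awk-like: build a structure that is not present in the input."""
    totals = {}
    for post in document["posts"]:
        for tag in post["tags"]:
            totals[tag] = totals.get(tag, 0) + post["points"]
    return totals


print(select_titles(doc))
print(json.dumps(points_by_tag(doc), sort_keys=True))
```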

  • rust

    Empowering everyone to build reliable and efficient software.

    This is very nice!

    For reasoning about tree-based data such as HTML, I also highly recommend the declarative programming language Prolog. For instance, here is the sample query from the README, fetching all elements with id get-help from https://www.rust-lang.org, using Scryer Prolog and its SGML and HTTP libraries in combination with the XPath-inspired query language from library(xpath):

        ?- http_open("https://www.rust-lang.org", Stream, []),
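
    For comparison, the query described above (every element whose id is get-help) can be sketched with Python's standard library alone. This is not the Prolog program from the comment and it does no HTTP fetch: the SAMPLE markup is invented and stands in for the real page, and IdFinder and find_by_id are hypothetical names.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Collects start tags whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.matches = []  # list of (tag, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") == self.target_id:
            self.matches.append((tag, attr_map))


def find_by_id(html, target_id):
    """Return (tag, attributes) for every element with the given id."""
    finder = IdFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.matches


# Invented markup; the real page's structure is not assumed here.
SAMPLE = """
<html><body>
  <nav id="main-nav"><a href="/learn">Learn</a></nav>
  <section id="get-help" class="purple"><h2>Get help!</h2></section>
</body></html>
"""

print(find_by_id(SAMPLE, "get-help"))
```

    The sketch reports only the matching start tags and their attributes; unlike a DOM-based query, it does not return the element's children.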

  • htmlq

    Like jq, but for HTML.

  • gron

    Make JSON greppable!

  • cascadia

    Go cascadia package command line CSS selector

  • tq

    Perform a lookup by CSS selector on an HTML input

    I wrote it a few years ago.

    https://github.com/plainas/tq

  • blog.rust-lang.org

    The Rust Programming Language Blog

    ['/', '/tools/install', '/learn', 'https://play.rust-lang.org/', '/tools', '/governance', '/community', 'https://blog.rust-lang.org/',...

  • JsonPath

    Java JsonPath implementation

    Is anyone else using https://github.com/json-path/JsonPath instead of going the jq route?

    I hope we standardize on some jq query language, the way we have on a base set of SQL syntax.

  • jsoup

    jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

    https://jsoup.org/ has been around for a long time and seems a bit more mature & maintained than this two-code-files 2-year-old repo. Highly recommend.

  • lol-html

    Low output latency streaming HTML parser/rewriter with CSS selector-based API

    I’d like to see a tool using lol-html [0] and their CSS selector API as a streaming HTML editor.

    [0] https://github.com/cloudflare/lol-html

  • xmlq

    filter xml in the command line with xpath

  • xmltodict

    Python module that makes working with XML feel like you are working with JSON

    xmlstarlet is really nothing like jq, as a language. But yes, I use it because it is the best command-line XML processor I've found. That's the only similarity to jq.

    Is this the yq? https://kislyuk.github.io/yq/ It does contain an 'xq', as a literal wrapper for jq, piping output into it after transcoding XML to JSON using xmltodict https://github.com/martinblech/xmltodict (which explodes XML into separate JSON data structures).

    This is a bash one-liner! But TBF it really is a 'jq for XML'. I think it would be horrible for some things, but you could also do a lot of useful things painlessly.
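
    The transcoding step described above (XML in, JSON out, ready for a JSON query tool) can be sketched with the standard library. This is a simplified stand-in, not xmltodict's implementation: the conventions used here ("@" prefix for attributes, "#text" for mixed text, lists for repeated tags) are assumptions of the sketch, and element_to_data and xml_to_json are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET


def element_to_data(elem):
    """Convert an element into plain dicts, lists, and strings.

    Assumed conventions for this sketch: attributes get an "@" prefix,
    text next to attributes or children goes under "#text", and repeated
    child tags are collected into a list.
    """
    data = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_data(child)
        if child.tag in data:
            # Second or later occurrence of this tag: promote to a list.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    text = (elem.text or "").strip()
    if not data:
        return text or None  # leaf element: just its text
    if text:
        data["#text"] = text
    return data


def xml_to_json(xml_text):
    """Parse XML text and return it as a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_data(root)}, indent=2)


SAMPLE = """<feed lang="en">
  <entry><title>htmlq</title></entry>
  <entry><title>xidel</title></entry>
</feed>"""

print(xml_to_json(SAMPLE))
```

    Once the document is in this shape, any JSON query approach applies to it, which is the idea behind wrapping jq as described in the comment.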

  • yq

    Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents (by kislyuk)

  • hq

    lightweight command line HTML processor using CSS and XPath selectors

  • tools

    various linux scripts (by bAndie91)

    parsel[0] is a Python script in front of the identically named Python library; it extracts parts of the HTML by CSS selector. Its advantage over most similar tools is that you can navigate up and down the DOM tree to find precisely what you want when the HTML is poorly marked up, or when the parts you are looking for are not close to each other.

    [0] https://github.com/bAndie91/tools/blob/master/usr/bin/parsel
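
    The up-and-down navigation described above can be illustrated without parsel itself. The sketch below is an assumption-laden stand-in: it uses xml.etree.ElementTree, which only accepts well-formed markup, and the SAMPLE document and value_for_label are invented for illustration. It finds a label by its text, steps up to the parent, and then steps back down to a sibling, which is useful when the target element has no distinguishing attribute of its own.

```python
import xml.etree.ElementTree as ET

# Invented markup: both values carry the same class, so only the
# neighbouring label tells them apart.
SAMPLE = """
<div>
  <div>
    <span>Price</span>
    <b class="value">42</b>
  </div>
  <div>
    <span>Weight</span>
    <b class="value">7</b>
  </div>
</div>
"""


def value_for_label(root, label):
    """Find the <span> with the given text, go up to its parent, then
    back down to the sibling <b class="value"> and return its text."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for span in root.iter("span"):
        if (span.text or "").strip() == label:
            parent = parent_of[span]                    # step up
            target = parent.find("b[@class='value']")   # step down
            return target.text if target is not None else None
    return None


root = ET.fromstring(SAMPLE)
print(value_for_label(root, "Weight"))
```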

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
