Top 23 XPath Open-Source Projects

jsoup

Mentions: 27 | Stars: 10,606 | Activity: 9.1 | Java

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.

Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20

PugiXML

Mentions: 5 | Stars: 3,802 | Activity: 7.6 | C++

Light-weight, simple and fast XML parser for C++ with XPath support

Project mention: [Cpp Questions] An HTML parser for C++? | /r/enfrancais | 2023-05-17

and already tried: pugixml

Ono

Mentions: 0 | Stars: 2,599 | Activity: 0.0 | Objective-C

A sensible way to deal with XML & HTML for iOS & macOS
HtmlAgilityPack

Mentions: 28 | Stars: 2,545 | Activity: 7.5 | C#

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.

Project mention: Script invoking an Online Port Scan of your external IP, to test your firewall and port forwarder. | /r/PowerShell | 2023-07-06

Pretty straightforward. It uses an online port scanner, in this case https://www.speedguide.net/portscan.php, and parses the replies using HtmlAgilityPack.

DiDOM

Mentions: 0 | Stars: 2,173 | Activity: 0.6 | PHP

Simple and fast HTML and XML parser
parsel

Mentions: 5 | Stars: 1,074 | Activity: 6.4 | Python

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
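parsel's own API is not shown here. As a minimal, dependency-free sketch of XPath-based extraction, the example below uses Python's standard-library `xml.etree.ElementTree`, which implements only a limited subset of XPath (child and descendant paths, `[@attr='value']` and `[tag='text']` predicates, but no functions such as `text()` or `contains()`). The catalog document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical catalog document, used only for illustration.
CATALOG = """<catalog>
  <book id="b1" lang="en"><title>XPath Basics</title><price>12.50</price></book>
  <book id="b2" lang="fr"><title>Le XML</title><price>9.00</price></book>
  <book id="b3" lang="en"><title>Scraping 101</title><price>20.00</price></book>
</catalog>"""


def english_titles(xml_text: str) -> List[str]:
    """Return the titles of all books whose lang attribute is "en"."""
    root = ET.fromstring(xml_text)
    # [@attr='value'] predicates are part of ElementTree's XPath subset.
    return [title.text for title in root.findall("./book[@lang='en']/title")]


def price_of(xml_text: str, book_id: str) -> Optional[float]:
    """Return the price of the book with the given id, or None if absent."""
    root = ET.fromstring(xml_text)
    # book_id is inserted unescaped, so this is only suitable for trusted ids.
    price = root.find(f"./book[@id='{book_id}']/price")
    return float(price.text) if price is not None else None


if __name__ == "__main__":
    print(english_titles(CATALOG))  # ['XPath Basics', 'Scraping 101']
    print(price_of(CATALOG, "b2"))  # 9.0
```

For real-world HTML, which is rarely well-formed XML, a dedicated library such as parsel is the better fit; the sketch only illustrates the query style.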
Fuzi

Mentions: 1 | Stars: 1,057 | Activity: 0.0 | Swift

A fast & lightweight XML & HTML parser in Swift with XPath & CSS support
xq

Mentions: 6 | Stars: 745 | Activity: 8.0 | Go

Command-line XML and HTML beautifier and content extractor

Project mention: Build an Open Source Project: Behind the Scenes | dev.to | 2023-07-02

Some time ago, I started a project called "xq", which is a command-line XML and HTML beautifier and content extractor written in Go. Using this project as an example, I want to show what I did to make it a little bit more discoverable and usable by other people.

htmlquery

Mentions: 3 | Stars: 696 | Activity: 5.2 | Go

htmlquery is a Go XPath package for querying HTML.
xidel

Mentions: 18 | Stars: 650 | Activity: 5.9 | Pascal

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Project mention: Move over jq I found something easier: fx | news.ycombinator.com | 2023-06-06

You could try Xidel[1]. It supports JSON, XML, and HTML using XPath/XQuery 3.1.
It has some extensions to the standard that are pretty nice (JSONiq, CSS selectors, HTML “template” matching), but you can limit it to just standard XPath/XQuery if you like.
I recommend getting the nightly v.99 build if you give it a try; the stable .98 version is pretty old, and I’ve had no issues with .99.
1. https://www.videlibri.de/xidel.html

xpath

Mentions: 1 | Stars: 649 | Activity: 6.8 | Go

XPath package for Go that supports querying HTML, XML, and JSON documents.
camaro

Mentions: 3 | Stars: 547 | Activity: 6.2 | JavaScript

camaro is a utility that transforms XML to JSON using a Node.js binding to pugixml, a native XML parser and one of the fastest around.

Project mention: Using XPath in 2023 | news.ycombinator.com | 2023-07-16

Back in the day, when every OTA (online travel agent) and airline used XML for their APIs, we had to integrate them into an API gateway to unify their API schemas and workflows.
We wrote a small package[1] (using pugixml) to transform XML to JSON using a custom XPath template syntax. It made our job much easier.
[1]: https://github.com/tuananh/camaro
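A minimal Python sketch of the template idea described above, assuming a hypothetical, simplified template format: strings are paths in ElementTree's XPath subset (a trailing `/@name` selects an attribute), `[path, sub_template]` pairs map every matched element, and dicts build JSON objects. This is not camaro's actual syntax or implementation (camaro is a Node.js binding to pugixml); the response document, `TEMPLATE`, and `apply_template` are illustrative names.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any

# Hypothetical airline-style API response, used only for illustration.
RESPONSE = """<FlightSearchResponse>
  <searchId>abc-123</searchId>
  <Flight><code>XY100</code><fare currency="EUR">120</fare></Flight>
  <Flight><code>XY200</code><fare currency="EUR">95</fare></Flight>
</FlightSearchResponse>"""

# Hypothetical template: strings are paths (ElementTree's XPath subset),
# [path, sub_template] maps every match, dicts build JSON objects.
TEMPLATE = {
    "search_id": "searchId",
    "flights": ["Flight", {
        "code": "code",
        "fare": "fare",
        "currency": "fare/@currency",
    }],
}


def apply_template(node: ET.Element, template: Any) -> Any:
    """Recursively turn an XML element into JSON-ready Python data."""
    if isinstance(template, dict):
        # Build an object by applying each sub-template to the same node.
        return {key: apply_template(node, sub) for key, sub in template.items()}
    if isinstance(template, list):
        # [path, sub_template]: one output item per element matched by path.
        path, sub_template = template
        return [apply_template(child, sub_template) for child in node.findall(path)]
    if isinstance(template, str):
        if template.startswith("@"):
            # Attribute of the current node, e.g. "@id".
            return node.get(template[1:])
        # "a/b/@attr" selects an attribute; anything else selects element text.
        path, sep, attr = template.partition("/@")
        target = node.find(path)
        if target is None:
            return None
        return target.get(attr) if sep else target.text
    raise TypeError(f"unsupported template: {template!r}")


if __name__ == "__main__":
    data = apply_template(ET.fromstring(RESPONSE), TEMPLATE)
    print(json.dumps(data, indent=2))
```

All extracted values stay strings here (for example, `"120"`); type conversion, which a production template engine would typically offer, is deliberately left out to keep the sketch short.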

dude

Mentions: 28 | Stars: 412 | Activity: 9.0 | Python

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

Project mention: Webscraping beginner here ready to start leveling up to intermediate. Looking for some good webscraping repositories (e.g any of your GitHub repos/projects) that I can use as learning tools, and general recommendations for what to do next | /r/webscraping | 2023-05-08

Please check https://github.com/roniemartinez/dude

eXist

Mentions: 0 | Stars: 408 | Activity: 9.7 | Java

eXist Native XML Database and Application Platform
xmlquery

Mentions: 1 | Stars: 402 | Activity: 5.2 | Go

xmlquery is a Go XPath package for querying XML.
sweet_xml

Mentions: 0 | Stars: 353 | Activity: 4.4 | Elixir
ftr-site-config

Mentions: 13 | Stars: 348 | Activity: 9.5

Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.

Project mention: can someone suggest a good rss reader for android please? | /r/rss | 2023-07-12

As far as full-text caching... maybe a self-hosted instance or paid version of the FiveFilters Full-Text RSS service would work. You can integrate that into whatever aggregator you want.

Meeseeks

Mentions: 0 | Stars: 308 | Activity: 3.1 | Elixir

An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.
jsonquery

Mentions: 1 | Stars: 240 | Activity: 5.3 | Go

XPath queries over JSON documents for Go.
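jsonquery's own API is not shown here. One way to run XPath over JSON is to map the parsed JSON onto an XML-like node tree and evaluate paths against that tree. The Python sketch below assumes a simple, hypothetical mapping (not necessarily the one jsonquery uses) and relies on the XPath subset built into `xml.etree.ElementTree`; `to_element`, `DOC`, `ROOT`, and the `"item"` tag for unkeyed arrays are illustrative names.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def to_element(tag: str, value: Any) -> ET.Element:
    """Map a parsed JSON value onto an Element tree (hypothetical mapping).

    Object members become child elements named after their keys, an array
    under a key becomes repeated elements with that key, and scalars become
    element text. Keys must be usable as ElementTree path names.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            items = child if isinstance(child, list) else [child]
            for item in items:
                elem.append(to_element(key, item))
    elif isinstance(value, list):
        # Arrays not directly under a key (e.g. nested arrays) use "item" children.
        for item in value:
            elem.append(to_element("item", item))
    elif isinstance(value, bool):
        # Checked before the generic case so True/False render as JSON literals.
        elem.text = "true" if value else "false"
    elif value is not None:
        elem.text = str(value)
    return elem


DOC = json.loads("""{
  "store": {
    "book": [
      {"title": "XPath Basics", "price": 12.5, "tags": ["xml", "query"]},
      {"title": "JSON at Scale", "price": 30}
    ],
    "open": true
  }
}""")

ROOT = to_element("root", DOC)

if __name__ == "__main__":
    print([e.text for e in ROOT.findall("./store/book/title")])  # ['XPath Basics', 'JSON at Scale']
    print(ROOT.find("./store/book[title='JSON at Scale']/price").text)  # 30
    print([e.text for e in ROOT.findall(".//tags")])  # ['xml', 'query']
```

Mapping JSON to a tree once and reusing an existing XPath evaluator avoids writing a separate query engine for JSON; the trade-off is that JSON types collapse to text, so numeric comparisons have to be done after extraction.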
nokolexbor

Mentions: 4 | Stars: 153 | Activity: 6.0 | Ruby

High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.

Project mention: Ruby 3.3's YJIT: Faster While Using Less Memory | news.ycombinator.com | 2023-12-18

Yes, we ended up replacing Nokogiri with Nokolexbor, our own port of the lexbor parser, with almost full compatibility with Nokogiri's APIs while being around 5x faster: https://github.com/serpapi/nokolexbor

fs2-data

Mentions: 2 | Stars: 140 | Activity: 9.1 | Scala

Streaming data parsing and transformation library
fontoxpath

Mentions: 2 | Stars: 125 | Activity: 6.8 | TypeScript

A minimalistic XPath 3.1 implementation in pure JavaScript

Project mention: Using XPath in 2023 | news.ycombinator.com | 2023-07-16

Not XPath, but for folks interested in querying (rather than walking) syntax trees for arbitrary nodes, this is also a cool feature of tree-sitter[1]. It uses a scheme-like syntax, and it’s impressively efficient.
And in terms of XPath, for folks using a JS stack, fontoxpath[2] supports a DOM facade adapter interface which allows for querying any arbitrary tree-like structure, so it could certainly handle the same use case.
1: https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
2: https://github.com/FontoXML/fontoxpath
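fontoxpath's DOM facade is a JavaScript/TypeScript interface and is not reproduced here. To illustrate the general adapter idea, the Python sketch below evaluates a deliberately tiny XPath-like subset (child steps, `*`, and `//`) against any tree, given an adapter that supplies a node's name and children. `TreeAdapter`, `select`, and the toy syntax tree are hypothetical; real XPath engines also handle predicates, axes, functions, and strict document ordering.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

N = TypeVar("N")


@dataclass(frozen=True)
class TreeAdapter(Generic[N]):
    """Hypothetical facade: tells the evaluator how to read any tree-like structure."""
    name: Callable[[N], str]
    children: Callable[[N], Iterable[N]]


def _descendants(node: N, adapter: TreeAdapter[N]) -> Iterator[N]:
    """Yield all proper descendants of node in pre-order."""
    for child in adapter.children(node):
        yield child
        yield from _descendants(child, adapter)


def select(root: N, path: str, adapter: TreeAdapter[N]) -> List[N]:
    """Evaluate a tiny XPath-like path relative to root.

    Supported: name steps separated by "/", "*" as a wildcard, and "//" to
    step into any descendant. Predicates, axes and functions are not supported.
    """
    if path.startswith("/") and not path.startswith("//"):
        raise ValueError("absolute paths are not supported in this sketch")
    current: List[N] = [root]
    descend = False
    for step in path.split("/"):
        if step == "":
            descend = True  # an empty step comes from "//"
            continue
        matched: List[N] = []
        seen = set()  # avoid duplicates when context nodes overlap
        for node in current:
            candidates = _descendants(node, adapter) if descend else adapter.children(node)
            for cand in candidates:
                if (step == "*" or adapter.name(cand) == step) and id(cand) not in seen:
                    seen.add(id(cand))
                    matched.append(cand)
        current, descend = matched, False
    return current


# A toy syntax tree as (kind, children) tuples; the shape is hypothetical.
TOY_TREE = ("module", [
    ("function", [("name", []), ("call", [("name", [])])]),
    ("call", [("name", [])]),
])
TUPLE_ADAPTER = TreeAdapter(name=lambda n: n[0], children=lambda n: n[1])

# The same evaluator over ElementTree elements, via a different adapter.
XML_ROOT = ET.fromstring("<module><function><call/></function><call/></module>")
ET_ADAPTER = TreeAdapter(name=lambda e: e.tag, children=list)

if __name__ == "__main__":
    print([n[0] for n in select(TOY_TREE, "*", TUPLE_ADAPTER)])  # ['function', 'call']
    print(len(select(TOY_TREE, "//call/name", TUPLE_ADAPTER)))  # 2
    print(len(select(XML_ROOT, "//call", ET_ADAPTER)))  # 2
```

The query logic never inspects node internals directly; swapping the adapter is enough to point the same paths at tuples, XML elements, or any other tree.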

internettools

Mentions: 1 | Stars: 115 | Activity: 7.7 | Pascal

XPath/XQuery 3.1 interpreter for Pascal with compatibility modes for XPath 2.0/XQuery 1.0/3.0, custom and JSONiq extensions, pattern matching, XML/HTML/JSON parsers and classes for HTTP/S requests

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-12-18.

XPath-related posts

Xsel: A XPath 1.0 Go library/CLI that can query XML, HTML, and JSON documents
2 projects | news.ycombinator.com | 18 Aug 2023
Using XPath in 2023
8 projects | news.ycombinator.com | 16 Jul 2023
can someone suggest a good rss reader for android please?
2 projects | /r/rss | 12 Jul 2023
Script invoking an Online Port Scan of your external IP, to test your firewall and port forwarder.
2 projects | /r/PowerShell | 6 Jul 2023
Script to test the state of certain ports on your firewall from the outside
2 projects | /r/PowerShell | 5 Jul 2023
Fast XML to JSON using xpath templates in node
1 project | news.ycombinator.com | 2 Jul 2023
Copy Pasting Email Content Issue
2 projects | /r/csharp | 8 Jun 2023

Index

What are some of the best open-source XPath projects? This list will help you:

	Project	Stars
1	jsoup	10,606
2	PugiXML	3,802
3	Ono	2,599
4	HtmlAgilityPack	2,545
5	DiDOM	2,173
6	parsel	1,074
7	Fuzi	1,057
8	xq	745
9	htmlquery	696
10	xidel	650
11	xpath	649
12	camaro	547
13	dude	412
14	eXist	408
15	xmlquery	402
16	sweet_xml	353
17	ftr-site-config	348
18	Meeseeks	308
19	jsonquery	240
20	nokolexbor	153
21	fs2-data	140
22	fontoxpath	125
23	internettools	115