Xerces-C++
Apache Xerces-C validating XML parser (by apache)
gumbo-parser
An HTML5 parsing library in pure C99 (by google)
Metric | Xerces-C++ | gumbo-parser
---|---|---
Mentions | 1 | 7
Stars | 112 | 5,116
Growth | 2.7% | -
Activity | 5.5 | 0.0
Last commit | 16 days ago | about 1 year ago
Language | C++ | HTML
License | Apache License 2.0 | Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
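The site does not publish the formula behind the activity number, so the following is only a rough sketch of one way such a score could be computed. It assumes an exponential decay for commit age (the 90-day half-life is made up for illustration) and a percentile rank scaled to 0-10, so that a score of 9.0 means 90% of tracked projects rank lower. The function names `recency_weighted_commits` and `activity_scores` are hypothetical.

```python
from datetime import date


def recency_weighted_commits(commit_dates, today, half_life_days=90.0):
    """Sum per-commit weights that halve every `half_life_days`.

    The decay shape and the half-life are assumptions, not the site's formula.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_scores(raw_scores):
    """Map each project's raw score to 0-10 by percentile rank.

    A project scores 9.0 when 90% of all tracked projects have a strictly
    lower raw score, i.e. it sits in the top 10%.
    """
    n = len(raw_scores)
    result = {}
    for name, score in raw_scores.items():
        below = sum(1 for other in raw_scores.values() if other < score)
        result[name] = round(10.0 * below / n, 1)
    return result
```

Under these assumptions, a project with no recent commits ends up with a raw score near zero and therefore a low percentile rank, while a commit made today counts roughly twice as much as one made 90 days ago.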
Xerces-C++
Posts with mentions or reviews of Xerces-C++.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-05-14.
- What second language to learn after Python?
Well, regarding HTML5, what I've found was libxml (does not support tag-soup HTML5), https://github.com/lexbor/lexbor, for which I was unable to find good documentation (see https://lexbor.com/docs/lexbor/#dom), Apache Xerces (appears not to support tag-soup HTML5 either), and Gumbo, which does not appear to be active or to support selectors and XPath (although there are libraries that add that).
gumbo-parser
Posts with mentions or reviews of gumbo-parser.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2023-03-31.
- Gumbo HTML5 parsing library has been discontinued
- Web Scraping with C++
It uses libcurl and gumbo (https://github.com/google/gumbo-parser). Gumbo is apparently written in pure C99 (interestingly, curl is written in the even older C89 standard). It would have been more amusing if the article had taken that into account and used C99.
- how to make a C++ web scraper?
- The computers are fast, but you don't know it
> A standards compliant HTML5 parser is at the bare minimum millions of lines of code.
But https://github.com/google/gumbo-parser is only 34K lines?
- Markup Language Operations in Nim to extract and remove el
oops... I saw a markup parser and automatically thought XML, but you are right! HTML is actually a whole different beast!
As it turns out, it seems Nim also has an HTML parser [1], but I'm guessing something like Google's gumbo [2] could be more reliable, though you would have to write bindings for Nim.
1: https://nim-lang.org/docs/htmlparser.html
2: https://github.com/google/gumbo-parser
- What second language to learn after Python?
Well, regarding HTML5, what I've found was libxml (does not support tag-soup HTML5), https://github.com/lexbor/lexbor, for which I was unable to find good documentation (see https://lexbor.com/docs/lexbor/#dom), Apache Xerces (appears not to support tag-soup HTML5 either), and Gumbo, which does not appear to be active or to support selectors and XPath (although there are libraries that add that).
- Does anyone know of an HTML parser written in C++ that has Node.js interface?
I haven't used any of them, but there are a few wrappers available for Gumbo.
What are some alternatives?
When comparing Xerces-C++ and gumbo-parser you can also consider the following projects:
Libxml2 - Read-only mirror of https://gitlab.gnome.org/GNOME/libxml2
lexbor - Lexbor is development of an open source HTML Renderer library. https://lexbor.com
Expat - The Expat XML Parser
HTML-XML-Operations-Nim - Mark Up Language extraction, removal and copy