gumbo-parser
benchmarks
Metric | gumbo-parser | benchmarks
---|---|---
Mentions | 7 | 40
Stars | 5,116 | 2,743
Growth | - | -
Activity | 0.0 | 7.2
Last commit | about 1 year ago | 3 months ago
Language | HTML | Makefile
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
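As a rough illustration of how the two metrics above could be read, the sketch below computes month-over-month growth from two star counts and converts an activity score into an approximate "top N%" figure. The page does not state either formula: the linear score-to-percentile mapping is an assumption extrapolated from the single 9.0 example, and the function names and the star counts in the usage lines are hypothetical.

```python
def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Relative change in stars between two consecutive months (0.05 means 5%)."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month


def approx_top_percent(activity: float) -> float:
    """Assumed linear reading of the activity score.

    ASSUMPTION: 9.0 maps to the top 10% (as in the text) and the scale is
    linear from 0.0 to 10.0; the real tracker may compute this differently.
    """
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity is expected to lie between 0.0 and 10.0")
    return (10.0 - activity) * 10.0


if __name__ == "__main__":
    # Hypothetical star counts, used only to exercise the growth helper.
    print(f"growth: {month_over_month_growth(1000, 1050):.1%}")
    # Activity scores taken from the comparison table.
    for name, score in (("gumbo-parser", 0.0), ("benchmarks", 7.2)):
        print(f"{name}: roughly top {approx_top_percent(score):.0f}%")
```

Under this assumed mapping, an activity of 7.2 would place a project in roughly the top 28% of tracked projects, while 0.0 carries no ranking information at all.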
gumbo-parser
- Gumbo HTML5 parsing library has been discontinued
- Web Scraping with C++
  It uses libcurl and gumbo (https://github.com/google/gumbo-parser). Gumbo is apparently written in pure C99 (interestingly, Curl is written in the even older C89 standard). It would have been more amusing if the article had taken that into account and used C99.
- how to make a C++ web scraper?
- The computers are fast, but you don't know it
  > A standards compliant HTML5 parser is at the bare minimum millions of lines of code.
  But https://github.com/google/gumbo-parser is only 34K lines?
- Markup Language Operations in Nim to extract and remove el
  Oops... I saw a markup parser and automatically thought XML, but you are right! HTML is actually a whole different beast!
  As it turns out, Nim also seems to have an HTML parser [1], but I'm guessing something like Google's gumbo [2] could be more reliable, although you would have to write bindings for Nim.
  1: https://nim-lang.org/docs/htmlparser.html
  2: https://github.com/google/gumbo-parser
- What second language to learn after Python?
  Well, regarding HTML5, what I've found was libxml (does not support tag-soup HTML5), https://github.com/lexbor/lexbor, for which I was unable to find good documentation (see https://lexbor.com/docs/lexbor/#dom), Apache Xerces (which also appears not to support tag-soup HTML5), and Gumbo, which does not appear to be active or to support selectors and XPath (although there are libraries that add that).
- Does anyone know of an HTML parser written in C++ that has Node.js interface?
  I haven't used any of them, but there are a few wrappers available for Gumbo.
benchmarks
- Some Benchmarks of Different Languages
- Building a high performance JSON parser
- Top 5 Fastest Programming Languages
- Twitter (re)Releases Recommendation Algorithm on GitHub
- How green or energy efficient is the Go programming language?
  GitHub - kostya/benchmarks: Some benchmarks of different languages
- how to benchmark a programming language
- Ruby 3.2.0 Is from Another Dimension
  In all the language comparisons I've found over the years, Python consistently comes out slightly slower, for example:
  https://github.com/kostya/benchmarks
  Bearing in mind these are probably not even using YJIT, which makes Ruby considerably faster in some scenarios.
- I made a 88x88 version of the big display image command generator in Python! (will share github link if admins allow it)
- The original computer languages benchmark is back
  Also, here is another benchmark: https://github.com/kostya/benchmarks
- Why does Scala seem to be slow at benchmark results?
What are some alternatives?
Xerces-C++ - Apache Xerces-C validating XML parser
libuv - Cross-platform asynchronous I/O
lexbor - Lexbor is development of an open source HTML Renderer library. https://lexbor.com
lua-languages - Languages that compile to Lua
HTML-XML-Operations-Nim - Mark Up Language extraction, removal and copy
julia - The Julia Programming Language
cpr - C++ Requests: Curl for People, a spiritual port of Python Requests.
beartype - Unbearably fast near-real-time hybrid runtime-static type-checking in pure Python.
q.nim - Query HTML/XML elements using a CSS3 or jQuery-like selector syntax
mypyc - Compile type annotated Python to fast C extensions
html-parser.ts - zero-dependency html parser for node.js and browser that return the dom (tree) structure
Cython - The most widely used Python to C compiler