lexbor
linux
 | lexbor | linux
---|---|---
Mentions | 10 | 980
Stars | 881 | 169,627
Growth | 1.7% | -
Activity | 8.5 | 10.0
Last commit | 6 days ago | 4 days ago
Language | C | C
License | Apache License 2.0 | GNU General Public License v2.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lexbor
- Modest: A fast HTML renderer implemented as a pure C99 library
Project is deprecated in favour of the same developer's lexbor project[0].
- Created a performance-focused HTML5 parser for Ruby, trying to be API-compatible with Nokogiri
It supports both CSS selectors and XPath like Nokogiri, but with separate engines: parsing and the CSS engine are provided by Lexbor, and the XPath engine by libxml2. (Nokogiri internally converts CSS selectors to XPath syntax and uses the XPath engine for all searches.)
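To illustrate what "converting CSS selectors to XPath" means, here is a minimal sketch in Python. It is not Nokogiri's actual translator: the function names `compound_to_xpath` and `css_to_xpath` are made up for this example, and it only handles tag names, `.class`, `#id`, and the descendant (whitespace) combinator. The class test uses the common `contains(concat(...))` XPath 1.0 idiom.

```python
import re

# Hypothetical helper names; this covers only a tiny subset of CSS.
_COMPOUND = re.compile(r"^(?P<tag>[A-Za-z][\w-]*)?(?P<rest>(?:[.#][\w-]+)*)$")


def compound_to_xpath(compound: str) -> str:
    """Translate one compound selector such as 'div.item#main' to an XPath step."""
    match = _COMPOUND.match(compound)
    if not compound or not match:
        raise ValueError(f"unsupported selector: {compound!r}")
    step = match.group("tag") or "*"
    for kind, name in re.findall(r"([.#])([\w-]+)", match.group("rest")):
        if kind == "#":
            step += f"[@id='{name}']"
        else:
            # Match one whitespace-separated token inside the class attribute.
            step += (
                "[contains(concat(' ', normalize-space(@class), ' '), "
                f"' {name} ')]"
            )
    return step


def css_to_xpath(selector: str) -> str:
    """Translate a descendant-only CSS selector into an XPath expression."""
    parts = selector.split()
    if not parts:
        raise ValueError("empty selector")
    return "//" + "//".join(compound_to_xpath(part) for part in parts)


if __name__ == "__main__":
    print(css_to_xpath("div.item a"))
    print(css_to_xpath("#main"))
```

A real translator also has to cover child and sibling combinators, attribute selectors, and pseudo-classes, which is part of why running a dedicated CSS engine can be simpler than routing every query through XPath.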
- Lexbor: Fast HTML Renderer library in C
- Andreas Kling (of SerenityOS fame) is building a new Linux browser using SerenityOS libraries
An HTML parser, probably the simplest relatively modern example I could find is 1MB https://github.com/lexbor/lexbor (haven't used it, but might look more into it now that I know it exists.)
- Lexbor: Open-source HTML Renderer library in C
- The State of Web Scraping in 2021
Lazyweb link: https://github.com/rushter/selectolax
although I don't follow the need to have what appears to be two completely separate HTML parsing C libraries as dependencies; seeing this in the readme for Modest gives me the shivers because lxml has _seen some shit_
> Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
although its other dep seems much more cognizant about the HTML5 standard, for whatever that's worth: https://github.com/lexbor/lexbor#lexbor
---
> It looks like the author of the article just googled some libraries for each language and didn't research the topic
Heh, oh, new to the Internet, are you? :-D
- Libraries for retrieving HTML data from a website
Lexbor is here: https://github.com/lexbor/lexbor
- What second language to learn after Python?
Well, regarding HTML5, what I've found was libxml (does not support tag-soup HTML5), https://github.com/lexbor/lexbor, for which I was unable to find good documentation (see https://lexbor.com/docs/lexbor/#dom), Apache Xerces (appears not to support tag-soup HTML5 either), and Gumbo, which does not appear to be active or to support selectors and XPath (although there are libraries that add that).
- You can't parse [X]HTML with regex
I think we've all (mostly?) tried it. It really is the Wild West of the web when you're trying to parse other people's HTML, though.
I've played around with this parser which is extremely quick. https://github.com/lexbor/lexbor
- How SerpApi sped up data extraction from HTML from 3s to 800ms (or How to profile and optimize Ruby code and C extension)
I'm glad to have the opportunity to contribute to an open-source project that is used by thousands of people. Hopefully, we will speed up Nokogiri (or the XML parser it uses) to match the performance of html5ever or lexbor at some point in the future. 800 ms to extract data from HTML is still too much.
linux
- Linus Torvalds adds arbitrary tabs to kernel code
These are a bit easier to see what's going on:
https://github.com/torvalds/linux/commit/d5cf50dafc9dd5faa1e...
https://github.com/torvalds/linux/blob/d5cf50dafc9dd5faa1e61...
Unfortunately GitHub doesn't have a way to render symbols for whitespace, but you can tell by selecting the spaces that the previous version had leading tabs. Linus changed it so that the token `default` and the number (e.g. `12`) are also separated by a tab. This is tricky: because the token `default` is seven characters long and starts right after the leading tabs, the added tab is always 1 character wide, so the line lays out exactly as if it were a space whether you use a tab width of 1, 2, 4, or 8.
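The arithmetic behind that claim can be checked with a short sketch. It assumes the line consists of one or more leading tabs, the token `default`, and then the added tab; the function names are invented for this illustration.

```python
def tab_advance(column: int, tab_width: int) -> int:
    """Columns a tab occupies when the cursor is at 0-based `column`."""
    return tab_width - (column % tab_width)


def width_of_tab_after_default(tab_width: int, leading_tabs: int = 1) -> int:
    """Rendered width of the tab that follows `default` on an indented line."""
    column = 0
    for _ in range(leading_tabs):
        column += tab_advance(column, tab_width)  # each leading tab
    column += len("default")  # seven characters
    return tab_advance(column, tab_width)


if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        print(width, width_of_tab_after_default(width))
    # prints:
    # 1 1
    # 2 1
    # 4 1
    # 8 1
```

After the leading tabs the cursor sits on a multiple of the tab width, and 7 is one short of 8, so for every tab width that divides 8 the next tab stop is exactly one column away.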
- Show HN: Running TempleOS in user space without virtualization
- PfSense Software Embraces Change: A Strategic Migration to the Linux Kernel
There was also a Gentoo effort to run atop FreeBSD[0]. The challenge of course is that afaik none of the BSD kernel ABIs are considered stable. The stable interface is the BSD libc. That said, with binfmt_misc[1], I don't see a reason you couldn't just run (at least some) FreeBSD binaries on Linux with a thin syscall translation layer (rather than something like qemu-system), with your layer hooked in via binfmt_misc. I'm not aware of anyone who has done this for FreeBSD, but prior efforts existed as alternate binfmts for SysVr4/5 ELF binaries[2]. Either way would take some elbow grease, but you *might* even be able to just reuse binfmt_elf and have a new interpreter for FreeBSD ELF.
[0] https://wiki.gentoo.org/wiki/Gentoo_FreeBSD
[1] https://docs.kernel.org/admin-guide/binfmt-misc.html
[2] https://github.com/torvalds/linux/blob/master/fs/binfmt_elf....
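As a rough sketch of the binfmt_misc idea, the snippet below builds a registration rule that would match 64-bit little-endian ELF files whose OS/ABI byte marks them as FreeBSD. The rule layout `:name:type:offset:magic:mask:interpreter:flags` comes from the kernel's binfmt_misc documentation and `ELFOSABI_FREEBSD = 9` from the ELF specification; the rule name and the interpreter path `/usr/local/bin/fbsd-run` are hypothetical, and no such translation layer is known to exist.

```python
ELFOSABI_FREEBSD = 9  # value of e_ident[EI_OSABI] for FreeBSD binaries


def freebsd_binfmt_rule(name: str, interpreter: str) -> str:
    """Build a binfmt_misc rule matching 64-bit little-endian FreeBSD ELF files."""
    magic = bytes([
        0x7F, ord("E"), ord("L"), ord("F"),  # ELF magic number
        2,                                   # EI_CLASS: ELFCLASS64
        1,                                   # EI_DATA: little-endian
        1,                                   # EI_VERSION: EV_CURRENT
        ELFOSABI_FREEBSD,                    # EI_OSABI
    ])
    # binfmt_misc accepts bytes written as \xNN escapes in the rule text.
    escaped = "".join(f"\\x{byte:02x}" for byte in magic)
    # Offset, mask, and flags are left empty, so the defaults apply.
    return f":{name}:M::{escaped}::{interpreter}:"


if __name__ == "__main__":
    # Hypothetical interpreter path; registering the rule would mean writing
    # this string to /proc/sys/fs/binfmt_misc/register as root.
    print(freebsd_binfmt_rule("freebsd-elf64", "/usr/local/bin/fbsd-run"))
```

The script only prints the rule and does not touch `/proc`. Whether FreeBSD binaries would actually run still depends entirely on the syscall translation layer the comment describes.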
- Improvements to static analysis in GCC 14
> The original less-than check was deemed incorrect
It was only deemed incorrect because of an information leak. Not because it's a valid use-case for user space to copy smaller portions of *hwrpb into user space. https://github.com/torvalds/linux/commit/21c5977a836e399fc71...
- Linus Torvalds accepts a merge commit to the Linux kernel
- TinyMCE (also) moving from MIT to GPL
Correct. And the combined work needs to carry the MIT license text and copyright attributions for the MIT software authors. With binary distribution it must also be overt, not hidden in some source code drop, but directly accompanying the binary.
Many people who talk about relicensing never credit the MIT developers or distribute the MIT license text. "Because it's GPL now."
I don't think that you believe that, but many developers do.
Some don't see the need for source code scans for Open Source compliance, because the license.txt says GPL, so it's GPL. Prime example is the Linux kernel. There is code under different licenses in there, but people don't even read https://github.com/torvalds/linux/blob/master/COPYING till the end ("In addition, other licenses may also apply.") and conclude it's simply GPL 2 and nothing else.
Also be aware that sublicensing is not the same as relicensing.
- Linus Torvalds is looking for a more modern GUI editor
> Does he have something against it?
He notoriously hates GNU Emacs, yes.
https://marc.info/?m=122955159617722
https://github.com/torvalds/linux/blob/master/Documentation/...
- The Linux Kernel Prepares for Rust 1.77 Upgrade
So if we count only code and not comments, there are only 9,489 lines of Rust, which would be about 0.03%. If we take all lines rather than only lines of code, it would be around 0.05%.
[0] https://github.com/XAMPPRocky/tokei
[1] https://github.com/torvalds/linux/commit/b401b621758e46812da...
- Proposed Windows NT sync driver brings big Wine/Proton performance improvements
AIUI fsync is built on futex_waitv which has been upstreamed. So this has to be more than that.
https://github.com/torvalds/linux/commit/a0eb2da92b715d0c97b...
- Tell HN: GitHub no longer readable without JavaScript
git clone --no-checkout --depth 1 https://github.com/torvalds/linux.git $dir
What are some alternatives?
myhtml - Fast C/C++ HTML 5 Parser. Using threads.
zen-kernel - Zen Patched Kernel Sources
selectolax - Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
DS4Windows - Like those other ds4tools, but sexier
gumbo-parser - An HTML5 parsing library in pure C99
winapps - Run Windows apps such as Microsoft Office/Adobe in Linux (Ubuntu/Fedora) and GNOME/KDE as if they were a part of the native OS, including Nautilus integration.
Xerces-C++ - Apache Xerces-C validating XML parser
Open and cheap DIY IP-KVM based on Raspberry Pi - Open and inexpensive DIY IP-KVM based on Raspberry Pi
nokogiri-rust - Ruby FFI wrapper around scraper crate to be used instead of Nokogiri. Status: proof of concept.
serenity - The Serenity Operating System
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
DsHidMini - Virtual HID Mini-user-mode-driver for Sony DualShock 3 Controllers