retrokit vs html5ever

retrokit

:joystick: Bring back the old Web(Kit) and make it secure (by tholian-network)

html5ever

High-performance browser-grade HTML5 parser (by servo)

Encoding HTML

retrokit	Project	html5ever
10	Mentions	5
50	Stars	1,983
-	Growth	2.6%
0.0	Activity	7.6
about 2 years ago	Latest Commit	4 days ago
C++	Language	Rust
-	License	GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
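The two derived metrics above can be made concrete with a minimal sketch. The page does not give the exact formulas, so both functions are assumptions: growth is taken as the relative change in stars between two monthly snapshots, and the activity score is read as a decile rank on a 0 to 10 scale (which matches the single stated data point, 9.0 meaning top 10%). The function names and sample numbers are hypothetical.

```python
def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Relative star growth between two monthly snapshots, in percent.

    Assumption: growth = (now - previous) / previous.
    """
    if stars_prev <= 0:
        raise ValueError("previous star count must be positive")
    return (stars_now - stars_prev) / stars_prev * 100.0


def top_share_percent(activity: float) -> float:
    """Share of tracked projects at or above the given activity score.

    Assumption: a 0-10 scale where each point is one decile,
    so 9.0 corresponds to the top 10%.
    """
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity must be between 0 and 10")
    return (10.0 - activity) * 10.0


# Hypothetical snapshots: 1000 stars last month, 1026 this month.
print(f"growth: {month_over_month_growth(1000, 1026):.1f}%")
print(f"activity 9.0 -> top {top_share_percent(9.0):.0f}% of tracked projects")
```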

retrokit

Posts with mentions or reviews of retrokit. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-22.

I'm fed up with it, so I'm writing a browser
12 projects | news.ycombinator.com | 22 Sep 2023

That's what I did [1]
Need contributors and other maintainers though, because keeping up with upstream is impossible as a single dev.
[1] https://github.com/tholian-network/retrokit
The FBI Identified a Tor User
3 projects | news.ycombinator.com | 17 Jan 2023

From a technological point of view, TOR still has a couple of flaws which make it vulnerable to the metadata logging systems of ISPs:
- it needs a trailing non-zero buffer, randomized by the size of the payload, so that stream sizes and durations don't match
- it needs a request scattering feature, so that the requests for a specific website don't get proxied through the same nodes/paths
- it needs a failsafe browser engine, which doesn't give a flying damn about WebRTC and decides to actively drop features.
- it needs to stop monkey-patching out ("stubbing") the APIs that are compromising user privacy, and start removing those features.
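The first point in the list above (a random-length, non-zero trailer so that stream sizes no longer match) can be sketched as follows. This is only an illustration and not Tor's actual cell format: the 4-byte length prefix, the function names `pad_payload` and `unpad_payload`, and the `max_extra_ratio` parameter are all assumptions introduced here.

```python
import secrets
import struct


def pad_payload(payload: bytes, max_extra_ratio: float = 0.5) -> bytes:
    """Frame a payload and append a random-length trailer of non-zero bytes.

    The trailer length is drawn at random, bounded by a fraction of the
    payload size, so the size on the wire varies between sends.
    """
    max_extra = max(1, int(len(payload) * max_extra_ratio))
    extra = secrets.randbelow(max_extra) + 1  # 1..max_extra bytes
    # randbelow(255) yields 0..254, so adding 1 keeps every byte non-zero.
    trailer = bytes(secrets.randbelow(255) + 1 for _ in range(extra))
    # Hypothetical framing: 4-byte big-endian length, payload, trailer.
    return struct.pack(">I", len(payload)) + payload + trailer


def unpad_payload(framed: bytes) -> bytes:
    """Recover the original payload by reading the length prefix."""
    (length,) = struct.unpack(">I", framed[:4])
    return framed[4:4 + length]


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
framed = pad_payload(request)
print(len(request), len(framed))  # the framed size differs from run to run
```

Padding alone does not hide request timing or ordering, which is why the post lists request scattering as a separate point.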
I myself started a WebKit fork a while ago but eventually had to give up due to the sheer amount of work required to maintain such an engine project. I called it RetroKit [1], and I documented what kind of features in WebKit were already usable for tracking and had to be removed.
I'm sorry to be blunt here, but all that user-privacy-valuing Electron bullshit that uses embedded Chrome in the background doesn't cut it anymore. And neither does Firefox, which literally goes rogue in an endless loop of requests when you block their tracking domains. The config settings in Firefox don't change shit anymore, and it will keep requesting the tracking domains. It also does this in Librefox and all the *wolf profile variants; just use a local eBPF firewall to verify. I added my incomplete opensnitch ruleset to my dotfiles for others to try out. [3]
If I were to rewrite a browser engine today, I'd probably go for golang. But golang probably makes handling arbitrary network data a huge pain, so it's kinda useless for failsafe html5 parsing.
[1] https://github.com/tholian-network/retrokit
[2] (the browser using retrokit) https://github.com/tholian-network/stealth
[3] https://github.com/cookiengineer/dotfiles/tree/master/softwa...
There are no Internet Browsers that cannot be tracked, or are there?
3 projects | /r/hacking | 17 Sep 2022

I'm trying to go a different route with Stealth, my programmable peer-to-peer web browser that can offload and relay traffic intelligently - and with RetroKit, my WebKit fork that aims to remove all JavaScript APIs that can be used for fingerprinting and/or tracking.
No-JavaScript Fingerprinting
4 projects | news.ycombinator.com | 6 Feb 2022

Note that among a sea of tracked browsers, the untrackable browser shines like a bright star.
Statistical analysis of these values over time (matched with client hints, ETags, If-Modified-Since, and IPs) will make most browsers uniquely identifiable.
If the malicious vendor is good, they even correlate the size and order of requests. Because that's unique as well and can identify TOR browsers pretty easily.
It's like saying "I can't be tracked, because I use Linux". Guess what, as long as nobody in your town uses Linux, you are the most trackable person.
I decided to go with the "behave as the statistical norm expects you to behave" approach and created my browser/scraper [1] and forked WebKit into a webview [2] that doesn't support anything that can be used for tracking; with the idea that those tracking features can be shimmed and faked.
I personally think this is the only way to be untrackable these days. Because let's be honest, nobody uses Firefox with ETP in my town anymore :(
WebKit was a good starting point for this because at least some of the features were implemented behind compiler flags...whereas all other browsers and engines can't be built without, say, WebRTC support, or, say, without Audio Worklets, which are by themselves enough to uniquely identify you.
[1] https://github.com/tholian-network/stealth
[2] https://github.com/tholian-network/retrokit
(both WIP)
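The rarity argument in the post above ("as long as nobody in your town uses Linux, you are the most trackable person") can be expressed as surprisal: the rarer a trait is in the observed population, the more identifying bits it reveals. The sketch below is an illustration under stated assumptions, not the poster's method: the trait shares are made-up numbers, and summing bits assumes the traits are statistically independent, which real browser traits usually are not.

```python
import math


def surprisal_bits(population_share: float) -> float:
    """Identifying information, in bits, of a trait held by this share of users."""
    if not 0.0 < population_share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(population_share)


def combined_bits(shares: list) -> float:
    """Total bits for several traits, assuming they are independent."""
    return sum(surprisal_bits(s) for s in shares)


# Hypothetical shares: a common setup versus a rare, "hardened" one.
common = [0.5, 0.5]           # two traits, each shared by half of all users
rare = [1 / 1024, 1 / 64]     # two traits that almost nobody else has

print(combined_bits(common))  # few bits: blends into the crowd
print(combined_bits(rare))    # many bits: stands out
```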
IndexedDB in Safari 15 leaks your browsing activity in real time
1 project | /r/programming | 16 Jan 2022

Source: I forked WebKit into RetroKit and have been busy removing APIs that could be used as an attack surface. From outdated Netscape Plugin APIs to Java Applets...over Geolocation to even URL-based Hacks in the codebase.
We Have A Browser Monopoly Again and Firefox is The Only Alternative Out There
6 projects | /r/programming | 1 Jan 2022

Here you go, trying to remove all APIs that are unnecessary for a Web View: https://github.com/tholian-network/retrokit
A Minimal GUI browser – FInanced through donations – Actively developed
5 projects | news.ycombinator.com | 30 Dec 2021

> it uses Qt's WebEngine (Chromium)
Came here to post this after taking a look at the source code.
Honestly, I don't think this is what we need. Midori and others already switched to Electron, and we have dozens of Electron GUIs describing themselves as "secure" Web Browsers, even though they just use a <webview> element and that's basically it. They don't even care that all their users are fingerprinted and tracked by Google's TURN servers for WebRTC, which are automatically connected to on every start of the program. I mean, really? You didn't even use a software firewall to check what's going on?
I think that what we need is an alternative that values privacy and security over everything else, without compromising on that. Even the TOR Browser threw in the towel in the past, and meanwhile decided to use a script that replaces some APIs in upstream Firefox with stub APIs - instead of removing them from the codebase. If something is added upstream and nobody remembers to add it to this stubbing script, it's an exposed API.
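The difference between stubbing and removal described above can be sketched with plain sets. This is a simplified model and not how any real browser is built: stubbing is modelled as a denylist subtracted from whatever upstream ships, and a reduced engine as an explicit list of what is kept. `HypotheticalSensorAPI` and the function names are invented for illustration.

```python
def exposed_after_stubbing(upstream: set, stubbed: set) -> set:
    """Denylist model: everything upstream ships is exposed unless stubbed."""
    return upstream - stubbed


def exposed_after_removal(upstream: set, kept: set) -> set:
    """Allowlist model: only APIs deliberately kept in the build are exposed."""
    return upstream & kept


upstream_v1 = {"fetch", "RTCPeerConnection", "AudioWorklet"}
stub_list = {"RTCPeerConnection", "AudioWorklet"}
kept_list = {"fetch"}

# Upstream adds a new API; neither list has been updated yet.
upstream_v2 = upstream_v1 | {"HypotheticalSensorAPI"}

print(sorted(exposed_after_stubbing(upstream_v2, stub_list)))  # new API leaks through
print(sorted(exposed_after_removal(upstream_v2, kept_list)))   # new API stays out
```

In the denylist model a forgotten entry fails open, while in the allowlist model it fails closed, which is the point the post is making.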
Personally I believe we have to reduce the attack surface of Web Browsers. It's okay to have an Ungoogled Chromium to play your WebGL games occasionally. But do you want it to be able to fingerprint your hardware, and even your network devices? Probably not.
I wish Permission Management and Access to APIs would play a bigger role in the Web Browser market, but most vendors use Privacy more as a marketing thing that has no meaning at all anymore. Firefox fingerprints you by default every time you open the program via their shitty geolocation and ocsp services, and the Tracking Prevention basically is useless against fingerprint.js or fingerprint.css or even against HTTP2/HTTP3 fingerprinting through ETag headers. I mean, uBlock does a better job with that; even without the same amount of capabilities.
And Web Extensions can't filter response bodies, and therefore "abuse" injected CORS headers to block the loaded content. Well, at least it worked until Google decided to allowlist their own domains, which they now have. (well, in addition to the Manifest V3 shitshow, which I won't dig into)
We desperately need a secure _Web Engine_ alternative that removes all that crap that can be abused for fingerprinting. With regard to opsec we need something like an integration to another Browser a la "Open this in an Incognito Tab with an isolated Browser Session inside /tmp/randomized-profile-1337". The other things won't last, and there will always be bypasses and exploits in the JIT world. All the Cookie Clearing extensions just ain't gonna cut it anymore.
Over the holidays I started to revisit my idea to fork WebKit into something more secure [1], and spent some time in removing all kinds of features from it. I was kind of shocked how many APIs were available that were built with no permission management at all. Things like detecting Airplay-capable devices, hardcoded behaviours for specific domains, bluetooth APIs, payment request APIs that basically get full access to your local keyring, bugs in FTP directory parsers that could be abused to see whether you have working credentials in your keyring, picture in picture APIs that can be easily exploited, media capture APIs that are delegating streams through 3 processes, shared buffers that aren't really implemented and still exposed as an API, preconnect and prerender functionalities that can be used in an endless loop...etc.pp.
From an opsec perspective Web Browsers are a nightmare, and I don't think chromium is any different in that regard.
[1] https://github.com/tholian-network/retrokit
Started a WebKit fork that tries to reduce its Attack Surface
1 project | /r/opensource | 23 Dec 2021
Retro: WebKit fork for high-security environments (without any potential Tracking or Fingerprinting APIs)
1 project | /r/privacy | 22 Dec 2021
Show HN: WebKit Fork that aims to remove all Privacy compromising APIs
1 project | news.ycombinator.com | 22 Dec 2021

html5ever

Posts with mentions or reviews of html5ever. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-22.

I'm fed up with it, so I'm writing a browser
12 projects | news.ycombinator.com | 22 Sep 2023

Would you consider using some libraries in your project? There are lots of good ones in the Rust ecosystem, and many of them are not part of any existing browsers.
For example:
- https://github.com/servo/html5ever (HTML parsing - note: this is used in Servo)
- https://github.com/parcel-bundler/lightningcss (CSS parsing)
- https://github.com/DioxusLabs/taffy (web layout)
- https://github.com/pop-os/cosmic-text (text layout and rendering)
Obviously you should be free to work on whatever you like, but just as a benchmark on the scope of your project: I spent ~6 months implementing just the CSS Grid algorithm in Taffy last year. An entire browser from literal scratch is probably a 10 year project for one person.
Ask HN: A fast, Rust HTML parser that works?
4 projects | news.ycombinator.com | 23 Feb 2023

So I'm doing some web scraping in Rust, and so I will need to parse HTML. [scraper](https://docs.rs/scraper/latest/scraper/) (which uses [html5ever](https://github.com/servo/html5ever)) is doing fine except that it's the bottleneck of my application.
So I need a faster parser. I've tried [tl](https://docs.rs/tl/latest/tl/) which would've been perfect except that it doesn't actually work on the HTML I have. When I try to `query_selector` the elements I need, it returns nothing.
[Kuchiki](https://docs.rs/kuchiki/latest/kuchiki/) is abandoned.
I couldn't figure out how to get [lol-html](https://github.com/cloudflare/lol-html) to work for me (it's designed for re-writing HTML, whatever that means). It doesn't seem to have an API to extract the inner text of an element.
[html5gum](https://github.com/untitaker/html5gum) seems to be just an HTML tokenizer, or otherwise just too low-level. I have not yet tried [quick-xml](https://github.com/tafia/quick-xml/) but judging from the README, it's pretty low-level too. I mean, if these are the only options left then I will try them. Otherwise, I would love to use a parser that's faster but as ergonomic as `scraper` or `tl`.
At this point, I would be happy with an Lxml bridge/port of some sort. I don't need to mutate HTML, just parse and read data from it.
Any HTML parsing resources without going straight to W3C?
1 project | /r/rust | 31 Aug 2022
I’m developing rust module like google pagespeed nginx module, which will rewrite html for each request it received for dynamic optimisation. what library is fastest to do this? I’m using this now
1 project | /r/rust | 30 Aug 2021
What is the best way to parse HTML tags?
1 project | /r/rust | 15 Jul 2021

See https://github.com/servo/html5ever/tree/master/rcdom for an example implementation to imitate.

What are some alternatives?

When comparing retrokit and html5ever you can also consider the following projects:

cosmic-text - Pure Rust multi-line text handling

rust-htmlescape - A HTML entity encoding library for Rust

dooble - Dooble is a scientific browser. Minimal, cute, unusually stable, and available almost everywhere. Completed?

serde - Serialization framework for Rust

blog-nojs-fingerprint-demo - A demo for the no-JavaScript fingerprinting article

byteorder - Rust library for reading/writing numbers in big-endian and little-endian.

lightningcss - An extremely fast CSS parser, transformer, bundler, and minifier written in Rust.

bincode - A binary encoder / decoder implementation in Rust.

stealth - :rocket: Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy

tersenet - A new type of JavaScript-free light-weight fast browser built on rst and web assembly. Does not actually exist.

gosub-engine - A html5 tokenizer / parser that hopefully grow up to be a browser. Discussions at https://github.com/gosub-browser/gosub-engine/discussions

rust-bencode - Implementation of Bencode encoding written in rust