html5ever
Servo
html5ever | Servo | |
---|---|---|
5 | 134 | |
1,987 | 26,075 | |
1.3% | 1.0% | |
7.6 | 10.0 | |
11 days ago | 1 day ago | |
Rust | Rust | |
GNU General Public License v3.0 or later | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
html5ever
-
I'm fed up with it, so I'm writing a browser
Would you consider using some libraries in your project? There are lots of good ones in the Rust ecosystem, and many of them are not part of any existing browsers.
For example:
- https://github.com/servo/html5ever (HTML parsing - note: this is used in Servo)
- https://github.com/parcel-bundler/lightningcss (CSS parsing)
- https://github.com/DioxusLabs/taffy (web layout)
- https://github.com/pop-os/cosmic-text (text layout and rendering)
Obviously you should be free to work on whatever you like, but just as a benchmark on the scope of your project: I spent ~6 months implementing just the CSS Grid algorithm in Taffy last year. An entire browser from literal scratch is probably a 10 year project for one person.
-
Ask HN: A fast, Rust HTML parser that works?
So I'm doing some web scraping in Rust, and so I will need to parse HTML. [scraper](https://docs.rs/scraper/latest/scraper/) (which uses [html5ever](https://github.com/servo/html5ever)) is doing fine except that it's the bottleneck of my application.
So I need a faster parser. I've tried [tl](https://docs.rs/tl/latest/tl/) which would've been perfect except that it doesn't actually work on the HTML I have. When I try to `query_selector` the elements I need, it returns nothing.
[Kuchiki](https://docs.rs/kuchiki/latest/kuchiki/) is abandonded.
I couldn't figure out how to get [lol-html](https://github.com/cloudflare/lol-html) to work for me (it's designed for re-writing HTML, whatever that means). It doesn't seem to have an API to extract the inner text of an element.
[html5gum](https://github.com/untitaker/html5gum) seems to be just an HTML tokenizer, or otherwise just too low-level. I have not yet tried [quick-xml](https://github.com/tafia/quick-xml/) but judging from the README, it's pretty low-level too. I mean, if these are the only options left then I will try them. Otherwise, I would love to use a parser that's faster but as ergonomic as `scraper` or `tl`.
At this point, I would be happy with an Lxml bridge/port of some sort. I don't need to mutate HTML, just parse and read data from it.
- Any HTML parsing resources without going straight to W3C?
- I’m developing rust module like google pagespeed nginx module, which will rewrite html for each request it received for dynamic optimisation. what library is fastest to do this? I’m using this now
-
What is the best way to parse HTML tags?
See https://github.com/servo/html5ever/tree/master/rcdom for an example implementation to imitate.
Servo
-
GitHub Sponsor the Servo Rust project!
Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine
- Bringing Exchange Support to Thunderbird
-
CSS for Printing to Paper
> Is there any easy to use/hack HTML layouting engine where I could experiment with custom CSS attributes and bridge that gap? Would anything from Servo be suitable?
Servo could be used for this. You'd want to add support for parsing the CSS properties themselves to the style crate in https://github.com/servo/stylo and then the layout implementation to the layout2020 crate in https://github.com/servo/servo. You do effectively get a whole browser though.
I'm currently working on building a lighter weight / hackable layout engine based on a combination of https://github.com/servo/stylo (for css parsing and selector resolution), https://github.com/DioxusLabs/taffy (for box-level layout) and https://github.com/pop-os/cosmic-text (for flow/inline layout). I expect to have something decent in around 6 months
Neither of these setups currently have any support for pagination though.
-
The Ladybird Browser Project
Great to see some competition still alive in browser engine development. See also Servo (previously part of Mozilla) https://servo.org/ - that and Ladybird are still very underdeveloped compared to every day browsers.
It's a huge shame that there are no nightly builds of ladybird to try out but I assume that's because they just don't want the bug reports (if everything doesn't work it's pointless getting random bugs filed).
-
Mozilla's Abandoned Web Engine 'Servo' Project Is Getting a Well-Deserved Reboot
I haven't messed with it yet but from looking into it, this should absolutely work.
https://github.com/servo/servo/wiki/Building-on-ARM-desktop-...
-
An open-source browser engine written in Rust
don't know, there was a downtime in 2021 and 22 but since 2023, contributions look back to where it was before .. https://github.com/servo/servo/graphs/contributors
-
Modern Java/JVM Build Practices
The world has moved on though to opinionated tools, and Rust isn't even the furthest in that direction (That would be Go). The equivalent of those two lines in Cargo.toml would be this example of a basic configuration from the jacoco-maven-plugin: https://www.jacoco.org/jacoco/trunk/doc/examples/build/pom.x... - That's 40 lines in the section to do the "defaults".
Yes, you could add a load of config for files to include/exclude from coverage and so on, but the idea that that's a norm is way more common in Java projects than other languages. Like here's some example Cargo.toml files from complicated Rust projects:
Servo: https://github.com/servo/servo/blob/main/Cargo.toml
rust-gdext: https://github.com/godot-rust/gdext/blob/master/godot-core/C...
ripgrep: https://github.com/BurntSushi/ripgrep/blob/master/Cargo.toml
socketio: https://github.com/1c3t3a/rust-socketio/blob/main/socketio/C...
-
Top 10 Rusty Repositories for you to start your Open Source Journey
1. Servo
-
❓ Is Google flagging activity from Firefox and targeting uBlock?
It won't don't worry. There already are forks, for the worst case scenario. And Servo is on its way. Not yet ready, but it will be. Originally, from Mozilla kitchen.
-
Populating the page: how browsers work
To pain broad strokes, the layout phase (~= take the HTML, take the CSS, determine the position and size of boxes) is largely sequential in production browser engine today. Selector matching (~= what CSS applies to what element) is parallel in Firefox today, via the Stylo Rust crate originally developed in the research browser engine Servo. Servo can do parallel layout in some capacity (but doesn't implement everything), https://github.com/servo/servo/wiki/Servo-Layout-Engines-Rep... is an interesting and recent document on the matter.
Parallel layout is generally considered to be a complex engineering problem by domain experts.
https://hacks.mozilla.org/2017/08/inside-a-super-fast-css-en... is a really cool article that is related, that is a few years old but what it says is largely correct today.
What are some alternatives?
rust-htmlescape - A HTML entity encoding library for Rust
tauri - Build smaller, faster, and more secure desktop applications with a web frontend.
serde - Serialization framework for Rust
webview - Tiny cross-platform webview library for C/C++. Uses WebKit (GTK/Cocoa) and Edge WebView2 (Windows).
byteorder - Rust library for reading/writing numbers in big-endian and little-endian.
qtwebengine - Qt WebEngine
retrokit - :joystick: Bring back the old Web(Kit) and make it secure
xsv - A fast CSV command line toolkit written in Rust.
bincode - A binary encoder / decoder implementation in Rust.
xi-editor - A modern editor with a backend written in Rust.
tersenet - A new type of JavaScript-free light-weight fast browser built on rst and web assembly. Does not actually exist.
Fractalide - Reusable Reproducible Composable Software