| | html5ever | lightningcss |
|---|---|---|
| Mentions | 5 | 11 |
| Stars | 1,987 | 5,966 |
| Growth | 1.3% | 2.0% |
| Activity | 7.6 | 8.7 |
| Last commit | 10 days ago | 4 days ago |
| Language | Rust | Rust |
| License | GNU General Public License v3.0 or later | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
html5ever
-
I'm fed up with it, so I'm writing a browser
Would you consider using some libraries in your project? There are lots of good ones in the Rust ecosystem, and many of them are not part of any existing browsers.
For example:
- https://github.com/servo/html5ever (HTML parsing - note: this is used in Servo)
- https://github.com/parcel-bundler/lightningcss (CSS parsing)
- https://github.com/DioxusLabs/taffy (web layout)
- https://github.com/pop-os/cosmic-text (text layout and rendering)
Obviously you should be free to work on whatever you like, but just as a benchmark on the scope of your project: I spent ~6 months implementing just the CSS Grid algorithm in Taffy last year. An entire browser from literal scratch is probably a 10-year project for one person.
-
Ask HN: A fast, Rust HTML parser that works?
So I'm doing some web scraping in Rust, and so I will need to parse HTML. [scraper](https://docs.rs/scraper/latest/scraper/) (which uses [html5ever](https://github.com/servo/html5ever)) is doing fine except that it's the bottleneck of my application.
So I need a faster parser. I've tried [tl](https://docs.rs/tl/latest/tl/) which would've been perfect except that it doesn't actually work on the HTML I have. When I try to `query_selector` the elements I need, it returns nothing.
[Kuchiki](https://docs.rs/kuchiki/latest/kuchiki/) is abandoned.
I couldn't figure out how to get [lol-html](https://github.com/cloudflare/lol-html) to work for me (it's designed for re-writing HTML, whatever that means). It doesn't seem to have an API to extract the inner text of an element.
[html5gum](https://github.com/untitaker/html5gum) seems to be just an HTML tokenizer, or otherwise just too low-level. I have not yet tried [quick-xml](https://github.com/tafia/quick-xml/) but judging from the README, it's pretty low-level too. I mean, if these are the only options left then I will try them. Otherwise, I would love to use a parser that's faster but as ergonomic as `scraper` or `tl`.
At this point, I would be happy with an Lxml bridge/port of some sort. I don't need to mutate HTML, just parse and read data from it.
- Any HTML parsing resources without going straight to W3C?
- I’m developing a Rust module like the Google PageSpeed nginx module, which will rewrite HTML for each request it receives for dynamic optimisation. What library is fastest for this? I’m using this now
-
What is the best way to parse HTML tags?
See https://github.com/servo/html5ever/tree/master/rcdom for an example implementation to imitate.
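For readers wondering what a reference-counted DOM of the kind the answer above points to might look like, here is a minimal sketch using only the standard library. All names (`NodeData`, `Node`, `new_node`, `append`, `text_content`, `parent_name`) are hypothetical and chosen for illustration; they are not html5ever's or rcdom's actual definitions. The tree is built by hand rather than parsed. It also shows the "inner text of an element" operation that the Ask HN post was looking for.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node payload; real DOM crates carry attributes, comments, etc.
enum NodeData {
    Element { name: String },
    Text(String),
}

struct Node {
    data: NodeData,
    // Weak back-pointer so parent and child do not keep each other alive.
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node(data: NodeData) -> Rc<Node> {
    Rc::new(Node {
        data,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

// Attach `child` as the last child of `parent`.
fn append(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

// Concatenate the text of a node and all its descendants, in document order.
fn text_content(node: &Node) -> String {
    let mut out = String::new();
    collect_text(node, &mut out);
    out
}

fn collect_text(node: &Node, out: &mut String) {
    if let NodeData::Text(t) = &node.data {
        out.push_str(t);
    }
    for child in node.children.borrow().iter() {
        collect_text(child, out);
    }
}

// Name of the parent element, if the node has a live parent that is an element.
fn parent_name(node: &Node) -> Option<String> {
    let parent = node.parent.borrow().upgrade()?;
    match &parent.data {
        NodeData::Element { name } => Some(name.clone()),
        NodeData::Text(_) => None,
    }
}

fn main() {
    // Hand-built equivalent of <p>Hello, <b>world</b>!</p>
    let p = new_node(NodeData::Element { name: "p".into() });
    append(&p, new_node(NodeData::Text("Hello, ".into())));
    let b = new_node(NodeData::Element { name: "b".into() });
    append(&b, new_node(NodeData::Text("world".into())));
    append(&p, Rc::clone(&b));
    append(&p, new_node(NodeData::Text("!".into())));

    println!("{}", text_content(&p));
    println!("{:?}", parent_name(&b));
}
```

The `Weak` parent pointer is the main design choice here: with a strong `Rc` in both directions the tree would form a reference cycle and never be freed.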
lightningcss
- LightningCSS Benchmark
-
We're building a browser when it's supposed to be impossible
Libraries for a lot of this stuff exist (albeit in many cases not very mature yet):
- https://github.com/pop-os/cosmic-text does text layout (which Taffy explicitly considers out of scope)
- https://github.com/AccessKit/accesskit does accessibility
- https://github.com/servo/rust-cssparser does value-agnostic CSS parsing (it will parse the general syntax but leaves value parsing up to the user, meaning you can easily add support for whatever properties you want). Libraries like https://github.com/parcel-bundler/lightningcss implement parsing for the standard CSS properties.
- There are crates like https://github.com/BurntSushi/bstr and https://docs.rs/wtf8/latest/wtf8/ for working with non-Unicode text
We are planning to add a C API to Taffy, but tbh I feel like C is not very good for this kind of modularised approach. You really want to be able to expose complex APIs with enforced type safety and this isn't possible with C.
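To illustrate the "value-agnostic" split described in the comment above, here is a rough sketch using only the standard library. It is not rust-cssparser's or lightningcss's API; `RawDeclaration`, `parse_declarations`, `Length`, and `parse_length` are hypothetical names. The generic layer only separates property names from raw values, and a property-specific parser supplied by the user interprets the values it cares about. The splitting is deliberately naive and ignores strings, comments, and nested blocks.

```rust
// Hypothetical generic layer: knows the "name: value;" shape, nothing about values.
#[derive(Debug, PartialEq)]
struct RawDeclaration<'a> {
    name: &'a str,
    value: &'a str, // left unparsed on purpose
}

// Naive split on ';' and ':'. Real CSS syntax needs a proper tokenizer.
fn parse_declarations(block: &str) -> Vec<RawDeclaration<'_>> {
    block
        .split(';')
        .filter_map(|decl| {
            let (name, value) = decl.split_once(':')?;
            let (name, value) = (name.trim(), value.trim());
            if name.is_empty() || value.is_empty() {
                return None;
            }
            Some(RawDeclaration { name, value })
        })
        .collect()
}

// Hypothetical user-supplied value type for length-like properties.
#[derive(Debug, PartialEq)]
enum Length {
    Px(f32),
    Percent(f32),
}

// User-supplied value parser: only understands `<number>px` and `<number>%`.
fn parse_length(value: &str) -> Option<Length> {
    if let Some(n) = value.strip_suffix("px") {
        n.trim().parse::<f32>().ok().map(Length::Px)
    } else if let Some(n) = value.strip_suffix('%') {
        n.trim().parse::<f32>().ok().map(Length::Percent)
    } else {
        None
    }
}

fn main() {
    let decls = parse_declarations("width: 100px; margin-left: 50%; color: red");
    for d in &decls {
        match d.name {
            // Properties this user chose to support get typed values.
            "width" | "margin-left" => println!("{} -> {:?}", d.name, parse_length(d.value)),
            // Everything else stays as raw text.
            _ => println!("{} -> raw {:?}", d.name, d.value),
        }
    }
}
```

A library that covers the standard properties would, in effect, ship the second half (the typed value parsers) for you.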
-
Help with "returns a value referencing data owned by the current function"
Background: I encountered this problem using lightningcss.
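The thread's own code is not shown here, so the following is only a generic sketch of how this compiler message (part of error E0515) typically arises and two common ways around it. The function names are hypothetical and nothing below uses lightningcss itself.

```rust
// Does NOT compile (E0515): `owned` is dropped when the function returns,
// so the returned slice would point at freed memory.
//
// fn first_word_broken(input: &str) -> &str {
//     let owned = input.trim().to_lowercase();
//     owned.split_whitespace().next().unwrap_or("")
// }

// Fix 1: return an owned value instead of a reference into local data.
fn first_word_owned(input: &str) -> String {
    let owned = input.trim().to_lowercase();
    owned.split_whitespace().next().unwrap_or("").to_string()
}

// Fix 2: borrow only from data the caller owns, so the result can
// legitimately live as long as `input` does.
fn first_word_borrowed(input: &str) -> &str {
    input.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("  Hello World ");
    println!("{}", first_word_owned(&text));
    println!("{}", first_word_borrowed(&text));
}
```

With parsers whose output borrows from the source string, the same two options usually apply: either keep the source alive in the caller and pass it in, or convert the result into an owned form before returning it.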
-
On Using Rust in Parcel and Vitest
You can do it - that's actually exactly what my project is doing. I have a single repository with a Rust project that builds the .wasm file (+ .d.ts + .js) using wasm-pack, and a Node.js project that uses this .wasm file. There's no problem in packing that and exposing it as an npm package. See parcel-bundler/lightningcss for a full-blown example (it's not using wasm-pack but builds the Rust project directly).
- A fast CSS parser, transformer, bundler, and minifier written in Rust
- Parcel-Css - A CSS parser, transformer, and minifier written in Rust.
- ParcelCSS – A CSS parser, transformer, and minifier written in Rust
-
Parcel CSS: A new CSS parser, compiler, and minifier
Initial commit, 9 Oct 2021. That is pretty new.
What are some alternatives?
rust-htmlescape - A HTML entity encoding library for Rust
PostCSS - Transforming styles with JS plugins
serde - Serialization framework for Rust
swc - Rust-based platform for the Web
byteorder - Rust library for reading/writing numbers in big-endian and little-endian.
rust-cssparser - Rust implementation of CSS Syntax Level 3
retrokit - :joystick: Bring back the old Web(Kit) and make it secure
parse5 - HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.
bincode - A binary encoder / decoder implementation in Rust.
x-ray - The next web scraper. See through the <html> noise.
tersenet - A new type of JavaScript-free, lightweight, fast browser built on Rust and WebAssembly. Does not actually exist.
excel-stream