html5ever
bincode
Metric | html5ever | bincode
---|---|---
Mentions | 5 | 16
Stars | 1,983 | 2,523
Growth | 2.6% | 2.7%
Activity | 7.6 | 6.9
Last commit | 7 days ago | 18 days ago
Language | Rust | Rust
License | GNU General Public License v3.0 or later | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
html5ever
-
I'm fed up with it, so I'm writing a browser
Would you consider using some libraries in your project? There are lots of good ones in the Rust ecosystem, and many of them are not part of any existing browsers.
For example:
- https://github.com/servo/html5ever (HTML parsing - note: this is used in Servo)
- https://github.com/parcel-bundler/lightningcss (CSS parsing)
- https://github.com/DioxusLabs/taffy (web layout)
- https://github.com/pop-os/cosmic-text (text layout and rendering)
Obviously you should be free to work on whatever you like, but just as a benchmark on the scope of your project: I spent ~6 months implementing just the CSS Grid algorithm in Taffy last year. An entire browser from literal scratch is probably a 10 year project for one person.
-
Ask HN: A fast, Rust HTML parser that works?
So I'm doing some web scraping in Rust, and so I will need to parse HTML. [scraper](https://docs.rs/scraper/latest/scraper/) (which uses [html5ever](https://github.com/servo/html5ever)) is doing fine except that it's the bottleneck of my application.
So I need a faster parser. I've tried [tl](https://docs.rs/tl/latest/tl/) which would've been perfect except that it doesn't actually work on the HTML I have. When I try to `query_selector` the elements I need, it returns nothing.
[Kuchiki](https://docs.rs/kuchiki/latest/kuchiki/) is abandoned.
I couldn't figure out how to get [lol-html](https://github.com/cloudflare/lol-html) to work for me (it's designed for re-writing HTML, whatever that means). It doesn't seem to have an API to extract the inner text of an element.
[html5gum](https://github.com/untitaker/html5gum) seems to be just an HTML tokenizer, or otherwise just too low-level. I have not yet tried [quick-xml](https://github.com/tafia/quick-xml/) but judging from the README, it's pretty low-level too. I mean, if these are the only options left then I will try them. Otherwise, I would love to use a parser that's faster but as ergonomic as `scraper` or `tl`.
At this point, I would be happy with an Lxml bridge/port of some sort. I don't need to mutate HTML, just parse and read data from it.
- Any HTML parsing resources without going straight to W3C?
- I’m developing a Rust module like the Google PageSpeed nginx module, which will rewrite HTML for each request it receives for dynamic optimisation. Which library is fastest for this? I’m using this now
-
What is the best way to parse HTML tags?
See https://github.com/servo/html5ever/tree/master/rcdom for an example implementation to imitate.
bincode
-
Hey Rustaceans! Got a question? Ask here (14/2023)!
Ermm... actually I meant something like this: playground, but then I realized it's basically (de)serialization, and I just found that we already have a crate for that: bincode.
-
Convert a base-64 encoded, serialised, Rust struct to a Python class
One, figure out the bincode format (documented here: https://github.com/bincode-org/bincode/blob/trunk/docs/spec.md) and write your own parser. Maybe a one-off that specifically only handles this one data structure would be fairly straightforward.
- Fang, async background processing for Rust
-
impl serde::Deserialize... is it really that complicated?
Step 1: The Deserialize type requests data from the Deserializer with one of the deserialize_type methods. This gives it an opportunity to provide certain metadata about the type: structs provide a list of fields, enums provide a list of variants, tuples provide a length, etc. Some data formats (notably bincode) require this metadata to drive deserializing, as the wire format is not self-describing. Crucially, the Deserialize type also provides a visitor that is capable of receiving the requested data from the Deserializer.
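The handshake described above can be sketched with a toy, std-only model. Everything here is an assumption made for illustration: the `Visitor`, `Deserialize`, and `ByteDeserializer` names are simplified stand-ins rather than serde's real traits (which carry lifetimes, a `Deserializer` trait, and proper error types), and the wire layout is a made-up fixed-width little-endian one, not bincode's actual format. The point it tries to show is that the bytes carry no type information, so the type being deserialized has to drive the process.

```rust
/// Toy stand-in for serde's `Visitor`: receives the value the deserializer read.
trait Visitor: Sized {
    type Value;
    fn visit_u32(self, _v: u32) -> Option<Self::Value> { None }
    fn visit_bool(self, _v: bool) -> Option<Self::Value> { None }
}

/// Toy deserializer over a non-self-describing byte format (hypothetical layout).
struct ByteDeserializer<'a> {
    input: &'a [u8],
}

impl<'a> ByteDeserializer<'a> {
    /// Consume exactly `n` bytes, or fail if the input is too short.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let input = self.input;
        if input.len() < n {
            return None;
        }
        let (head, tail) = input.split_at(n);
        self.input = tail;
        Some(head)
    }

    /// The caller says "I expect a u32"; the bytes themselves do not.
    fn deserialize_u32<V: Visitor>(&mut self, visitor: V) -> Option<V::Value> {
        let b = self.take(4)?;
        visitor.visit_u32(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// The caller says "I expect a bool", stored here as a single 0 or 1 byte.
    fn deserialize_bool<V: Visitor>(&mut self, visitor: V) -> Option<V::Value> {
        match self.take(1)?[0] {
            0 => visitor.visit_bool(false),
            1 => visitor.visit_bool(true),
            _ => None,
        }
    }
}

/// Toy stand-in for serde's `Deserialize`.
trait Deserialize: Sized {
    fn deserialize(d: &mut ByteDeserializer) -> Option<Self>;
}

struct U32Visitor;
impl Visitor for U32Visitor {
    type Value = u32;
    fn visit_u32(self, v: u32) -> Option<u32> { Some(v) }
}

struct BoolVisitor;
impl Visitor for BoolVisitor {
    type Value = bool;
    fn visit_bool(self, v: bool) -> Option<bool> { Some(v) }
}

impl Deserialize for u32 {
    fn deserialize(d: &mut ByteDeserializer) -> Option<Self> {
        d.deserialize_u32(U32Visitor)
    }
}

impl Deserialize for bool {
    fn deserialize(d: &mut ByteDeserializer) -> Option<Self> {
        d.deserialize_bool(BoolVisitor)
    }
}

#[derive(Debug, PartialEq)]
struct Point {
    x: u32,
    visible: bool,
}

impl Deserialize for Point {
    fn deserialize(d: &mut ByteDeserializer) -> Option<Self> {
        // Field order and field types come from the type, not from the bytes.
        Some(Point {
            x: u32::deserialize(d)?,
            visible: bool::deserialize(d)?,
        })
    }
}
```

With this sketch, the five bytes `[7, 0, 0, 0, 1]` decode to `Point { x: 7, visible: true }` only because `Point::deserialize` asks for a `u32` followed by a `bool`; a different type could read the same bytes differently.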
-
A nicer way to pack this message?
Alternatively, give Bincode a try.
-
Hey Rustaceans! Got an easy question? Ask here (9/2022)!
Like separate instructions? I was thinking that if an instruction has an unknown length, I'd make sure I have some kind of header field that tells the data length of the instruction, so the receiver knows when the next instruction starts. And I was planning on using Bincode with serde to serialize and deserialize structs and such.
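The length-header idea in this post could look roughly like the sketch below. The 4-byte little-endian header and the `encode_frame` / `decode_frame` names are assumptions chosen for illustration; the payload would be whatever bytes the serializer (Bincode, in the poster's plan) produces.

```rust
/// Prepend a 4-byte little-endian length header to a payload.
/// Assumes payloads are shorter than 4 GiB so the length fits in a u32.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let len = payload.len() as u32;
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&len.to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Try to split one complete frame off the front of `buf`.
/// Returns the payload and the remaining bytes, or `None` if more data is needed.
fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None; // header not fully received yet
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None; // payload not fully received yet
    }
    Some(rest.split_at(len))
}
```

A receiver would call `decode_frame` repeatedly on its buffer: each `Some((payload, rest))` yields one instruction, and `None` means it should wait for more bytes before trying again.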
-
Easily converts a struct into Vec<u8> and back.
Isn't this essentially bincode?
-
Does rust have function works like eval?
This is similar in practice to using abi_stable, and end-users will still receive compiled files, but your plugins will be sandboxed and a single build will work on all platforms. The downside is that it's a bit more work because WebAssembly's support for passing complex data types between the host and the WebAssembly code is in the preliminary stages, so you need to do something like using Serde to encode your data into something like Bincode or MessagePack (or JSON and friends) to hand it off between the host and the plugin.
-
Storing variable data structures
What kind of access do you need to the data? You should be able to make a safe API over the Vec by iterating on it in chunks and using a closure to translate data between u8 and other representations (f32 and u32 have the from_ne_bytes() / to_ne_bytes() methods). If the layout is completely dynamic, you could make a helper function that takes a format description (e.g. "fffuucc"), calculates the size of the chunk, and generates a closure for reading the data. This closure could use an enum to wrap the different primitive types. Or, if the layouts are known at compile time, you could use procedural macros to generate code for serialization / deserialization into the [u8], though https://crates.io/crates/bincode may already do that for you.
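A minimal sketch of the dynamic-layout suggestion follows. The format codes are an assumption ('f' = f32, 'u' = u32, 'c' = a single byte), as are the `Value`, `record_size`, `read_record`, and `read_all` names, and it uses plain functions where the post proposes generating a closure.

```rust
/// One decoded field; wraps the primitive types a format string can name.
#[derive(Debug, PartialEq)]
enum Value {
    F32(f32),
    U32(u32),
    Byte(u8),
}

/// Size in bytes of one record, or `None` if the format has an unknown code.
/// Hypothetical codes: 'f' = f32, 'u' = u32, 'c' = u8.
fn record_size(format: &str) -> Option<usize> {
    format
        .chars()
        .map(|c| match c {
            'f' | 'u' => Some(4),
            'c' => Some(1),
            _ => None,
        })
        .sum()
}

/// Copy four bytes starting at `pos`, if they are available.
fn take4(chunk: &[u8], pos: usize) -> Option<[u8; 4]> {
    let s = chunk.get(pos..pos + 4)?;
    Some([s[0], s[1], s[2], s[3]])
}

/// Decode one record from `chunk` (native byte order) according to `format`.
fn read_record(format: &str, chunk: &[u8]) -> Option<Vec<Value>> {
    let mut values = Vec::new();
    let mut pos = 0;
    for code in format.chars() {
        match code {
            'f' => {
                values.push(Value::F32(f32::from_ne_bytes(take4(chunk, pos)?)));
                pos += 4;
            }
            'u' => {
                values.push(Value::U32(u32::from_ne_bytes(take4(chunk, pos)?)));
                pos += 4;
            }
            'c' => {
                values.push(Value::Byte(*chunk.get(pos)?));
                pos += 1;
            }
            _ => return None,
        }
    }
    Some(values)
}

/// Decode every record in `data`, iterating over it in fixed-size chunks.
fn read_all(format: &str, data: &[u8]) -> Option<Vec<Vec<Value>>> {
    let size = record_size(format)?;
    if size == 0 || data.len() % size != 0 {
        return None;
    }
    data.chunks_exact(size)
        .map(|chunk| read_record(format, chunk))
        .collect()
}
```

For the "fffuucc" example from the post, `record_size` gives 22 bytes per record (three f32, two u32, two single bytes). Because the bytes are in native order, a buffer written this way is only portable between machines with the same endianness.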
-
Serde Bincode not De-serializing Bools?
Apparently there's a lot of discussion going on about that (3 of the 4 open tickets on the bincode implementation are about it), for example this one.
What are some alternatives?
rust-htmlescape - A HTML entity encoding library for Rust
serde - Serialization framework for Rust
msgpack-rust - MessagePack implementation for Rust / msgpack.org[Rust]
byteorder - Rust library for reading/writing numbers in big-endian and little-endian.
PyO3 - Rust bindings for the Python interpreter
retrokit - :joystick: Bring back the old Web(Kit) and make it secure
rust-cbor - CBOR (binary JSON) for Rust with automatic type based decoding and encoding.
tersenet - A new type of JavaScript-free, lightweight, fast browser built on Rust and WebAssembly. Does not actually exist.
nue - I/O and binary data encoding for Rust
rust-bencode - Implementation of Bencode encoding written in rust
evcxr