wuffs vs nom

| | wuffs | nom |
|---|---|---|
| Mentions | 80 | 85 |
| Stars | 3,695 | 8,943 |
| Stars growth | 1.1% | 1.6% |
| Activity | 9.4 | 6.5 |
| Latest commit | 2 days ago | 25 days ago |
| Language | C | Rust |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wuffs
-
Still no love for JPEG XL: Browser maker love-in snubs next-gen image format
Maybe this is what you are looking for:
https://github.com/google/wuffs
"Wuffs is a memory-safe programming language (and a standard library written in that language) for Wrangling Untrusted File Formats Safely."
-
Just about every Windows/Linux device vulnerable to new LogoFAIL firmware attack
This is one of the reasons I'm a big fan of wuffs[0] - it specifically targets dealing with formats like pictures, safely, and the result drops into a C codebase to make the compat/migration story easy.
-
Google assigns a CVE for libwebp and gives it a 10.0 score
One example for a safer language developed at Google: https://github.com/google/wuffs
There are already Huffman decoding and some parts of the WebP algorithms in https://github.com/google/wuffs (a language that finds missing bounds checks during compilation). On the contrary, according to its README, this language allows you to write more optimized code (compared to C). WebP decoding is stated as a midterm target in the roadmap.
-
The WebP 0day
Specifically, since performance is crucial for this type of work, it should be written in WUFFS. WUFFS doesn't emit bounds checks (as Java does, and as Rust would where it's unclear why something should be in bounds at runtime); it just rejects programs where it can't see why the indexes are in bounds.
https://github.com/google/wuffs
You can explicitly write the same checks and meet this requirement, but chances are, since you believe you're producing a high-performance piece of software which doesn't need checks, you'll instead be pulled up when the WUFFS tooling won't accept your code, and discover you got it wrong.
This is weaker than full-blown formal verification, but not for the program-safety purpose we care about, and thus a big improvement on humans writing LGTM.
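The "write the check explicitly, up front" idea can be sketched in plain Rust (this is not Wuffs syntax; it only illustrates the pattern the comment describes, where an explicit bound established before the loop lets the compiler see that every later index is in range):

```rust
// Sketch (in Rust, not Wuffs) of proving indexes in bounds up front.
// After `n` is clamped to the slice length, the compiler can see that
// every access in the loop is in range and may elide per-access checks.
fn sum_quads(buf: &[u8]) -> u32 {
    let mut total = 0u32;
    let n = buf.len() & !3; // largest multiple of 4 that fits in buf
    let mut i = 0;
    while i + 4 <= n {
        total += buf[i] as u32
            + buf[i + 1] as u32
            + buf[i + 2] as u32
            + buf[i + 3] as u32;
        i += 4;
    }
    total
}

fn main() {
    let data = [1u8, 2, 3, 4, 5]; // trailing byte deliberately ignored
    println!("{}", sum_quads(&data)); // prints 10 (1+2+3+4)
}
```

Wuffs goes further: the compiler *rejects* the program unless such a proof exists, rather than merely optimizing when it happens to find one.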
-
What If OpenDocument Used SQLite?
> parsing encoded files tends to introduce vulnerabilities
If we are talking about binary formats, now there are systematic solutions like https://github.com/google/wuffs that protect against vulnerabilities. But SQLite is not just a format - it's an evolving ecosystem with constantly added features. And the most prominent issue was not even in core, it was in FTS3. What will SQLite add next? More json-related functions? Maybe BSON? It is useful, but does not help in this situation.
Regarding traces, there are many forensics tools and even books about forensic analysis of SQLite databases. In a well-designed format, such tools should not exist in the first place. This is a hard requirement: if it requires rewriting the whole file, then so be it.
-
CVE-2023-4863: Heap buffer overflow in WebP (Chrome)
I agree that Wuffs [1] would have been a very good alternative, if it can be made more general. AFAIK Wuffs is still very limited; in particular, it never allows dynamic allocation. Many formats, including those supported by Wuffs the library, need dynamic allocation, so Wuffs code has to be glued to unverified non-Wuffs code [2]. This only works with simpler formats.
[1] https://github.com/google/wuffs/blob/main/doc/wuffs-the-lang...
[2] https://github.com/google/wuffs/blob/main/doc/note/memory-sa...
-
NSO Group iPhone Zero-Click, Zero-Day Exploit Captured in the Wild
There are efforts to do that, notably https://github.com/google/wuffs
RLBox is another interesting option that lets you sandbox C/C++ code.
I think the main reason is that security is one of those things that people don't care about until it is too late to change. They get to the point of having a fast PDF library in C++ that has all the features. Then they realise that they should have written it in a safer language but by that point it means a complete rewrite.
The same reason not enough people use Bazel. By the time most people realise they need it, you've already implemented a huge build system using Make or whatever.
-
FaaS in Go with WASM, WASI and Rust
Here's an off-topic answer.
Depends on what you want your toy language to do and what sort of runtime support you'd like to lean on.
JVM is pretty good for a lot of script-y languages, does impose overhead of having a JVM around. Provides GC, Threads, Reflection, consistent semantics. Tons of tools, libraries, support.
WebAssembly is constrained (for running-in-a-browser safety reasons) but then you get to run your code in a browser, or as a service, etc, and Other People are working hard on the problem of getting your WA to go fast. That used to be a big reason for using JVM, but it turns out that Security Is Darn Hard.
I have used C in the (distant) past as an IL, and that works up to a point, implementing garbage collection can be a pain if that's a thing that you want. C compilers have had a lot of work on them over the years, and you also have access to some low-level stuff, so if you were E.G. trying to come up with a little language that had super-good performance, C might be a good choice. (See also, [Wuffs](https://github.com/google/wuffs), by Nigel Tao et al at Google).
A suggestion, if you do target C -- don't work too hard to find isomorphisms between C's data structures and YourToyLang's data structures. Back around 1990, I did my C-generating compiler for Modula-3, and a friend at Xerox PARC, Hans, used C as a target for Cedar Mesa, but in a lower-level way (so I was mapping between M-3 records and C structs, for example, and Hans was not), and the lower-level way worked better -- i.e., I chose poorly. It worked, but lower-level worked better.
If you are targeting a higher-level language, Rust and Go both seem like interesting options to me. Both have the disadvantage that they are still changing slightly but you get interesting "services" from the underlying VM -- for Rust, the borrow checker, plus libraries, for Go, reflection, goroutines, and the GC, plus libraries.
Rust should get you slightly higher performance, but I'd worry that you couldn't hide the existence of the borrow checker from your toy language, especially if you wanted to interact with Rust libraries from YTL. If you wanted to learn something vaguely publishable/wider-interesting, that question right there ("can I compile a TL to Rust, touch the Rust libraries, and not expose the borrow checker? No+what-I-tried/Yes+this-worked") is not bad.
I have a minor conflict of interest suggesting Go; I work on Go, usually on the compiler, and machine-generated code makes great test data. But regarded as a VM, I am a little puzzled why it hasn't seen wider use, because the GC is great (for lower allocation rates than Java, however; JVM GC has higher throughput efficiency, but Go has tagless objects, interior pointer support, and tiny pause times. Go-the-language makes it pretty easy to allocate less.) Things Go-as-a-VM currently lacks:
- tail call elimination (JVM same)
-
Don't carelessly rely on fixed-size unsigned integers overflow
Because if you can't prevent creation of pointers out of thin air (e.g. by sending them to a remote server and then pulling them back from said server), then you cannot prove anything of that sort; and if you limit such operations, then you are starting the journey down the road to Rust or Wuffs!
nom
-
Planespotting with Rust: using nom to parse ADS-B messages
Just in case you are not familiar with nom: it is a parser combinator library written in Rust. The most basic thing you can do with it is import one of its parsing functions, give it some byte or string input, and get a Result as output, with the parsed value and the rest of the input, or an error if the parser failed. `tag`, for example, is used to recognize literal character/byte sequences.
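The combinator idea can be sketched in plain Rust without the nom crate (note this is not nom's real API: nom's `tag` lives in `nom::bytes` and returns an `IResult`; the `PResult` type and error shape below are simplified for illustration):

```rust
// Minimal sketch of a parser combinator, in the spirit of nom's `tag`.
// A parser takes input and returns Ok((rest_of_input, matched_value))
// on success, or an error carrying the unconsumed input on failure.
type PResult<'a> = Result<(&'a str, &'a str), &'a str>;

// Build a parser that recognizes a literal prefix.
fn tag<'a>(literal: &'a str) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| match input.strip_prefix(literal) {
        Some(rest) => Ok((rest, literal)),
        None => Err(input),
    }
}

fn main() {
    let parse_hello = tag("hello");
    // Success: the matched value plus the rest of the input.
    assert_eq!(parse_hello("hello world"), Ok((" world", "hello")));
    // Failure: an error wrapping the input that didn't match.
    assert_eq!(parse_hello("goodbye"), Err("goodbye"));
    println!("ok");
}
```

The power comes from composing such small parsers (sequencing, alternation, repetition) into parsers for whole formats, which is exactly what nom provides out of the box.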
-
Show HN: Rust nom parsing Starcraft2 Replays into Arrow for Polars data analysis
I may be the only one not familiar, but nom refers to https://github.com/rust-bakery/nom which looks like a pretty handy way to parse binary data in Rust.
-
Is this a good way to free up some memory?
Lots of people use nom for their parsing needs, but it's not the only game in town, and there are other options.
-
What is the state of the art for creating domain-specific languages (DSLs) with Rust?
As much as I love nom, as well as other parser combinator libraries, regex-based parsers, BNF/EBNF-based parsers, etc., I always end up going back to plain old text-based char-by-char scanners.
-
What's everyone working on this week (22/2023)?
I am using nom / nom_locate to build the parser side because I've done a handful of other projects with it, and I plan to use tower-lsp to hook up the language server side.
-
lua bytecode parser written in rust
Thanks to the flexibility of [nom](https://github.com/rust-bakery/nom), it is very easy to write your own parser in Rust; read [this article](https://github.com/metaworm/luac-parser-rs/wiki/Write-custom-luac-parser) to learn how to write a luac parser.
-
Should I revisit my choice to use nom?
I've been working on an assembler and right now it uses nom. While nom isn't great for error messages, good error messages will be important for this particular assembler (current code), so I've been attempting to use the methods described by Eyal Kalderon in Error recovery with parser combinators (using nom).
-
winnow = toml_edit + combine + nom
On my side, nom is still advancing well and a new major version is in preparation, with some interesting work on a new GAT-based design inspired by the awesome work on chumsky, which promises to bring great performance with complex error types. 2023 will be fun for parser libraries!
-
Question about lexer and parser generators in Rust
Check out https://github.com/zesterer/chumsky or https://github.com/rust-bakery/nom
-
Writing a parser in Rust
I recently did a parsing project - I used the nom crate which is a functional/combinatorial style parser. Here's a really good video about the technique: https://www.youtube.com/watch?v=dDtZLm7HIJs
What are some alternatives?
pest - The Elegant Parser
lalrpop - LR(1) parser generator for Rust
combine - A parser combinator library for Rust
pom - PEG parser combinators using operator overloading without macros.
rust-peg - Parsing Expression Grammar (PEG) parser generator for Rust
chumsky - Write expressive, high-performance parsers with ease.
chomp - A fast monadic-style parser combinator designed to work on stable Rust.
serde - Serialization framework for Rust
rust-csv - A CSV parser for Rust, with Serde support.
png-decoder - A pure-Rust, no_std compatible PNG decoder
stb - stb single-file public domain libraries for C/C++