Metric | restruct | nom
---|---|---
Mentions | 3 | 85
Stars | 345 | 9,020
Growth | 0.0% | 0.9%
Activity | 3.2 | 7.4
Last commit | about 2 years ago | 7 days ago
Language | Go | Rust
License | ISC License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
restruct
-
Why isn't there a Swagger/OpenAPI for binary formats?
My project Restruct[1] does Kaitai-like things but also supports serialization. Unfortunately, it only supports Go and only deals with Go struct tags rather than YAML manifests. Still, it totally can be used for serialization. I use it to sketch out quick projects against arbitrary binary formats. Two examples: a quick binwalk-like program for PNG only, which parses PNG headers and looks for the IEND chunk so it can extract files accurately[2], and a program that splits FL Studio FLP projects by playlist track[3].
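To make the IEND idea concrete, here is a minimal sketch in plain Go using only the standard library. It is not the code from pngextract and does not use Restruct; the names `pngLength` and `chunk` are hypothetical. It assumes the standard PNG layout (an 8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC) and skips CRC verification.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// pngSignature is the fixed 8-byte header that starts every PNG file.
var pngSignature = []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}

// pngLength returns the number of bytes from the start of data through the
// end of the IEND chunk. data must begin with the PNG signature.
// CRCs are skipped, not verified.
func pngLength(data []byte) (int, error) {
	if !bytes.HasPrefix(data, pngSignature) {
		return 0, errors.New("missing PNG signature")
	}
	pos := len(pngSignature)
	for {
		// Each chunk needs at least length (4) + type (4) + CRC (4) bytes.
		if len(data)-pos < 12 {
			return 0, errors.New("truncated chunk")
		}
		size := int(binary.BigEndian.Uint32(data[pos:]))
		kind := string(data[pos+4 : pos+8])
		end := pos + 12 + size
		if size < 0 || end < pos || end > len(data) {
			return 0, errors.New("chunk runs past end of input")
		}
		pos = end
		if kind == "IEND" {
			return pos, nil
		}
	}
}

// chunk builds a PNG chunk with a placeholder CRC, for demonstration only.
func chunk(kind string, payload []byte) []byte {
	buf := make([]byte, 8, 12+len(payload))
	binary.BigEndian.PutUint32(buf, uint32(len(payload)))
	copy(buf[4:], kind)
	buf = append(buf, payload...)
	return append(buf, 0, 0, 0, 0)
}

func main() {
	blob := append([]byte{}, pngSignature...)
	blob = append(blob, chunk("IHDR", make([]byte, 13))...)
	blob = append(blob, chunk("IEND", nil)...)
	blob = append(blob, "trailing data"...)

	n, err := pngLength(blob)
	fmt.Println(n, err)
}
```

Walking the chunk lengths, rather than searching for the bytes "IEND", avoids false matches inside compressed image data.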
I feel like I’ve self-promoted Restruct like four times on Hacker News, and I feel kind of bad because it could use improvements and even some bug fixes and I never seem to get around to it. Oh well. It’s still useful for me, I hope it’s useful for others, too.
That said, Kaitai has a fairly clear path towards adding serialization from a design PoV; many things that would be calculated for parsing structures in deserialization could just become checks/assertions in serialization. As an example, checking that an expression calculates out to the expected value would be a reasonable approach. Reversible expressions could be implemented for some cases, too, if you want it to do more of the heavy lifting. I think the biggest obstacle is actually implementing it, and frankly my Scala is too weak to help with such a relatively big undertaking.
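The "computed on parse, asserted on serialize" idea can be sketched in a few lines of Go. This is only an illustration and not Kaitai Struct's design or code: the `Record` format (a 2-byte big-endian length followed by a payload) and the function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Record is a hypothetical format: a 2-byte big-endian length, then payload.
type Record struct {
	Length  uint16
	Payload []byte
}

// parse uses the length field to decide how many payload bytes to read.
func parse(data []byte) (Record, error) {
	if len(data) < 2 {
		return Record{}, errors.New("short header")
	}
	n := binary.BigEndian.Uint16(data)
	if len(data)-2 < int(n) {
		return Record{}, errors.New("short payload")
	}
	return Record{Length: n, Payload: data[2 : 2+int(n)]}, nil
}

// serialize turns the same relationship into an assertion: it refuses to
// write a record whose Length disagrees with its payload.
func serialize(r Record) ([]byte, error) {
	if int(r.Length) != len(r.Payload) {
		return nil, fmt.Errorf("length field is %d, payload has %d bytes",
			r.Length, len(r.Payload))
	}
	out := make([]byte, 2, 2+len(r.Payload))
	binary.BigEndian.PutUint16(out, r.Length)
	return append(out, r.Payload...), nil
}

func main() {
	r, _ := parse([]byte{0, 3, 'a', 'b', 'c'})
	out, err := serialize(r)
	fmt.Println(out, err)

	r.Length = 5 // now inconsistent with the payload
	_, err = serialize(r)
	fmt.Println(err)
}
```

A "reversible expression" version would go one step further and fill in `Length` from `len(Payload)` instead of rejecting the mismatch.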
I’ve also played with the Rust nom library, which implements functional-programming-style parser combinators. It is quite cool how it can express fairly complex grammars and binary formats pretty much equally well, although optimizing it effectively requires serious magic that I do not think nom has. (I assume in Haskell, the same thing can be done with mind-boggling optimization power.)
[1]: https://github.com/go-restruct/restruct
[2]: https://github.com/jchv/pngextract
[3]: https://github.com/jchv/flsplit
-
Kaitai Struct: A new way to develop parsers for binary structures
I’m a big fan of Kaitai Struct, to the point where I’ve even contributed a small bit of improvements to its Go support, and I use it in a handful of small projects. It’s indispensable for spelunking blobs of binary data.
I’ve also taken some inspiration from it for a Go library I wrote, restruct:
https://github.com/go-restruct/restruct
… which is a bit like Go’s JSON encoding/decoding library, but with kaitai-like annotations for binary encoding. (Check the PNG example to see some of what can be done with it.)
-
Plain Text Protocols
Honestly, I dislike plaintext formats a lot. It is more accessible because it’s human readable. But, this only extends to humans who happen to speak the language the protocol uses for keywords. While it’s not a huge ask, I still suggest this is mostly not that interesting of a benefit.
Parsing and emitting plaintext formats, meanwhile, is a rabbit hole. It’s human readable which makes you tempted to make it human writable. Should you accept extraneous whitespace? Tabs vs spaces? Terminating new line? Unix or DOS line endings? Etc.
Binary data may seem less accessible, but I blame the libraries. There’s tons of easy ways to parse text. You can use string.split, atoi and scanf in your language of choice. What is there for binary?
In Go, the encoding/binary package actually implements something really cool: a simple reflection-based mechanism that can read binary data into a structure, and write a structure back out, in a defined and simple way.
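As a rough illustration of that mechanism, the sketch below round-trips a struct through `binary.Write` and `binary.Read` from the standard library. The `Header` type and its fields are made up for the example, as are the `encode` and `decode` helpers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header is a hypothetical record. encoding/binary works on fixed-size
// values, so the fields use sized integers and an array.
type Header struct {
	Magic   [4]byte
	Version uint16
	Flags   uint16
	Size    uint32
}

// encode writes the struct field by field in little-endian order.
func encode(h Header) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode fills a Header from the same byte layout.
func decode(data []byte) (Header, error) {
	var h Header
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &h)
	return h, err
}

func main() {
	in := Header{Magic: [4]byte{'D', 'E', 'M', 'O'}, Version: 1, Flags: 0x8000, Size: 1024}

	raw, err := encode(in)
	if err != nil {
		panic(err)
	}
	out, err := decode(raw)
	if err != nil {
		panic(err)
	}

	fmt.Printf("% x\n", raw)
	fmt.Println(out == in)
}
```

The limitation that motivates the libraries mentioned next is visible here: there is no way to say that one field gives the length of another, so variable-length data has to be handled by hand.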
lunixbochs extended this to struc[1], which adds additional tags for advanced reading and writing of binary structures, including variable length structures. I went further and maybe a bit off into the deep end with Restruct[2], a similar concept but with a lot more features, designed specifically so I could handle advanced structures quickly.
The end result is that I can define some Go structs with integers, strings, byte arrays and corresponding tags, and be able to serialize and deserialize from those structures to their corresponding binary representation. For an overdone demo of what you could do with Restruct, see this (incomplete) PNG demo: https://github.com/go-restruct/restruct/blob/master/formats/... (It is mainly incomplete because I had moved focus to developing a codegen for restruct to improve runtime performance, although that work has since stalled.)
[1]: https://pkg.go.dev/github.com/lunixbochs/struc
[2]: https://pkg.go.dev/github.com/go-restruct/restruct
nom
-
Planespotting with Rust: using nom to parse ADS-B messages
Just in case you are not familiar with nom, it is a parser combinator library written in Rust. The most basic thing you can do with it is import one of its parsing functions, give it some byte or string input, and then get a Result as output, containing either the parsed value and the rest of the input, or an error if the parser failed. tag, for example, is used to recognize literal character/byte sequences.
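The shape described above (input in; parsed value plus remaining input, or an error, out) can be sketched without nom at all. The following is a Go analogy of the concept, not nom's API: `Parser`, `Tag`, and `Pair` are hypothetical names chosen to mirror the idea of a literal matcher and a sequencing combinator.

```go
package main

import (
	"bytes"
	"fmt"
)

// Parser takes input and returns the remaining input, the matched bytes,
// and an error if it did not match.
type Parser func(input []byte) (rest, matched []byte, err error)

// Tag builds a parser that recognizes one literal byte sequence.
func Tag(literal string) Parser {
	return func(input []byte) ([]byte, []byte, error) {
		if !bytes.HasPrefix(input, []byte(literal)) {
			return input, nil, fmt.Errorf("expected %q", literal)
		}
		return input[len(literal):], input[:len(literal)], nil
	}
}

// Pair runs two parsers in sequence, feeding the first parser's leftover
// input to the second. On failure the original input is returned untouched.
func Pair(a, b Parser) Parser {
	return func(input []byte) ([]byte, []byte, error) {
		rest, _, err := a(input)
		if err != nil {
			return input, nil, err
		}
		rest, _, err = b(rest)
		if err != nil {
			return input, nil, err
		}
		return rest, input[:len(input)-len(rest)], nil
	}
}

func main() {
	header := Pair(Tag("HTTP"), Tag("/"))
	rest, matched, err := header([]byte("HTTP/1.1"))
	fmt.Printf("%q %q %v\n", matched, rest, err)
}
```

Because every parser has the same signature, small pieces like `Tag` compose into larger ones like `Pair` without any special machinery, which is the core appeal of the combinator style.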
-
Show HN: Rust nom parsing Starcraft2 Replays into Arrow for Polars data analysis
I may be the only one not familiar, but nom refers to https://github.com/rust-bakery/nom which looks like a pretty handy way to parse binary data in Rust.
-
Is this a good way to free up some memory?
Lots of people use nom for their parsing needs, but that's not the only game in town and there are other options.
-
What is the state of the art for creating domain-specific languages (DSLs) with Rust?
As much as I love nom, as well as other parser combinator libraries, regex-based parsers, BNF/EBNF-based parsers, etc., I always end up going back to plain old text-based char-by-char scanners.
-
What's everyone working on this week (22/2023)?
I am using nom / nom_locate to build the parser side because I've done a handful of other projects with it, and I plan to use tower-lsp to hook up the language server side.
-
Tokenizing
Look into a parsing library such as https://github.com/rust-bakery/nom
-
Something like pydantic but for just strings?
If we were in /r/learnrust I'd have recommended the nom crate for this.
- Nom: Parser Combinators Library in Rust
-
lua bytecode parser written in rust
Thanks to the flexibility of [nom](https://github.com/rust-bakery/nom), it is very easy to write your own parser in Rust. Read [this article](https://github.com/metaworm/luac-parser-rs/wiki/Write-custom-luac-parser) to learn how to write a luac parser.
-
Should I revisit my choice to use nom?
I've been working on an assembler and right now it uses nom. While nom isn't great for error messages, good error messages will be important for this particular assembler (current code), so I've been attempting to use the methods described by Eyal Kalderon in Error recovery with parser combinators (using nom).
What are some alternatives?
Kaitai Struct - Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
pest - The Elegant Parser
binrw - A Rust crate for helping parse and rebuild binary data using ✨macro magic✨.
lalrpop - LR(1) parser generator for Rust
cantordust - Public repository for Cantordust Ghidra plugin.
combine - A parser combinator library for Rust
kaitai-to-wireshark - Converts a Kaitai Struct file description to a Wireshark LUA plugin
pom - PEG parser combinators using operator overloading without macros.
HTTP Parser - http request/response parser for c
rust-peg - Parsing Expression Grammar (PEG) parser generator for Rust
RecordFlux - Formal specification and generation of verifiable binary parsers, message generators and protocol state machines
chumsky - Write expressive, high-performance parsers with ease.