restruct
hachoir
restruct | hachoir | |
---|---|---|
3 | 3 | |
345 | 586 | |
0.0% | - | |
3.2 | 6.4 | |
about 2 years ago | 3 months ago | |
Go | Python | |
ISC License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
restruct
-
Why isn't there a Swagger/OpenAPI for binary formats?
My project Restruct[1] does Kaitai-like things but also supports serialization. Unfortunately, it only supports Go and only deals with Go struct tags rather than YAML manifests. Still, it totally can be used for serialization. I use it to sketch out quick projects against arbitrary binary formats. Two examples: one, parsing PNG headers to implement a quick binwalk-like program for just PNG that looks for the IEND chunk to extract accurately[2], two, a program that splits FL Studio FLP projects by playlist track[3].
I feel like I’ve self-promoted Restruct like four times on Hacker News, and I feel kind of bad because it could use improvements and even some bug fixes and I never seem to get around to it. Oh well. It’s still useful for me, I hope it’s useful for others, too.
That said, Kaitai has a fairly clear path towards adding serialization from a design PoV; many things that would be calculated for parsing structures in deserialization could just become checks/assertions in serialization. As an example, checking that an expression calculates out to the expected value would be a reasonable approach. Reversible expressions could be implemented for some cases, too, if you want it to do more of the heavy lifting. I think the biggest obstacle is actually implementing it, and frankly my Scala is too weak to help with such a relatively big undertaking.
I’ve also played with the rust nom library, which implements functional programming style parser combinators. It is quite cool how it can express fairly complex grammars and binary formats pretty much equally well, albeit optimizing it effectively requires serious magic that I do not think nom has. (I assume in Haskell, the same thing can be done with mind-boggling optimization power.)
[1]: https://github.com/go-restruct/restruct
[2]: https://github.com/jchv/pngextract
[3]: https://github.com/jchv/flsplit
-
Kaitai Struct: A new way to develop parsers for binary structures
I’m a big fan of Kaitai Struct, to the point where I’ve even contributed a small bit of improvements to its Go support, and I use it in a handful of small projects. It’s indispensable for spelunking blobs of binary data.
I’ve also taken some inspiration with a Go library I wrote, restruct:
https://github.com/go-restruct/restruct
… which is a bit like Go’s JSON encoding/decoding library, but with kaitai-like annotations for binary encoding. (Check the PNG example to see some of what can be done with it.)
-
Plain Text Protocols
Honestly, I dislike plaintext formats a lot. It is more accessible because it’s human readable. But, this only extends to humans who happen to speak the language the protocol uses for keywords. While it’s not a huge ask, I still suggest this is mostly not that interesting of a benefit.
Parsing and emitting plaintext formats, meanwhile, is a rabbit hole. It’s human readable which makes you tempted to make it human writable. Should you accept extraneous whitespace? Tabs vs spaces? Terminating new line? Unix or DOS line endings? Etc.
Binary data may seem less accessible, but I blame the libraries. There’s tons of easy ways to parse text. You can use string.split, atoi and scanf in your language of choice. What is there for binary?
In Go, the encoding/binary package actually implements something really cool. A simple reflection-based mechanism that can read and write binary data into a structure in a defined and simple way.
lunixbochs extended this to struc[1], which adds additional tags for advanced reading and writing of binary structures, including variable length structures. I went further and maybe a bit off into the deep end with Restruct[2], a similar concept but with a lot more features, designed specifically so I could handle advanced structures quickly.
The end result is that I can define some Go structs with integers, strings, byte arrays and corresponding tags, and be able to serialize and deserialize from those structures to their corresponding binary representation. For an overdone demo of what you could do with Restruct for example, see this (incomplete) PNG demo: https://github.com/go-restruct/restruct/blob/master/formats/... (It is mainly incomplete because I had moved focus to develop a codegen for restruct, to improve runtime performance, although such work has since stalled.)
[1]: https://pkg.go.dev/github.com/lunixbochs/struc
[2]: https://pkg.go.dev/github.com/go-restruct/restruct
hachoir
-
Magika: AI powered fast and efficient file type identification
https://github.com/vstinner/hachoir/blob/main/hachoir/subfil...
File signature:
-
Kaitai Struct: A new way to develop parsers for binary structures
I contributed a number of file formats a few years ago (and attempted numerous others) but ran into a number of problems with certain file formats:
1. It's not possible to read from the file until a multiple byte termination sequence is detected. [1]
2. You can't read sections of a file where the termination condition is the presence of a sequence of bytes denoting the next unrelated section of the file (and you don't want to consume/read these bytes) [2]
3. The WebIDE at the time couldn't handle very large file format specifications such as Photoshop (PSD) [3]
4. Files containing compressed or encrypted sections require a compression/encryption algorithm to be hardcoded into Kaitai struct libraries for each programming language it can output to.
The WebIDE I particularly liked as it makes it easy to get started and share results. I also liked how Kaitai Struct allows easy definition of constraints (simple ones at least) into the file format specification so that you can say "this section of the file shall have a size not exceeding header.length * 2 bytes".
Some alternative binary file format specification attempts for those interested in seeing alternatives, each with their own set of problems/pros/cons:
1. 010 Editor [4]
2. Synalysis [5]
3. hachoir [6]
4. DFDL [7]
[1] https://github.com/kaitai-io/kaitai_struct/issues/158
[2] https://github.com/kaitai-io/kaitai_struct/issues/156
[3] https://raw.githubusercontent.com/davidhicks/kaitai_struct_f...
[4] https://www.sweetscape.com/010editor/repository/templates/
[5] https://github.com/synalysis/Grammars
[6] https://github.com/vstinner/hachoir/tree/main/hachoir/parser
[7] https://github.com/DFDLSchemas/
-
PyWhat: Identify Anything
Another one sort of related is hachoir, and specifically the hachoir-metadata script: https://github.com/vstinner/hachoir
What are some alternatives?
Kaitai Struct - Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
binrw - A Rust crate for helping parse and rebuild binary data using ✨macro magic✨.
usaddress - :us: a python library for parsing unstructured United States address strings into address components
cantordust - Public repository for Cantordust Ghidra plugin.
fuckitjs - The Original Javascript Error Steamroller
kaitai-to-wireshark - Converts a Kaitai Struct file description to a Wireshark LUA plugin
pyWhat - 🐸 Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it'll tell you what it is! 🧙♀️
HTTP Parser - http request/response parser for c
probablepeople - :family: a python library for parsing unstructured western names into name components.
RecordFlux - Formal specification and generation of verifiable binary parsers, message generators and protocol state machines
smm2-documentation - Documentation for the game Super Mario Maker 2.