CC
Kaitai Struct
Metric | CC | Kaitai Struct
---|---|---
Mentions | 21 | 44
Stars | 101 | 3,839
Growth | - | 1.1%
Activity | 4.3 | 7.5
Latest commit | 21 days ago | 20 days ago
Language | C | Shell
License | MIT License | GPL-3.0-or-later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
CC
-
preprocessor stuff - compile time pasting into other files
With extendible macros, you could achieve the following:
-
Factor is faster than Zig
In my example the table stores the hash codes themselves instead of the keys (because the hash function is invertible)
Oh, I see, right. If determining the home bucket is trivial, then the back-shifting method is great. The issue is just that it’s not as much of a general-purpose solution as it may initially seem.
“With a different algorithm (Robin Hood or bidirectional linear probing), the load factor can be kept well over 90% with good performance, as the benchmarks in the same repo demonstrate.”
I’ve seen the 90% claim made several times in literature on Robin Hood hash tables. In my experience, the claim is a bit exaggerated, although I suppose it depends on what our idea of “good performance” is. See these benchmarks, which again go up to a maximum load factor of 0.95 (although Boost and Absl forcibly grow/rehash at 0.85-0.9):
https://strong-starlight-4ea0ed.netlify.app/
Tsl, Martinus, and CC are all Robin Hood tables (https://github.com/Tessil/robin-map, https://github.com/martinus/robin-hood-hashing, and https://github.com/JacksonAllan/CC, respectively). Absl and Boost are the well-known SIMD-based hash tables. Khash (https://github.com/attractivechaos/klib/blob/master/khash.h) is, I think, an ordinary open-addressing table using quadratic probing. Fastmap is a new, yet-to-be-published design that is fundamentally similar to bytell (https://www.youtube.com/watch?v=M2fKMP47slQ) but also incorporates some aspects of the aforementioned SIMD maps (it caches a 4-bit fragment of the hash code to avoid most key comparisons).
As you can see, all the Robin Hood maps spike upwards dramatically as the load factor gets high, becoming as much as 5-6 times slower at 0.95 vs 0.5 in one of the benchmarks (uint64_t key, 256-bit struct value: Total time to erase 1000 existing elements with N elements in map). Only the SIMD maps (with Boost being the better performer) and Fastmap appear mostly immune to load factor in all benchmarks, although the SIMD maps do - I believe - use tombstones for deletion.
I’ve only read briefly about bi-directional linear probing – never experimented with it.
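To make the back-shifting discussion above concrete, here is a minimal sketch of a Robin Hood set with backward-shift deletion. It is not taken from CC or any of the libraries named above: the fixed capacity, the `rh_` names, the multiplicative hash, and the choice to reserve key 0 as the empty marker are all assumptions made for illustration. Erasure relies on being able to recompute each resident's home bucket cheaply, which is the condition the comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_BITS 4u
#define CAP (1u << CAP_BITS)   /* hypothetical fixed capacity, power of two */
#define MASK (CAP - 1u)
#define EMPTY 0u               /* assumption: key 0 is reserved as "empty" */

typedef struct { uint32_t slot[CAP]; uint32_t count; } rh_set;

/* Home bucket: the top CAP_BITS bits of a multiplicative hash. */
static uint32_t rh_home(uint32_t key) {
    return (uint32_t)(key * 2654435761u) >> (32u - CAP_BITS);
}

/* How far the key stored at index i sits from its home bucket. */
static uint32_t rh_dist(uint32_t i, uint32_t key) {
    return (i - rh_home(key)) & MASK;
}

/* Returns the index of key, or -1 if it is absent. */
static int rh_find(const rh_set *s, uint32_t key) {
    if (key == EMPTY) return -1;
    uint32_t i = rh_home(key);
    for (uint32_t dist = 0; ; dist++, i = (i + 1u) & MASK) {
        uint32_t cur = s->slot[i];
        if (cur == key) return (int)i;
        /* An empty slot, or a resident closer to its home than we are
           to ours, means the key cannot be further along. */
        if (cur == EMPTY || rh_dist(i, cur) < dist) return -1;
    }
}

/* Inserts key; returns false for key 0, duplicates, or a full table. */
static bool rh_insert(rh_set *s, uint32_t key) {
    if (key == EMPTY || s->count >= CAP - 1u || rh_find(s, key) >= 0)
        return false;              /* one slot always stays empty */
    uint32_t i = rh_home(key);
    for (uint32_t dist = 0; ; dist++, i = (i + 1u) & MASK) {
        uint32_t cur = s->slot[i];
        if (cur == EMPTY) { s->slot[i] = key; s->count++; return true; }
        uint32_t cur_dist = rh_dist(i, cur);
        if (cur_dist < dist) {     /* resident is "richer": take its slot */
            s->slot[i] = key;      /* and carry the resident onwards */
            key = cur;
            dist = cur_dist;
        }
    }
}

/* Erases key by shifting the following run back one slot (no tombstones). */
static bool rh_erase(rh_set *s, uint32_t key) {
    int found = rh_find(s, key);
    if (found < 0) return false;
    uint32_t i = (uint32_t)found;
    for (;;) {
        uint32_t next = (i + 1u) & MASK;
        uint32_t moved = s->slot[next];
        /* Stop at an empty slot or at a key already in its home bucket. */
        if (moved == EMPTY || rh_dist(next, moved) == 0u) break;
        s->slot[i] = moved;
        i = next;
    }
    s->slot[i] = EMPTY;
    assert(s->count > 0u);
    s->count--;
    return true;
}
```

Note that every step of the shift loop calls `rh_home` on a stored key. With an expensive hash function or large keys, that recomputation is the cost that makes back-shifting less general-purpose unless the hash code (or the probe distance) is cached alongside each entry.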
-
If this isn't the perfect data structure, why?
From your other comments, it seems like your knowledge of hash tables might be limited to closed-addressing/separate-chaining hash tables. The current frontrunners in high-performance, memory-efficient hash table design all use some form of open addressing, largely to avoid pointer chasing and limit cache misses. In this regard, you want to check out SSE-powered hash tables (such as Abseil, Boost, and Folly/F14), Robin Hood hash tables (such as Martinus and Tessil), or Skarupke (I've recently had a lot of success with a similar design that I will publish here soon and is destined to replace my own Robin Hood hash tables). Also check out existing research/benchmarks here and here. But be a little bit wary of any benchmarks you look at or perform because there are a lot of factors that influence the result (e.g. benchmarking hash tables at a maximum load factor of 0.5 will produce wildly different results to benchmarking them at a load factor of 0.95, just as benchmarking them with integer key-value pairs will produce different results to benchmarking them with 256-byte key-value pairs). And you need to familiarize yourself with open addressing and different probing strategies (e.g. linear, quadratic) first.
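As a small illustration of the two probing strategies mentioned above, the sketch below computes the k-th slot visited from a home bucket. The function names are hypothetical, and the quadratic variant shown is the triangular-number sequence, one common choice that visits every slot when the capacity is a power of two. It is not claimed to be what any particular library uses.

```c
#include <assert.h>
#include <stddef.h>

/* k-th slot visited by linear probing: home, home+1, home+2, ... */
static size_t probe_linear(size_t home, size_t k, size_t cap) {
    assert(cap != 0 && (cap & (cap - 1)) == 0);  /* power of two only */
    return (home + k) & (cap - 1);
}

/* k-th slot visited by triangular-number quadratic probing:
   home, home+1, home+3, home+6, ... (offsets 0, 1, 3, 6, 10, ...). */
static size_t probe_quadratic(size_t home, size_t k, size_t cap) {
    assert(cap != 0 && (cap & (cap - 1)) == 0);  /* power of two only */
    return (home + k * (k + 1) / 2) & (cap - 1);
}
```

Linear probing keeps consecutive probes in adjacent memory, which is friendly to the cache but prone to clustering. The growing step of the quadratic sequence spreads colliding keys out at the cost of locality.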
- Convenient Containers: A usability-oriented generic container library
-
[Noob Question] How do C programmers get around not having hash maps?
CC (Full disclosure: I authored this one)
-
New C features in GCC 13
If you're using C23 or have typeof (so GCC or Clang), then yet another approach is to define a type that aliases the specified type if it is unique or otherwise becomes a "dummy" type. Here's what that looks like in CC:
-
Convenient Containers v1.0.3: Better compile speed, faster maps and sets
I’d like to share version 1.0.3 of Convenient Containers (CC), my generic container library. The library was previously discussed here and here. As explained elsewhere,
-
Popular Data Structure Libraries in C ?
Convenient Containers (CC) - I'm the author of this one.
-
So what's the best data structures and algorithms library for C?
"Using macros" is a broad description that covers multiple paradigms. There are libraries that use macros in combination with typed pointers and functions that take void* parameters to provide some degree of API genericity and type safety at the same time (e.g. stb_ds and, as you mentioned, my own CC). There are libraries that use macros (or #include directives) to manually instantiate templates (e.g. STC, M*LIB, and Pottery). And then there are libraries that are implemented entirely or almost entirely as macros (e.g. uthash).
-
How do you deal with the extra verbosity of C?
Shameless plug: Take a look at my library Convenient Containers, which solves this exact problem within the (narrow) domain of data structures.
Kaitai Struct
- Reverse-engineering an encrypted IoT protocol
-
Parsing an Undocumented File Format
- ImHex [2], which has a pattern language [3] which allows parsing, and it seems more powerful than what Kaitai offers. I stumbled upon some limitations with it but it was still useful.
[1]: https://kaitai.io/
- Kaitai Struct – a declarative language used to describe binary data structures
-
HTTPie Desktop: cross-platform API testing client for humans
Beautiful. Didn't know something like this exists. Reminds me of Kaitai [0]
[0]. https://kaitai.io/
-
Hacking the LG Monitor's EDID
An EDID override like this would be helpful for macOS as well, where the monitors swapping around after standby is a real annoyance [0] [1]
EDID rewrites are 99% of the time blocked by the monitor firmware: https://notes.alinpanaitiu.com/Decoding-monitor-EDID-on-macO...
By the way, one helpful tool that helped me navigate the EDID dump was Kaitai Struct [2]. It shows a side by side view with the hex view and the EDID structure, and it highlights the hex values in real time as you navigate the structure. Unfortunately [3] it doesn't support the extension blocks that the author needs.
[0] https://notes.alinpanaitiu.com/Weird-monitor-bugs
[1] https://forums.macrumors.com/threads/external-displays-swapp...
[2] https://kaitai.io/
[3] https://github.com/kaitai-io/edid.ksy
- Kaitai Struct: new way to develop parsers for binary structures
-
Fq: Jq for Binary Formats
Kaitai Struct might be a good choice for that: https://kaitai.io/
-
Ingesting, parsing and making sense of device log data
For binary log formats, there's the excellent Kaitai Struct framework, which makes it very easy to generate parsers from a declarative schema.
-
What is this tool? More info in comments
kaitai
-
Visual Programming with Elixir: Learning to Write Binary Parsers (2019)
https://kaitai.io/
Worth a look if you are writing binary parsers.
What are some alternatives?
rust-bindgen - Automatically generates Rust FFI bindings to C (and some C++) libraries.
Protobuf - Protocol Buffers - Google's data interchange format
mlib - Library of generic and type safe containers in pure C language (C99 or C11) for a wide collection of container (comparable to the C++ STL).
csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
stent - Completely avoid dangling pointers in C.
Camelot - A Python library to extract tabular data from PDFs
SDS - Simple Dynamic Strings library for C
tablib - Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
Generic-Data-Structures - A set of Data Structures for the C programming language
PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
stb - stb single-file public domain libraries for C/C++
PyYAML