Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
Redis
Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
You skipped hs_remove but depending on implementation, hs_remove strategy may change your hs_set, e.g using tombstones. I’ve recently learnt that you don’t need tombstones at all in linear hashmaps, you just do backshift removals, it may be a simple addition to your map. Check out these repos for the implementation :
> When the hash table gets too full, we need to allocate a larger array and move the items over. This is absolutely required when the number of items in the hash table has reached the size of the array, but usually you want to do it when the table is half or three-quarters full.
Yes, but how you resize is important too: if you have a threshold size (like 3/4 full) at which you block and re-distribute all elements into a new array, you will incur a significant pause when this happens, e.g. https://log.kv.io/post/2009/05/15/leaky-hashtables
Instead, when you reach the threshold amount you can create the new array and then gradually migrate the keys in small batches either with each operation or with a background thread. So on `get` if we're in the process of resizing: first check the new table, then the old one, and before returning migrate N keys from the old table to the new one. Free the old array once all the keys are migrated.
I wrote a small hash table implementation with gradual re-hashing a while back, search for DICT_MAX_LOAD and dict_rehash here: https://github.com/nicolasff/ht/blob/master/dict.c#L193
The progressive rehashing described in the article is very similar to what Redis does [1].
Just thought I'd share a example of a use case where the incremental rehashing logic makes sense.
[1]: https://github.com/redis/redis/blob/unstable/src/dict.c