How to implement a hash table (in C)

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • sc

    Common libraries and data structures for C.

  • You skipped hs_remove but depending on implementation, hs_remove strategy may change your hs_set, e.g using tombstones. I’ve recently learnt that you don’t need tombstones at all in linear hashmaps, you just do backshift removals, it may be a simple addition to your map. Check out these repos for the implementation :

    (C) https://github.com/tezc/sc/tree/master/map

    (C++) https://github.com/rigtorp/HashMap

  • CPython

    The Python programming language

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • ht

    A simple hash table with non-blocking resize (by nicolasff)

  • > When the hash table gets too full, we need to allocate a larger array and move the items over. This is absolutely required when the number of items in the hash table has reached the size of the array, but usually you want to do it when the table is half or three-quarters full.

    Yes, but how you resize is important too: if you have a threshold size (like 3/4 full) at which you block and re-distribute all elements into a new array, you will incur a significant pause when this happens, e.g. https://log.kv.io/post/2009/05/15/leaky-hashtables

    Instead, when you reach the threshold amount you can create the new array and then gradually migrate the keys in small batches either with each operation or with a background thread. So on `get` if we're in the process of resizing: first check the new table, then the old one, and before returning migrate N keys from the old table to the new one. Free the old array once all the keys are migrated.

    I wrote a small hash table implementation with gradual re-hashing a while back, search for DICT_MAX_LOAD and dict_rehash here: https://github.com/nicolasff/ht/blob/master/dict.c#L193

  • Redis

    Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.

  • The progressive rehashing described in the article is very similar to what Redis does [1].

    Just thought I'd share a example of a use case where the incremental rehashing logic makes sense.

    [1]: https://github.com/redis/redis/blob/unstable/src/dict.c

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts