map_benchmark
Hopscotch map
Our great sponsors
map_benchmark | Hopscotch map | |
---|---|---|
5 | 3 | |
287 | 698 | |
- | - | |
5.5 | 3.7 | |
about 1 year ago | 7 months ago | |
C++ | C++ | |
MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
map_benchmark
-
Optimizing Open Addressing
I tied adding the maps to the [1] benchmark, but I wasn't able to, since they aren't type generic yet. You may want to benchmark against [2], and [3] which are the best performing ones in the above benchmark.
[1] https://github.com/martinus/map_benchmark/
[2] https://github.com/TheNumbat/hashtables/blob/main/code/robin...
[3] https://github.com/ktprime/ktprime
-
Benchmarking my data structure
In any case, I guess you can find some inspiration in this comparison of maps which was posted to /r/cpp a couple of months ago: https://martin.ankerl.com/2022/08/27/hashmap-bench-01/ (source code for the benchmark seems to be on https://github.com/martinus/map_benchmark ). It's made for maps but adjusting most benchmarks for other containers should be fairly straightforward.
-
Is there a committee paper on a "simplified" random interface?
There is no such thing as best random generator. They have so many different properties, e.g. is it cryptographic secure, how large is the state, how fast on x86 architecture, can it jump forward, etc. My go-to generator is sfc64 because it's fast, simple, and seems to produce high quality random numbers. Here is one implementation: https://github.com/martinus/map_benchmark/blob/master/src/app/sfc64.h other popular generators are PCG and xorshift
-
boost::unordered map is a new king of data structures
I've implemented this PoolAllocator which does exactly this: https://github.com/martinus/map_benchmark/blob/master/src/app/pool.h
-
Development Plan for Boost.Unordered
Hi Martin, thanks for the pointer! BTW, I think we may use your impressive benchmark suite to test our advances once we come up with something worth deep testing.
Hopscotch map
-
boost::unordered map is a new king of data structures
Unordered hash map shootout CMAP = https://github.com/tylov/STC KMAP = https://github.com/attractivechaos/klib PMAP = https://github.com/greg7mdp/parallel-hashmap FMAP = https://github.com/skarupke/flat_hash_map RMAP = https://github.com/martinus/robin-hood-hashing HMAP = https://github.com/Tessil/hopscotch-map TMAP = https://github.com/Tessil/robin-map UMAP = std::unordered_map Usage: shootout [n-million=40 key-bits=25] Random keys are in range [0, 2^25). Seed = 1656617916: T1: Insert/update random keys: KMAP: time: 1.949, size: 15064129, buckets: 33554432, sum: 165525449561381 CMAP: time: 1.649, size: 15064129, buckets: 22145833, sum: 165525449561381 PMAP: time: 2.434, size: 15064129, buckets: 33554431, sum: 165525449561381 FMAP: time: 2.112, size: 15064129, buckets: 33554432, sum: 165525449561381 RMAP: time: 1.708, size: 15064129, buckets: 33554431, sum: 165525449561381 HMAP: time: 2.054, size: 15064129, buckets: 33554432, sum: 165525449561381 TMAP: time: 1.645, size: 15064129, buckets: 33554432, sum: 165525449561381 UMAP: time: 6.313, size: 15064129, buckets: 31160981, sum: 165525449561381 T2: Insert sequential keys, then remove them in same order: KMAP: time: 1.173, size: 0, buckets: 33554432, erased 20000000 CMAP: time: 1.651, size: 0, buckets: 33218751, erased 20000000 PMAP: time: 3.840, size: 0, buckets: 33554431, erased 20000000 FMAP: time: 1.722, size: 0, buckets: 33554432, erased 20000000 RMAP: time: 2.359, size: 0, buckets: 33554431, erased 20000000 HMAP: time: 0.849, size: 0, buckets: 33554432, erased 20000000 TMAP: time: 0.660, size: 0, buckets: 33554432, erased 20000000 UMAP: time: 2.138, size: 0, buckets: 31160981, erased 20000000 T3: Remove random keys: KMAP: time: 1.973, size: 0, buckets: 33554432, erased 23367671 CMAP: time: 2.020, size: 0, buckets: 33218751, erased 23367671 PMAP: time: 2.940, size: 0, buckets: 33554431, erased 23367671 FMAP: time: 1.147, size: 0, buckets: 33554432, erased 23367671 RMAP: time: 1.941, size: 0, buckets: 33554431, erased 23367671 HMAP: time: 1.135, size: 0, buckets: 33554432, erased 23367671 TMAP: time: 1.064, size: 0, buckets: 33554432, erased 23367671 UMAP: time: 5.632, size: 0, buckets: 31160981, erased 23367671 T4: Iterate random keys: KMAP: time: 0.748, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 CMAP: time: 0.627, size: 23367671, buckets: 33218751, repeats: 8, sum: 4465059465719680 PMAP: time: 0.680, size: 23367671, buckets: 33554431, repeats: 8, sum: 4465059465719680 FMAP: time: 0.735, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 RMAP: time: 0.464, size: 23367671, buckets: 33554431, repeats: 8, sum: 4465059465719680 HMAP: time: 0.719, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 TMAP: time: 0.662, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 UMAP: time: 6.168, size: 23367671, buckets: 31160981, repeats: 8, sum: 4465059465719680 T5: Lookup random keys: KMAP: time: 0.943, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 CMAP: time: 0.863, size: 23367671, buckets: 33218751, lookups: 34235332, found: 29040438 PMAP: time: 1.635, size: 23367671, buckets: 33554431, lookups: 34235332, found: 29040438 FMAP: time: 0.969, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 RMAP: time: 1.705, size: 23367671, buckets: 33554431, lookups: 34235332, found: 29040438 HMAP: time: 0.712, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 TMAP: time: 0.584, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 UMAP: time: 1.974, size: 23367671, buckets: 31160981, lookups: 34235332, found: 29040438
-
Yes, this is embarrassingly slow .so I solved your problem
the map member used for the lookups is a tsl::hopscotch_map (https://github.com/Tessil/hopscotch-map), which is a proper hash map. so it seems to be the latter, that the API is wrong, but from what I can tell it is only a wrongly named class. i don't see where the API makes guarantees about iteration order, which is where the implementation difference would be noticeable (beyond performance for lookup).
-
Any suggestions for resources to optimize for memory allocation/reallocation?
using an open-addressing hash table, such as abseil flat_hash_map or tessil/hopscotch-map
What are some alternatives?
robin-map - C++ implementation of a fast hash map and hash set using robin hood hashing
C++ B-tree - Git mirror of the official (mercurial) repository of cpp-btree
unordered_dense - A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
PEGTL - Parsing Expression Grammar Template Library
unordered - Boost.org unordered module
sparsehash-c11 - Experimental C++11 version of sparsehash
parallel-hashmap - A family of header-only, very fast and memory-friendly hashmap and btree containers.
sparsehash - C++ associative containers
robin-hood-hashing - Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
Optional Argument in C++ - Named Optional Arguments in C++17
STC - A modern, user friendly, generic, type-safe and fast C99 container library: String, Vector, Sorted and Unordered Map and Set, Deque, Forward List, Smart Pointers, Bitset and Random numbers.
Hashmaps - Various open addressing hashmap algorithms in C++