| | robin-hood-hashing | GLM |
|---|---|---|
| Mentions | 23 | 36 |
| Stars | 1,465 | 8,689 |
| Growth | - | 1.3% |
| Activity | 0.0 | 8.9 |
| Latest commit | about 1 year ago | 13 days ago |
| Language | C++ | C++ |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
robin-hood-hashing
-
Factor is faster than Zig
In my example the table stores the hash codes themselves instead of the keys (because the hash function is invertible)
Oh, I see, right. If determining the home bucket is trivial, then the back-shifting method is great. The issue is just that it’s not as much of a general-purpose solution as it may initially seem.
“With a different algorithm (Robin Hood or bidirectional linear probing), the load factor can be kept well over 90% with good performance, as the benchmarks in the same repo demonstrate.”
I’ve seen the 90% claim made several times in the literature on Robin Hood hash tables. In my experience, the claim is a bit exaggerated, although I suppose it depends on what our idea of “good performance” is. See these benchmarks, which again go up to a maximum load factor of 0.95 (although Boost and Absl forcibly grow/rehash at 0.85-0.9):
https://strong-starlight-4ea0ed.netlify.app/
Tsl, Martinus, and CC are all Robin Hood tables (https://github.com/Tessil/robin-map, https://github.com/martinus/robin-hood-hashing, and https://github.com/JacksonAllan/CC, respectively). Absl and Boost are the well-known SIMD-based hash tables. Khash (https://github.com/attractivechaos/klib/blob/master/khash.h) is, I think, an ordinary open-addressing table using quadratic probing. Fastmap is a new, yet-to-be-published design that is fundamentally similar to bytell (https://www.youtube.com/watch?v=M2fKMP47slQ) but also incorporates some aspects of the aforementioned SIMD maps (it caches a 4-bit fragment of the hash code to avoid most key comparisons).
As you can see, all the Robin Hood maps spike upwards dramatically as the load factor gets high, becoming as much as 5-6 times slower at 0.95 than at 0.5 in one of the benchmarks (uint64_t key, 256-bit struct value: total time to erase 1000 existing elements with N elements in the map). Only the SIMD maps (with Boost being the better performer) and Fastmap appear mostly immune to load factor in all benchmarks, although the SIMD maps do, I believe, use tombstones for deletion.
I’ve only read briefly about bi-directional linear probing – never experimented with it.
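The backward-shift deletion discussed above can be sketched for the case the earlier comment describes, where the table stores the hash codes themselves and the home bucket is therefore trivially recoverable. This is a minimal illustration, not any particular library's implementation; the capacity, the `EMPTY` sentinel, and the function names are all arbitrary choices for the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAP  16          /* power of two */
#define MASK (CAP - 1)
#define EMPTY 0u         /* 0 is reserved as the "empty slot" marker */

static uint32_t table[CAP];

static size_t home(uint32_t h) { return h & MASK; }

/* linear-probing insert (sketch assumes the table never fills up) */
static void insert(uint32_t h) {
    size_t i = home(h);
    while (table[i] != EMPTY) i = (i + 1) & MASK;
    table[i] = h;
}

static int find(uint32_t h) {
    size_t i = home(h);
    while (table[i] != EMPTY) {
        if (table[i] == h) return 1;
        i = (i + 1) & MASK;
    }
    return 0;
}

/* backward-shift deletion: instead of leaving a tombstone, pull each
   following entry back into the hole, but only if doing so does not
   move it before its own home bucket (cyclically). */
static void erase(uint32_t h) {
    size_t i = home(h);
    while (table[i] != h) {
        if (table[i] == EMPTY) return;   /* not present */
        i = (i + 1) & MASK;
    }
    size_t hole = i;
    for (size_t j = (hole + 1) & MASK; table[j] != EMPTY; j = (j + 1) & MASK) {
        /* the entry at j may fill the hole iff the hole lies on its
           probe path, i.e. cyclically between its home bucket and j */
        size_t hb = home(table[j]);
        if (((j - hb) & MASK) >= ((j - hole) & MASK)) {
            table[hole] = table[j];
            hole = j;
        }
    }
    table[hole] = EMPTY;
}
```

The key point of the earlier exchange is visible in `erase`: it needs `home()` for every entry it scans, which is free here because the stored value *is* the hash code, but costs a rehash of the key in the general case.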
-
If this isn't the perfect data structure, why?
From your other comments, it seems like your knowledge of hash tables might be limited to closed-addressing/separate-chaining hash tables. The current frontrunners in high-performance, memory-efficient hash table design all use some form of open addressing, largely to avoid pointer chasing and limit cache misses. In this regard, you'll want to check out SSE-powered hash tables (such as Abseil, Boost, and Folly/F14), Robin Hood hash tables (such as Martinus and Tessil), or Skarupke (I've recently had a lot of success with a similar design that I will publish here soon and that is destined to replace my own Robin Hood hash tables). Also check out existing research/benchmarks here and here. But be a little wary of any benchmarks you look at or perform, because there are a lot of factors that influence the results (e.g. benchmarking hash tables at a maximum load factor of 0.5 will produce wildly different results to benchmarking them at a load factor of 0.95, just as benchmarking them with integer key-value pairs will produce different results to benchmarking them with 256-byte key-value pairs). And you need to familiarize yourself with open addressing and different probing strategies (e.g. linear, quadratic) first.
-
boost::unordered standalone
Also, FYI there is robin_hood::unordered_{map,set} which has very high performance, and is header-only and standalone.
-
Solving “Two Sum” in C with a tiny hash table
std::unordered_map is notoriously slow, several times slower than a "proper" hashmap implementation like Google's absl or Martin's robin-hood-hashing [1]. That said, std::sort is not the fastest sort implementation, either. It is hard to say which will win.
[1]: https://github.com/martinus/robin-hood-hashing
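A "tiny hash table" two-sum in the spirit of the thread title might look like the sketch below. The capacity, the multiplicative hash constant, and all names are arbitrary choices for illustration, not taken from the article:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TS_CAP 64   /* power of two, comfortably larger than 2 * n */

typedef struct { int key; int idx; int used; } slot_t;

static size_t ts_home(int key) {
    /* Knuth-style multiplicative hash; cast makes negatives well-defined */
    return ((uint32_t)key * 2654435761u) & (TS_CAP - 1);
}

/* value -> index lookup with linear probing; returns -1 if absent */
static int ht_get(const slot_t *t, int key) {
    size_t i = ts_home(key);
    while (t[i].used) {
        if (t[i].key == key) return t[i].idx;
        i = (i + 1) & (TS_CAP - 1);
    }
    return -1;
}

static void ht_put(slot_t *t, int key, int idx) {
    size_t i = ts_home(key);
    while (t[i].used && t[i].key != key) i = (i + 1) & (TS_CAP - 1);
    t[i].key = key; t[i].idx = idx; t[i].used = 1;
}

/* classic one-pass two-sum: look up the complement before inserting,
   so the same element is never paired with itself. Returns 1 and
   fills out[2] with the two indices, or 0 if no pair sums to target. */
static int two_sum(const int *nums, int n, int target, int out[2]) {
    slot_t t[TS_CAP];
    memset(t, 0, sizeof t);
    for (int i = 0; i < n; i++) {
        int j = ht_get(t, target - nums[i]);
        if (j >= 0) { out[0] = j; out[1] = i; return 1; }
        ht_put(t, nums[i], i);
    }
    return 0;
}
```

This is O(n) expected time versus the O(n log n) of the sort-based approach the comment weighs it against; as the comment notes, which one actually wins depends on constants, not asymptotics.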
-
Convenient Containers v1.0.3: Better compile speed, faster maps and sets
The main advantage of the latest version is that it reduces build time by about 53% (GCC 12.1), based on the comprehensive test suite found in unit_tests.c. This improvement is significant because compile time was previously a drawback of this library, with maps and sets in particular compiling slower than their C++ template-based counterparts. I achieved it by refactoring the library to do less work inside API macros and, in particular, to use fewer _Generic expressions, which seem to be a compile-speed bottleneck. A nice side effect of the refactor is that the library can now more easily be extended with the planned dynamic strings and ordered maps and sets. The other major improvement concerns the performance of maps and sets. Here are some interactive benchmarks[1] comparing CC’s maps to two popular implementations of Robin Hood hash maps in C++ (as well as std::unordered_map as a baseline). They show that CC maps perform roughly on par with those implementations.
-
Effortless Performance Improvements in C++: std::unordered_map
For anyone in a situation where a set/map (or unordered versions) is in a hot part of the code, I'd also highly recommend Robin Hood: https://github.com/martinus/robin-hood-hashing
It made a huge difference in one of the programs I was running.
- Inside boost::unordered_flat_map
-
What are some cool modern libraries you enjoy using?
Oh, my bad. Still, though: your name looks very familiar to me. Are you the robin_hood hashing guy, perhaps? Yes you are! My bad: https://github.com/martinus/robin-hood-hashing.
-
Performance comparison: counting words in Python, C/C++, Awk, Rust, and more
Got a bit better C++ version here which uses a couple libraries instead of std:: stuff - https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5a... ; boost, fmt and https://github.com/martinus/robin-hood-hashing
$ g++ -I robin-hood-hashing/src/include -O2 -flto -std=c++20 -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -lfmt
-
A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
The implementation is mostly inspired by this comment and lessons learned from my older robin-hood-hashing hashmap.
GLM
- Release of GLM 1.0.0
- C++23: The Next C++ Standard
-
What files from glm's github do I need to add to my emscripten project?
I am a greenhorn at graphics programming. I just made an app in OpenGL with C++ that I now need to turn into a browser app with WebGL. WebGL looks pretty cool, but since my app does a lot of calculations, I assumed I should keep the heavier calculating parts in C++ with Emscripten (which I am also just learning). Looking at it, GLM seems to be the only library I seriously need for my C++ code, which is convenient because it says it is header-only. But in the GitHub repo there are a lot of folders and files, so I am not sure which are indispensable and which are not. Any advice?
-
What is a file with the .i.hh extension such as myfile.i.hh used for in a C++ project?
GLM does it quite well: it has the core includes, then a detail folder with all the .inl files that get pulled in. https://github.com/g-truc/glm
- [OpenGL] Help: compiling and installing GLFW
-
Porting to metal?
I once ported an OpenGL code base over to Metal. For me, it was essential to share as much code as possible. Because I was using the GLM library in that code base and generally found it very useful, I wanted to know whether I could use GLM with Metal. I had to do some research, but it turned out it works really well; see here.
- Which is the best way to work with matrices and linear algebra using c++?
-
Best C++ Game Framework
I would also recommend GLM
- PocketPy: A Lightweight(~5000 LOC) Python Implementation in C++17
-
Learning DirectX 12 in 2023
Alongside MiniEngine, you’ll want to look into the DirectX Toolkit. This is a set of utilities by Microsoft that simplify graphics and game development. It contains libraries like DirectXMesh for parsing and optimizing meshes for DX12, or DirectXMath which handles 3D math operations like the OpenGL library glm. It also has utilities for gamepad input or sprite fonts. You can see a list of the headers here to get an idea of the features. You’ll definitely want to include this in your project if you don’t want to think about a lot of these solved problems (and don’t have to worry about cross-platform support).
What are some alternatives?
parallel-hashmap - A family of header-only, very fast and memory-friendly hashmap and btree containers.
Eigen
STL - MSVC's implementation of the C++ Standard Library.
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
robin-map - C++ implementation of a fast hash map and hash set using robin hood hashing
linmath.h - a lean linear math library, aimed at graphics programming. Supports vec3, vec4, mat4x4 and quaternions
xxHash - Extremely fast non-cryptographic hash algorithm
cglm - 📽 Highly Optimized 2D / 3D Graphics Math (glm) for C
C++ Format - A modern formatting library
OpenBLAS - OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
tracy - Frame profiler
blaze