parallel-hashmap
1brc
parallel-hashmap | 1brc | |
---|---|---|
31 | 28 | |
2,316 | 5,136 | |
- | - | |
7.8 | 9.8 | |
27 days ago | 9 days ago | |
C++ | Java | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
parallel-hashmap
-
The One Billion Row Challenge in CUDA: from 17 minutes to 17 seconds
Standard library maps/unordered_maps are themselves notoriously slow anyway. A sparse_hash_map from abseil or parallel-hashmaps[1] would be better.
[1] https://github.com/greg7mdp/parallel-hashmap
-
My own Concurrent Hash Map picks
Cool! Looking forward to you trying my phmap - and please let me know if you have any question.
-
Boost 1.81 will have boost::unordered_flat_map...
I do this as well in my phmap and gtl implementations. It makes the tables look worse in benchmarks like the above, but prevents really bad surprises occasionally.
-
Comprehensive C++ Hashmap Benchmarks 2022
Thanks a lot for the great benchmark, Martin. Glad you used different hash functions, because I do sacrifice some speed to make sure that the performance of my hash maps doesn't degrade drastically with poor hash functions. Happy to see that my phmap and gtl (the C++20 version) performed well.
-
Can C++ maps be as efficient as Python dictionaries ?
I use https://github.com/greg7mdp/parallel-hashmap when I need better performance of maps and sets.
-
How to build a Chess Engine, an interactive guide
Then they should really try https://github.com/greg7mdp/parallel-hashmap, the current state of the art.
-
boost::unordered map is a new king of data structures
Unordered hash map shootout CMAP = https://github.com/tylov/STC KMAP = https://github.com/attractivechaos/klib PMAP = https://github.com/greg7mdp/parallel-hashmap FMAP = https://github.com/skarupke/flat_hash_map RMAP = https://github.com/martinus/robin-hood-hashing HMAP = https://github.com/Tessil/hopscotch-map TMAP = https://github.com/Tessil/robin-map UMAP = std::unordered_map Usage: shootout [n-million=40 key-bits=25] Random keys are in range [0, 2^25). Seed = 1656617916: T1: Insert/update random keys: KMAP: time: 1.949, size: 15064129, buckets: 33554432, sum: 165525449561381 CMAP: time: 1.649, size: 15064129, buckets: 22145833, sum: 165525449561381 PMAP: time: 2.434, size: 15064129, buckets: 33554431, sum: 165525449561381 FMAP: time: 2.112, size: 15064129, buckets: 33554432, sum: 165525449561381 RMAP: time: 1.708, size: 15064129, buckets: 33554431, sum: 165525449561381 HMAP: time: 2.054, size: 15064129, buckets: 33554432, sum: 165525449561381 TMAP: time: 1.645, size: 15064129, buckets: 33554432, sum: 165525449561381 UMAP: time: 6.313, size: 15064129, buckets: 31160981, sum: 165525449561381 T2: Insert sequential keys, then remove them in same order: KMAP: time: 1.173, size: 0, buckets: 33554432, erased 20000000 CMAP: time: 1.651, size: 0, buckets: 33218751, erased 20000000 PMAP: time: 3.840, size: 0, buckets: 33554431, erased 20000000 FMAP: time: 1.722, size: 0, buckets: 33554432, erased 20000000 RMAP: time: 2.359, size: 0, buckets: 33554431, erased 20000000 HMAP: time: 0.849, size: 0, buckets: 33554432, erased 20000000 TMAP: time: 0.660, size: 0, buckets: 33554432, erased 20000000 UMAP: time: 2.138, size: 0, buckets: 31160981, erased 20000000 T3: Remove random keys: KMAP: time: 1.973, size: 0, buckets: 33554432, erased 23367671 CMAP: time: 2.020, size: 0, buckets: 33218751, erased 23367671 PMAP: time: 2.940, size: 0, buckets: 33554431, erased 23367671 FMAP: time: 1.147, size: 0, buckets: 33554432, erased 23367671 RMAP: time: 1.941, size: 0, buckets: 33554431, erased 23367671 HMAP: time: 1.135, size: 0, buckets: 33554432, erased 23367671 TMAP: time: 1.064, size: 0, buckets: 33554432, erased 23367671 UMAP: time: 5.632, size: 0, buckets: 31160981, erased 23367671 T4: Iterate random keys: KMAP: time: 0.748, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 CMAP: time: 0.627, size: 23367671, buckets: 33218751, repeats: 8, sum: 4465059465719680 PMAP: time: 0.680, size: 23367671, buckets: 33554431, repeats: 8, sum: 4465059465719680 FMAP: time: 0.735, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 RMAP: time: 0.464, size: 23367671, buckets: 33554431, repeats: 8, sum: 4465059465719680 HMAP: time: 0.719, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 TMAP: time: 0.662, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680 UMAP: time: 6.168, size: 23367671, buckets: 31160981, repeats: 8, sum: 4465059465719680 T5: Lookup random keys: KMAP: time: 0.943, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 CMAP: time: 0.863, size: 23367671, buckets: 33218751, lookups: 34235332, found: 29040438 PMAP: time: 1.635, size: 23367671, buckets: 33554431, lookups: 34235332, found: 29040438 FMAP: time: 0.969, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 RMAP: time: 1.705, size: 23367671, buckets: 33554431, lookups: 34235332, found: 29040438 HMAP: time: 0.712, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 TMAP: time: 0.584, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438 UMAP: time: 1.974, size: 23367671, buckets: 31160981, lookups: 34235332, found: 29040438
-
Is A* just always slow?
std::unordered_map is notorious for being slow. Use a better implementation (I like the flat naps from here, which are the same as abseil’s). The question that needs to be asked too is if you need to use a map.
-
New Boost.Unordered containers have BIG improvements!
A comparison against phmap would also be nice.
-
How to implement static typing in a C++ bytecode VM?
std::unordered_map is perfectly fine. You can do better with external libraries, like parallel hashmap, but these tend to be drop-in replacements
1brc
-
The One Billion Row Challenge in CUDA: from 17 minutes to 17 seconds
This would be the code to beat. Ideally with only 8 cores but any number of cores is also very interesting.
https://github.com/gunnarmorling/1brc/discussions/710
-
One Billion Row Challenge in Golang - From 95s to 1.96s
Given that 1-billion-line-file is approximately 13GB, instead of providing a fixed database, the official repository offers a script to generate synthetic data with random readings. Just follow the instructions to create your own database.
-
1BRC Merykitty's Magic SWAR: 8 Lines of Code Explained in 3k Words
Local disk I/O is no longer the bottleneck on modern systems: https://benhoyt.com/writings/io-is-no-longer-the-bottleneck/
In addition, the official 1BRC explicitly evaluated results on a RAM disk to avoid I/O speed entirely: https://github.com/gunnarmorling/1brc?tab=readme-ov-file#eva... "Programs are run from a RAM disk (i.o. the IO overhead for loading the file from disk is not relevant)"
-
Processing One Billion Rows in PHP!
You may have heard of the "The One Billion Row Challenge" (1brc) and in case you don't, go checkout Gunnar Morlings's 1brc repo.
-
The One Billion Row Challenge in Go: from 1m45s to 4s in nine solutions
Here’s a thread on results with duckdb, I don’t mean to discourage you taking a shot at all though: https://github.com/gunnarmorling/1brc/discussions/39
-
Ask HN: How can I learn about performance optimization?
If you are in “javaland” look at billion row challenge, you will learn a lot - https://github.com/gunnarmorling/1brc
- Lessons Learned from Doing the One Billion Row Challenge
- 1B Row Challenge Shows Java Can Process 1B Rows File in 2 Seconds
-
From slow to SIMD: A Go optimization story
Even manual vectorization is pain...writing ASM, really?
Rust has unstable portable SIMD and a few third-party crates, C++ has that as well, C# has stable portable SIMD and a very small BLAS-like library on top of it (hell it even exercises PackedSIMD when ran in a browser) and Java is getting stable Panama vectors some time in the future (though the question of codegen quality stands open given planned changes to unsafe API).
Go among these is uniquely disadvantaged. And if that's not enough, you may want to visit 1Brc's challenge discussions and see that Go struggles get anywhere close to 2s mark with both C# and C++ are blazing past it:
https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-...
https://github.com/gunnarmorling/1brc/discussions/67
-
JEP Draft: Deprecate Memory-Access Methods in Sun.misc.Unsafe for Removal
In terms of performance: I realize that this is a somewhat "toy" issue, and it's a sample size of 1, but for the currently ongoing "One Billion Row Challenge"[1] (an ongoing Java performance competition related to parsing and aggregating a 13 GB file), all of the current top-performers are using Unsafe. More specifically, the use of Unsafe appears to have been the change for a few entries that allowed getting below the 3-second barrier in the test.
1. https://github.com/gunnarmorling/1brc
What are some alternatives?
Folly - An open-source C++ library developed and used at Facebook.
1brc - C99 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM.
robin-hood-hashing - Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
yolov7-object-tracking - YOLOv7 Object Tracking Using PyTorch, OpenCV and Sort Tracking
libcuckoo - A high-performance, concurrent hash table
nodejs - 1️⃣🐝🏎️ The One Billion Row Challenge with Node.js -- A fun exploration of how quickly 1B rows from a text file can be aggregated with different languages.
rust-phf - Compile time static maps for Rust
csvlens - Command line csv viewer
flat_hash_map - A very fast hashtable
pocketbase - Open Source realtime backend in 1 file
tracy - Frame profiler
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing