parallel-hashmap VS countwords

Compare parallel-hashmap vs countwords and see how they differ.

countwords

Playing with counting word frequencies (and performance) in various languages. (by benhoyt)
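
As a rough illustration of the kind of task the repository benchmarks, a minimal C++ sketch of word-frequency counting might look like the following. The names `count_words` and `WordCounts`, the ASCII lowercasing, and the sort order (descending count, ties broken alphabetically) are assumptions made for this example, not details taken from the countwords repository.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical result type: (word, count) pairs.
using WordCounts = std::vector<std::pair<std::string, std::size_t>>;

// Count whitespace-separated words read from `in`.
// Words are ASCII-lowercased before counting (an assumption of this sketch).
// The result is sorted by descending count; ties are ordered alphabetically.
WordCounts count_words(std::istream& in) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string word;
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        ++counts[word];
    }

    WordCounts sorted(counts.begin(), counts.end());
    std::sort(sorted.begin(), sorted.end(),
              [](const std::pair<std::string, std::size_t>& a,
                 const std::pair<std::string, std::size_t>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    return sorted;
}

// Convenience overload for in-memory text.
WordCounts count_words(const std::string& text) {
    std::istringstream in(text);
    return count_words(in);
}
```

Passing `std::cin` to `count_words` would process piped input; the hash map does the counting and the final sort only touches the unique words.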
               parallel-hashmap      countwords
Mentions       30                    43
Stars          2,286                 209
Growth         -                     -
Activity       7.6                   5.9
Last commit    13 days ago           about 2 years ago
Language       C++                   Rust
License        Apache License 2.0    MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

parallel-hashmap

Posts with mentions or reviews of parallel-hashmap. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-27.
  • My own Concurrent Hash Map picks
    2 projects | /r/cpp | 27 Nov 2022
    Cool! Looking forward to you trying my phmap - and please let me know if you have any question.
  • Boost 1.81 will have boost::unordered_flat_map...
    6 projects | /r/cpp | 31 Oct 2022
    I do this as well in my phmap and gtl implementations. It makes the tables look worse in benchmarks like the above, but prevents really bad surprises occasionally.
  • Comprehensive C++ Hashmap Benchmarks 2022
    3 projects | /r/cpp | 7 Sep 2022
    Thanks a lot for the great benchmark, Martin. Glad you used different hash functions, because I do sacrifice some speed to make sure that the performance of my hash maps doesn't degrade drastically with poor hash functions. Happy to see that my phmap and gtl (the C++20 version) performed well.
  • How to build a Chess Engine, an interactive guide
    5 projects | news.ycombinator.com | 2 Jul 2022
    Then they should really try https://github.com/greg7mdp/parallel-hashmap, the current state of the art.
  • boost::unordered map is a new king of data structures
    10 projects | /r/cpp | 30 Jun 2022
    Unordered hash map shootout

    CMAP = https://github.com/tylov/STC
    KMAP = https://github.com/attractivechaos/klib
    PMAP = https://github.com/greg7mdp/parallel-hashmap
    FMAP = https://github.com/skarupke/flat_hash_map
    RMAP = https://github.com/martinus/robin-hood-hashing
    HMAP = https://github.com/Tessil/hopscotch-map
    TMAP = https://github.com/Tessil/robin-map
    UMAP = std::unordered_map

    Usage: shootout [n-million=40 key-bits=25]
    Random keys are in range [0, 2^25). Seed = 1656617916:

    T1: Insert/update random keys:
    KMAP: time: 1.949, size: 15064129, buckets: 33554432, sum: 165525449561381
    CMAP: time: 1.649, size: 15064129, buckets: 22145833, sum: 165525449561381
    PMAP: time: 2.434, size: 15064129, buckets: 33554431, sum: 165525449561381
    FMAP: time: 2.112, size: 15064129, buckets: 33554432, sum: 165525449561381
    RMAP: time: 1.708, size: 15064129, buckets: 33554431, sum: 165525449561381
    HMAP: time: 2.054, size: 15064129, buckets: 33554432, sum: 165525449561381
    TMAP: time: 1.645, size: 15064129, buckets: 33554432, sum: 165525449561381
    UMAP: time: 6.313, size: 15064129, buckets: 31160981, sum: 165525449561381

    T2: Insert sequential keys, then remove them in same order:
    KMAP: time: 1.173, size: 0, buckets: 33554432, erased 20000000
    CMAP: time: 1.651, size: 0, buckets: 33218751, erased 20000000
    PMAP: time: 3.840, size: 0, buckets: 33554431, erased 20000000
    FMAP: time: 1.722, size: 0, buckets: 33554432, erased 20000000
    RMAP: time: 2.359, size: 0, buckets: 33554431, erased 20000000
    HMAP: time: 0.849, size: 0, buckets: 33554432, erased 20000000
    TMAP: time: 0.660, size: 0, buckets: 33554432, erased 20000000
    UMAP: time: 2.138, size: 0, buckets: 31160981, erased 20000000

    T3: Remove random keys:
    KMAP: time: 1.973, size: 0, buckets: 33554432, erased 23367671
    CMAP: time: 2.020, size: 0, buckets: 33218751, erased 23367671
    PMAP: time: 2.940, size: 0, buckets: 33554431, erased 23367671
    FMAP: time: 1.147, size: 0, buckets: 33554432, erased 23367671
    RMAP: time: 1.941, size: 0, buckets: 33554431, erased 23367671
    HMAP: time: 1.135, size: 0, buckets: 33554432, erased 23367671
    TMAP: time: 1.064, size: 0, buckets: 33554432, erased 23367671
    UMAP: time: 5.632, size: 0, buckets: 31160981, erased 23367671

    T4: Iterate random keys:
    KMAP: time: 0.748, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680
    CMAP: time: 0.627, size: 23367671, buckets: 33218751, repeats: 8, sum: 4465059465719680
    PMAP: time: 0.680, size: 23367671, buckets: 33554431, repeats: 8, sum: 4465059465719680
    FMAP: time: 0.735, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680
    RMAP: time: 0.464, size: 23367671, buckets: 33554431, repeats: 8, sum: 4465059465719680
    HMAP: time: 0.719, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680
    TMAP: time: 0.662, size: 23367671, buckets: 33554432, repeats: 8, sum: 4465059465719680
    UMAP: time: 6.168, size: 23367671, buckets: 31160981, repeats: 8, sum: 4465059465719680

    T5: Lookup random keys:
    KMAP: time: 0.943, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438
    CMAP: time: 0.863, size: 23367671, buckets: 33218751, lookups: 34235332, found: 29040438
    PMAP: time: 1.635, size: 23367671, buckets: 33554431, lookups: 34235332, found: 29040438
    FMAP: time: 0.969, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438
    RMAP: time: 1.705, size: 23367671, buckets: 33554431, lookups: 34235332, found: 29040438
    HMAP: time: 0.712, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438
    TMAP: time: 0.584, size: 23367671, buckets: 33554432, lookups: 34235332, found: 29040438
    UMAP: time: 1.974, size: 23367671, buckets: 31160981, lookups: 34235332, found: 29040438
  • Is A* just always slow?
    3 projects | /r/gamedev | 26 Jun 2022
    std::unordered_map is notorious for being slow. Use a better implementation (I like the flat maps from here, which are the same as abseil’s). The other question to ask is whether you need a map at all.
    3 projects | /r/gamedev | 26 Jun 2022
    std::unordered_map (std maps in general) is quite slow, as far as associative containers go. Try the flat_hash_map from here instead. Also, what are you using to implement the priority queue? Try using a priority heap or bucketed priority queue instead, it will likely outperform a normal priority queue. It may also be possible to optimise your open and closed lists. The book AI for Games talks about a variant the author calls “Node Array A*” which may be worth investigating.
  • New Boost.Unordered containers have BIG improvements!
    6 projects | /r/cpp | 13 Jun 2022
    A comparison against phmap would also be nice.
  • How to implement static typing in a C++ bytecode VM?
    2 projects | /r/ProgrammingLanguages | 8 Jun 2022
    std::unordered_map is perfectly fine. You can do better with external libraries, like parallel hashmap, but these tend to be drop-in replacements.
  • I built PanakeDB, a 100% Rust event ingestion solution, and now it's available for free.
    3 projects | /r/rust | 11 Jan 2022
    If you need to build a concurrent hash map, this is probably the best way: https://greg7mdp.github.io/parallel-hashmap/ (use, say, 16 hashmap shards based on hash % 16). This way you can insert a new entry to the hashmap without blocking everything.
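
The sharding idea in the last post (pick a shard with hash % 16 and lock only that shard) can be sketched with the C++ standard library alone. This is an illustration of the technique, not parallel-hashmap's implementation or API; the class name `ShardedCounter` and its methods are hypothetical, and the default of 16 shards simply follows the number suggested in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration type; not part of any library.
// Each shard has its own mutex, so writers that hash to different
// shards do not block one another.
class ShardedCounter {
    struct Shard {
        mutable std::mutex mutex;
        std::unordered_map<std::string, long> map;
    };

    std::vector<Shard> shards_;

    // Map a key to its shard index: hash % shard count.
    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

public:
    explicit ShardedCounter(std::size_t shard_count = 16)
        : shards_(shard_count) {
        assert(shard_count > 0);
    }

    // Add `delta` to the count stored for `key`, locking only one shard.
    void add(const std::string& key, long delta = 1) {
        Shard& shard = shards_[index_of(key)];
        std::lock_guard<std::mutex> lock(shard.mutex);
        shard.map[key] += delta;
    }

    // Return the count for `key`, or 0 if the key was never added.
    long get(const std::string& key) const {
        const Shard& shard = shards_[index_of(key)];
        std::lock_guard<std::mutex> lock(shard.mutex);
        auto it = shard.map.find(key);
        return it == shard.map.end() ? 0 : it->second;
    }
};
```

Several threads can call `add` on one `ShardedCounter` at the same time; contention only occurs when two keys land in the same shard. More shards reduce contention at the cost of more mutexes and smaller tables.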

countwords

Posts with mentions or reviews of countwords. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-15.
  • How fast is really ASP.NET Core?
    4 projects | /r/programming | 15 Nov 2022
    "dang, I didn't know that was 50x faster than the idiomatic way" or "hey, I didn't know that this implementation in the stdlib prioritized this over that and made this so slow, that's interesting" -- e.g., there are some neat language details to be found in something like Ben Hoyt's community word count benchmarks repo and its 'simple' vs 'optimal' code: https://github.com/benhoyt/countwords
  • Correct name for word matching problem
    2 projects | /r/algorithms | 13 Oct 2022
    It benchmarks programs that count the total number of unique words in some input. It's not exactly equivalent to your problem, but it's similar. All of the programs used some kind of hash map for lookups, but I contributed a program that used a trie. Interestingly, in my experience its performance varies with the CPU. On my old CPU (i7-6900K) it was a little slower, but on my new CPU (i9-12900KS) it was faster.
  • Performance comparison: counting words in Python, C/C++, Awk, Rust, and more
    12 projects | news.ycombinator.com | 24 Jul 2022
    Are you looking at the "simple" or the "optimized" versions? For the optimized, yes, the Go one is very similar to the C. For the simple, idiomatic version, the Go version [1] is much simpler than the C one [2]: 40 very straight-forward LoC vs 93 rather more complex ones including pointer arithmetic, tricky manual memory management, and so on.

    [1] https://github.com/benhoyt/countwords/blob/c66dd01d868aa83dc...

    12 projects | news.ycombinator.com | 24 Jul 2022
    I don't think the performance is due to start up time at all. I actually cloned the repo, and ran the benchmark and found that Swift's execution time scales drastically with the size of the input.

    The benchmark tests each executable by piping in the full King James Bible duplicated 10 times[1] (each copy is 4.13 MB[2]). When I ran it using just a single copy of the input text, the execution time dropped to 58-59 milliseconds, but when I ran the benchmark without modifications it jumped up to over 4 seconds. A hello world script for comparison runs in about 13 milliseconds. The Swift team actually boasts about its quick start up time on the official website [3].

    [1] https://github.com/benhoyt/countwords/blob/master/test.sh#L5

    [2] https://github.com/benhoyt/countwords/blob/master/kjvbible.t...

    [3] https://www.swift.org/server/

    12 projects | news.ycombinator.com | 24 Jul 2022
    Re: the Rust performance implementation, I was able to get ~25% better performance by rewriting the for loops as iterators and by using a buffered writer, which seems crazy but it's true.[0] I chalked it up to some crazy ILP/SIMD tricks the compiler is doing.

    I even submitted a PR, but Ben decided he was tired of maintaining and decided to archive the project (which fair enough!).

    [0]: https://github.com/benhoyt/countwords/pull/115

    12 projects | news.ycombinator.com | 24 Jul 2022
    Why not read the source code? :-)

    I wrote comments explaining things: https://github.com/benhoyt/countwords/blob/8553c8f600c40a462...

  • The difference between Go and Rust
    6 projects | /r/programming | 28 Sep 2021
    And yet Go was faster than Rust in a simple app that counts words: https://benhoyt.com/writings/count-words/
  • How to Rapidly Improve at Any Programming Language
    8 projects | news.ycombinator.com | 18 Sep 2021
    > but the performance profiles & characteristics that we must know about in order to make a choice on which tool to use. And it shouldn't be that each user has to figure it out on their own, dig into PR's or whatever.

    That's an interesting take – I like the idea of a catalog of standard tasks with implementations in several languages as well as their performance characteristics. I suppose Rosetta Code gets the ball rolling with this, but it's missing some performance metrics. It reminds me of [Ben Hoyt's piece](https://benhoyt.com/writings/count-words/) on counting unique words in the KJV Bible in different languages.

  • Faster string keyed maps in Go
    2 projects | /r/golang | 22 Jul 2021
    This article shows that map lookups can be optimized by using the (unintuitive) pattern:

What are some alternatives?

When comparing parallel-hashmap and countwords you can also consider the following projects:

Folly - An open-source C++ library developed and used at Facebook.

libcuckoo - A high-performance, concurrent hash table

robin-hood-hashing - Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

rust-phf - Compile time static maps for Rust

flat_hash_map - A very fast hashtable

tracy - Frame profiler

FASTER - Fast persistent recoverable log and key-value store + cache, in C# and C++.

junction - Concurrent data structures in C++

CPython - The Python programming language

growt - A header-only library offering a variety of dynamically growing concurrent hash tables, all of which work by dynamically migrating the current table once it gets too full.

chromium - The official GitHub mirror of the Chromium source

eigen