countwords (discontinued)
Playing with counting word frequencies (and performance) in various languages.
-
parallel-hashmap
A family of header-only, very fast and memory-friendly hashmap and btree containers.
Thanks -- good call. I definitely wasn't too focused on testing/hardening for this article. Someone submitted a PR to fix the no-NUL issue: https://github.com/benhoyt/countwords/pull/29
For those looking for a performant hash map in C++, absl::flat_hash_map is a good choice. It's very nearly a drop-in replacement for std::unordered_map. I know there are some other good ones out there as well; this is just the one I use at work.
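absl::flat_hash_map supports the common std::unordered_map operations (operator[], find, iteration). One low-effort way to compare the two is to make the map type a template parameter. The sketch below assumes the countwords rules (lowercase, split on whitespace, count, sort by frequency). `count_words`, `sorted_counts` and `WordCounts` are hypothetical names. It is instantiated with std::unordered_map so it compiles without Abseil. With Abseil installed, swapping in absl::flat_hash_map should be a one-line change.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Count lowercase words in `text`. Map is any hash map from std::string to int,
// e.g. std::unordered_map or (assuming Abseil is available) absl::flat_hash_map.
template <typename Map>
Map count_words(const std::string& text) {
    Map counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {  // operator>> splits on whitespace
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        ++counts[word];  // value-initialized to 0 on first sight, then incremented
    }
    return counts;
}

// Copy entries into a vector sorted by descending count (ties alphabetical).
template <typename Map>
std::vector<std::pair<std::string, int>> sorted_counts(const Map& counts) {
    std::vector<std::pair<std::string, int>> out(counts.begin(), counts.end());
    std::sort(out.begin(), out.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    return out;
}

using WordCounts = std::unordered_map<std::string, int>;
// Assumed Abseil alternative (header absl/container/flat_hash_map.h):
// using WordCounts = absl::flat_hash_map<std::string, int>;
```

The "very nearly" matters. flat_hash_map stores elements inline, so pointers and references to elements are not stable across rehashing the way they are with std::unordered_map. Code that holds on to element addresses needs checking before the swap.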
The use of unordered_map and the unnecessary copy stood out to me too. I changed the copy to a move and replaced std::unordered_map with phmap::flat_hash_map (a drop-in, header-only replacement from parallel-hashmap). I got 1.56s for the original version and 1.29s for the new version. Scaling the results so that my 1.56s corresponds to his 1.75s, the new version would land at ~1.45s on his computer, about on par with the Rust results.
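The comment doesn't say which copy was turned into a move. One likely spot in a count-then-sort word counter is building the sorted vector from the map, which copies every key string. std::make_move_iterator does not help there. The map's value_type is std::pair<const std::string, int>, and a const key cannot be moved from. The sketch below assumes C++17 and drains the map with extract(), so each key is moved rather than copied. `drain_sorted` is a hypothetical name, and this may not be the change the commenter made.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Counts = std::unordered_map<std::string, int>;
using Entry = std::pair<std::string, int>;

// Empty `counts` into a vector sorted by descending count (ties alphabetical),
// moving each key string out of its node instead of copying it.
std::vector<Entry> drain_sorted(Counts& counts) {
    std::vector<Entry> out;
    out.reserve(counts.size());
    while (!counts.empty()) {
        // extract() unlinks the node; for map node handles key() is a mutable key_type&.
        auto node = counts.extract(counts.begin());
        out.emplace_back(std::move(node.key()), node.mapped());
    }
    std::sort(out.begin(), out.end(), [](const Entry& a, const Entry& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    return out;
}
```

On the timings, the scaling works out to 1.29 × (1.75 / 1.56) ≈ 1.447s. The measured 1.29s versus 1.56s is about 17% less time on the same machine.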
Another similar experiment, and this one includes Java :)
I'm using stdio instead of iostream, because why not. Further improvements are possible with llfio.
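The comment doesn't include its code. The sketch below shows the stdio approach, assuming ASCII input and the countwords rules. `count_words_stdio` and the 64 KiB buffer size are hypothetical choices, and llfio is not used. Input is read in large chunks with fread() instead of word-by-word `std::cin >> word`, and a word that straddles two chunks is carried over in `word`.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <unordered_map>

// Count lowercase, whitespace-separated words read from `f` using C stdio.
std::unordered_map<std::string, int> count_words_stdio(std::FILE* f) {
    std::unordered_map<std::string, int> counts;
    std::string word;       // current (possibly partial) word
    char buf[1 << 16];      // 64 KiB read buffer (arbitrary size)
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {
            unsigned char c = static_cast<unsigned char>(buf[i]);
            if (std::isspace(c)) {
                if (!word.empty()) {  // end of a word: count it and reset
                    ++counts[word];
                    word.clear();
                }
            } else {
                word.push_back(static_cast<char>(std::tolower(c)));
            }
        }
    }
    if (!word.empty()) ++counts[word];  // last word with no trailing whitespace
    return counts;
}
```

Calling it as `count_words_stdio(stdin)` replaces the iostream read loop. If staying with iostream is preferred, calling `std::ios::sync_with_stdio(false)` before reading is the usual first step to reduce its overhead.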