-
countwords
Playing with counting word frequencies (and performance) in various languages. (Discontinued)
-
countwords
Playing with counting word frequencies (and performance) in various languages. (by ClickHouse)
-
countwords
Playing with counting word frequencies (and performance) in various languages. (by kimono-koans)
-
gccontent-benchmark
Benchmarking different languages for a simple bioinformatics task (Counting the GC fraction of DNA in a FASTA file)
-
robin-hood-hashing
Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20 (Discontinued)
-
countwords
Playing with counting word frequencies (and performance) in various languages. (by BurntSushi)
I don't think the performance is due to startup time at all. I cloned the repo, ran the benchmark, and found that Swift's execution time scales drastically with the size of the input.
The benchmark tests each executable by piping in the full King James Bible duplicated 10 times[1] (each copy is 4.13 MB[2]). When I ran it with just a single copy of the input text, the execution time dropped to 58-59 milliseconds, but when I ran the benchmark unmodified it jumped to over 4 seconds. For comparison, a hello world script runs in about 13 milliseconds. The Swift team actually boasts about its quick startup time on the official website[3].
[1] https://github.com/benhoyt/countwords/blob/master/test.sh#L5
[2] https://github.com/benhoyt/countwords/blob/master/kjvbible.t...
[3] https://www.swift.org/server/
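A rough way to repeat that kind of scaling check is to pipe the same text into a program at several duplication factors and time each run. This is only a sketch, not the repo's test.sh: the names `time_with_copies` and `scaling_report` are made up for illustration, and the command and input file are whatever you choose to pass in.

```python
import subprocess
import time


def time_with_copies(cmd, text, copies):
    """Pipe `text` repeated `copies` times into `cmd` and return elapsed seconds.

    `cmd` is an argument list such as ["./my-binary"] (hypothetical);
    `text` is the raw input as bytes.
    """
    payload = text * copies
    start = time.perf_counter()
    # Discard the program's output so only its run time is measured.
    subprocess.run(cmd, input=payload, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def scaling_report(cmd, text, copy_counts=(1, 2, 5, 10)):
    """Return (copies, seconds) pairs, one per duplication factor."""
    return [(n, time_with_copies(cmd, text, n)) for n in copy_counts]


# Example use (file name and binary are placeholders):
#   with open("input.txt", "rb") as f:
#       for n, secs in scaling_report(["./my-binary"], f.read()):
#           print(f"{n:>2} copies: {secs:.3f} s")
```

If the time at 10 copies is far more than 10 times the time at 1 copy, startup cost is not the explanation, since startup is paid once regardless of input size.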
“Pure Python” commonly means implemented using only the Python language. Something written in pure Python ought to be portable across Python implementations. I was merely pointing out that this line
https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...
isn’t exactly pure Python, because under a different runtime (e.g. PyPy) the code would take a different path (the “pure Python” implementation of _count_elements instead of the C implementation).
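To make the two paths concrete, here is a simplified sketch of the fallback pattern being described: a pure Python tally function, optionally replaced by CPython's C helper when it can be imported. The names `count_elements_pure` and `count_elements` are mine, and this is an illustration of the idea rather than a copy of the standard library source.

```python
def count_elements_pure(mapping, iterable):
    """Tally elements of `iterable` into `mapping` using only Python code."""
    get = mapping.get
    for elem in iterable:
        mapping[elem] = get(elem, 0) + 1


try:
    # On CPython this private C helper is normally available and much faster.
    from _collections import _count_elements as count_elements
except ImportError:
    # On runtimes without the C module, fall back to the portable version.
    count_elements = count_elements_pure


words = "the cat and the hat".split()

fast = {}
count_elements(fast, words)       # C path on CPython, Python path elsewhere

portable = {}
count_elements_pure(portable, words)

# Both paths produce the same tallies; only the speed differs.
print(fast == portable)
```

So a benchmark entry built on Counter is "pure Python" in its source text, but which implementation actually runs depends on the interpreter.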
In case anyone is interested, I did an optimized, but much simpler, Rust implementation just today[0], which is faster than the optimized implementation on my machine. No indexing into arrays of bytes, etc., no "code golf" measures.
It looks like idiomatic Rust, which I think is interesting. It shows there is more than one way to skin a cat.
[0]: https://github.com/kimono-koans/countwords/blob/master/rust/...
Fun stuff! I ran a similar thing with a simple bioinformatics problem before (calculating the ratio of Gs and Cs to A+G+C+T):
https://github.com/samuell/gccontent-benchmark#readme
Really hard - or impossible - to arrive at a definitive single number for one language, but the whole exercise is a lot of fun and quite informative IMO :)
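For anyone unfamiliar with the task, a minimal sketch of the GC fraction calculation looks roughly like this. It assumes FASTA header lines start with ">" and that any base other than A, C, G, or T (such as N) is left out of the total; the function name `gc_fraction` is just for illustration and is not taken from the linked repo.

```python
def gc_fraction(lines):
    """Return (G + C) / (A + C + G + T) over the sequence lines of a FASTA file."""
    gc = 0
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header line, not sequence data
        for base in line.strip().upper():
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
            # Anything else (e.g. N) is ignored in both counts.
    return gc / total if total else 0.0


sample = [">seq1", "GGCC", "AATT", "NNNN"]
print(gc_fraction(sample))  # 4 G/C out of 8 counted bases -> 0.5
```

Passing an open file object works too, since it iterates line by line.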
I got a slightly better C++ version here, which uses a couple of libraries instead of std:: stuff: https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5a... (Boost, fmt, and https://github.com/martinus/robin-hood-hashing)
$ g++ -I robin-hood-hashing/src/include -O2 -flto -std=c++20 -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -lfmt
$ git clone -b ag/test-kimono https://github.com/BurntSushi/countwords