Performance comparison: counting words in Python, C/C++, Awk, Rust, and more

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • countwords

    Discontinued. Playing with counting word frequencies (and performance) in various languages.

  • I don't think the performance is due to startup time at all. I cloned the repo, ran the benchmark, and found that Swift's execution time scales drastically with the size of the input.

    The benchmark tests each executable by piping in the full King James Bible duplicated 10 times[1] (each copy is 4.13 MB[2]). When I ran it using just a single copy of the input text, the execution time dropped to 58-59 milliseconds, but when I ran the unmodified benchmark it jumped to over 4 seconds. A hello world script, for comparison, runs in about 13 milliseconds. The Swift team actually boasts about its quick startup time on the official website[3].

    [1] https://github.com/benhoyt/countwords/blob/master/test.sh#L5

    [2] https://github.com/benhoyt/countwords/blob/master/kjvbible.t...

    [3] https://www.swift.org/server/
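
One way to tell a fixed startup cost apart from an input-dependent cost is to time the same routine on 1 copy and on 10 copies of the input, as the comment above does by hand. The following is only a minimal Python sketch of that idea, not the repo's benchmark script: the synthetic sample text stands in for the real input file, and `count_words` and `time_count` are hypothetical helper names.

```python
import time
from collections import Counter


def count_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and tally each word."""
    return Counter(text.lower().split())


def time_count(text: str, copies: int) -> float:
    """Return the seconds spent counting words in `copies` repetitions of text."""
    data = text * copies
    start = time.perf_counter()
    count_words(data)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic stand-in for the real input file (an assumption).
    sample = "In the beginning God created the heaven and the earth\n" * 20_000
    one = time_count(sample, 1)
    ten = time_count(sample, 10)
    # A ratio near 10 suggests roughly linear scaling; a much larger ratio
    # points at superlinear behavior rather than a fixed startup cost.
    print(f"1 copy: {one:.4f}s, 10 copies: {ten:.4f}s, ratio: {ten / one:.1f}")
```

If startup time dominated, the two timings would be close to each other; a ratio well above 10 would instead match the kind of growth described in the comment.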

  • CPython

    The Python programming language

  • “Pure Python” commonly means implemented using only the Python language. Something written in pure Python ought to be portable across Python implementations. I was merely pointing out that this line

    https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...

    isn’t exactly pure Python, because under a different runtime (e.g. PyPy) the code would take a different path (the “pure Python” implementation of _count_elements instead of the C implementation).
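
The two code paths mentioned above can be illustrated with a try/except import fallback. This is a hedged sketch of the general pattern, not a copy of the linked line: the names `_count_elements_py` and `_count_elements_fast` are my own, and it assumes the C helper is importable from the private `_collections` module, as it is on CPython.

```python
from collections import Counter


def _count_elements_py(mapping, iterable):
    """Pure-Python tally: increment mapping[elem] for each element."""
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1


try:
    # On CPython, a C helper is available in the private _collections module.
    from _collections import _count_elements as _count_elements_fast
except ImportError:
    # A runtime without the C helper falls back to the pure-Python version.
    _count_elements_fast = _count_elements_py


words = "the quick the lazy the".split()
slow, fast = {}, {}
_count_elements_py(slow, words)
_count_elements_fast(fast, words)
# Both paths should agree with each other and with Counter.
print(slow == fast == dict(Counter(words)))
```

Both paths produce the same counts; which one actually runs depends on the interpreter, which is why a benchmark labeled "pure Python" can behave differently across runtimes.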

  • countwords

    Playing with counting word frequencies (and performance) in various languages. (by ClickHouse)

  • countwords

    Playing with counting word frequencies (and performance) in various languages. (by kimono-koans)

  • In case anyone is interested, I wrote an optimized, but much simpler, Rust implementation just today[0], which is faster than the optimized implementation on my machine. No indexing into arrays of bytes, etc., no "code golf" measures.

    Looks like idiomatic Rust, which I think is interesting. Shows there is more than one way to skin a cat.

    [0]: https://github.com/kimono-koans/countwords/blob/master/rust/...

  • gccontent-benchmark

    Benchmarking different languages for a simple bioinformatics task (Counting the GC fraction of DNA in a FASTA file)

  • Fun stuff! I have run a similar comparison with a simple bioinformatics problem before (calculating the ratio of G and C bases to A+G+C+T):

    https://github.com/samuell/gccontent-benchmark#readme

    Really hard - or impossible - to arrive at a definitive single number for one language, but the whole exercise is a lot of fun and quite informative IMO :)
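
For readers unfamiliar with the task, the GC fraction is (G + C) / (A + C + G + T) over the sequence lines of a FASTA file. A minimal Python sketch follows; it is not taken from the benchmark repo, `gc_fraction` is a hypothetical name, and uppercasing the input and skipping other symbols such as N are assumptions of this sketch.

```python
from typing import Iterable


def gc_fraction(lines: Iterable[str]) -> float:
    """Return (G + C) / (A + C + G + T) over the sequence lines of a FASTA file."""
    gc = 0
    total = 0
    for line in lines:
        if line.startswith(">"):  # header line, not sequence data
            continue
        seq = line.strip().upper()  # assumption: treat lowercase bases the same
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        total += g_c + a_t  # N and other symbols are left out of the total
    return gc / total if total else 0.0


# Tiny in-memory example in place of a real FASTA file.
fasta = [">chr1 example", "ACGTNN", "ggcc"]
print(gc_fraction(fasta))
```

With a real file, the same function could be called as `gc_fraction(open(path))`, since a file object iterates over its lines.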

  • robin-hood-hashing

    Discontinued. Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

  • Got a slightly better C++ version here, which uses a couple of libraries instead of std:: stuff - https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5a... ; Boost, fmt, and https://github.com/martinus/robin-hood-hashing

        $ g++ -I robin-hood-hashing/src/include -O2 -flto -std=c++20 -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -lfmt

  • countwords

    Playing with counting word frequencies (and performance) in various languages. (by BurntSushi)

  • $ git clone -b ag/test-kimono https://github.com/BurntSushi/countwords

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust

    2 projects | /r/programming | 15 Mar 2021
  • I've been loving Benchmarking lately, but the Framework does this one quirky thing with the first result of a set. Specifically, the first return is always unusually high.

    1 project | /r/laravel | 6 Dec 2023
  • Pinpoint performance regressions with CI-Integrated differential profiling

    4 projects | dev.to | 23 Oct 2023
  • If this isn't the perfect data structure, why?

    3 projects | /r/C_Programming | 22 Oct 2023
  • unordered_dense: A Fast & Densely Stored Hashmap And Hashset Based On Robin-Hood Backward Shift Deletion

    1 project | /r/programming | 11 Jul 2023