Performance comparison: counting words in Python, C/C++, Awk, Rust, and more

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  1. countwords

    [Discontinued] Playing with counting word frequencies (and performance) in various languages.

    I don't think the slowdown is due to startup time at all. I cloned the repo, ran the benchmark, and found that Swift's execution time scales drastically with the size of the input.

    The benchmark tests each executable by piping in the full King James Bible duplicated 10 times[1] (each copy is 4.13 MB[2]). When I ran it with just a single copy of the input text, the execution time dropped to 58-59 milliseconds, but when I ran the benchmark without modifications, it jumped to over 4 seconds. For comparison, a hello-world script runs in about 13 milliseconds. The Swift team actually boasts about its quick startup time on the official website [3].

    [1] https://github.com/benhoyt/countwords/blob/master/test.sh#L5

    [2] https://github.com/benhoyt/countwords/blob/master/kjvbible.t...

    [3] https://www.swift.org/server/
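    A rough way to check this kind of scaling is to time the same counting routine on one copy and on ten copies of the input. The sketch below is a hypothetical Python harness, not part of the countwords repo: `count_words` and `time_count` are illustrative names, and a repeated synthetic sentence stands in for kjvbible.txt. With linear scaling the ratio should come out roughly 10x, whereas the Swift numbers reported above (58-59 ms versus over 4 seconds) amount to well over 60x.

```python
# Hypothetical timing harness (not from the countwords repo).
import time
from collections import Counter


def count_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and tally (the 'simple' countwords approach)."""
    return Counter(text.lower().split())


def time_count(text: str) -> float:
    """Return the wall-clock seconds taken by one call to count_words."""
    start = time.perf_counter()
    count_words(text)
    return time.perf_counter() - start


# Synthetic stand-in for one copy of the input (the real benchmark uses kjvbible.txt).
one_copy = "In the beginning God created the heaven and the earth\n" * 5_000
ten_copies = one_copy * 10  # mirrors the 10x duplication done by test.sh

t1 = time_count(one_copy)
t10 = time_count(ten_copies)
print(f"1 copy: {t1 * 1000:.1f} ms, 10 copies: {t10 * 1000:.1f} ms, ratio {t10 / t1:.1f}x")
```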

  3. CPython

    The Python programming language

    “Pure Python” commonly means implemented using only the Python language. Something written in pure Python ought to be portable across Python implementations. I was merely pointing out that this line

    https://github.com/python/cpython/blob/4395ff1e6a18fb26c7a66...

    isn’t exactly pure Python, because under a different runtime (e.g., PyPy) the code would take a different path (the pure-Python implementation of _count_elements instead of the C implementation).

  4. countwords

    Playing with counting word frequencies (and performance) in various languages. (by ClickHouse)

  5. countwords

    Playing with counting word frequencies (and performance) in various languages. (by kimono-koans)

    In case anyone is interested, I wrote an optimized but much simpler Rust implementation just today[0], which is faster than the optimized implementation on my machine. No indexing into arrays of bytes, etc., no "code golf" measures.

    Looks like idiomatic Rust, which I think is interesting. It shows there is more than one way to skin a cat.

    [0]: https://github.com/kimono-koans/countwords/blob/master/rust/...
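    The linked implementation is Rust and is not reproduced here. As a language-neutral illustration of the two styles being contrasted, the Python sketch below counts words once with high-level lowercase-and-split calls and once by walking the bytes by hand, then checks that both agree. Both function names are hypothetical, and the byte-level version assumes ASCII input and the same whitespace set that `bytes.split()` uses.

```python
# Illustrative Python, not the linked Rust code; function names are hypothetical.
from collections import Counter

# ASCII whitespace, the same set bytes.split() splits on when given no argument.
WHITESPACE = frozenset(b" \t\n\r\x0b\x0c")


def count_simple(data: bytes) -> Counter:
    """High-level style: lowercase the whole buffer, split, and tally."""
    return Counter(data.lower().split())


def count_bytewise(data: bytes) -> Counter:
    """Low-level style: scan byte by byte, lowercasing ASCII A-Z manually."""
    counts: Counter = Counter()
    word = bytearray()
    for b in data:
        if b in WHITESPACE:
            if word:  # end of a word
                counts[bytes(word)] += 1
                word.clear()
        else:
            # Setting bit 0x20 maps 'A'-'Z' (65-90) to 'a'-'z'; other bytes pass through.
            word.append(b | 0x20 if 65 <= b <= 90 else b)
    if word:  # last word when the input does not end in whitespace
        counts[bytes(word)] += 1
    return counts


sample = b"The cat\tand THE hat\n\nthe end"
print(count_simple(sample) == count_bytewise(sample))  # True
```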

  6. gccontent-benchmark

    Benchmarking different languages for a simple bioinformatics task (Counting the GC fraction of DNA in a FASTA file)

    Fun stuff! I've run something similar before with a simple bioinformatics problem (calculating the ratio of G+C against A+G+C+T):

    https://github.com/samuell/gccontent-benchmark#readme

    It's really hard, or even impossible, to arrive at a single definitive number for one language, but the whole exercise is a lot of fun and quite informative IMO :)
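    The GC fraction itself is a small computation. Below is a minimal Python sketch, assuming that header lines start with `>`, that sequence letters may be in either case, and that only A, C, G, and T are counted (anything else, such as N, is ignored); the gccontent-benchmark repository may handle these details differently. `gc_fraction` is a hypothetical name.

```python
# Hypothetical helper; the gccontent-benchmark rules for N, case, etc. may differ.
def gc_fraction(fasta_text: str) -> float:
    """Return (G + C) / (A + C + G + T) over all sequence lines of a FASTA text."""
    gc = at = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # header line, not sequence data
        seq = line.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


example = ">seq1\nACGT\nggcc\n>seq2\nAATTN\n"
print(gc_fraction(example))  # 0.5
```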

  7. robin-hood-hashing

    [Discontinued] Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20

    I have a slightly better C++ version here, which uses a couple of libraries instead of the std:: equivalents: https://gist.github.com/jcelerier/74dfd473bccec8f1bd5d78be5a... (it depends on Boost, fmt, and https://github.com/martinus/robin-hood-hashing)

        $ g++ -I robin-hood-hashing/src/include -O2 -flto -std=c++20 -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -lfmt
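    For readers unfamiliar with the technique behind robin-hood-hashing: in Robin Hood hashing, an open-addressing table with linear probing lets an entry that has probed further from its home slot take the place of one sitting closer to its own home. This keeps probe lengths short and lets lookups stop early. The following is a minimal Python sketch of that idea applied to word counting, not a port of the martinus library; the `RobinHoodCounter` class is hypothetical, uses a fixed capacity, and supports neither resizing nor deletion.

```python
# Hypothetical class, not the martinus/robin-hood-hashing API.
class RobinHoodCounter:
    """Word counter on an open-addressing table with Robin Hood insertion.

    Illustrative sketch: fixed capacity, linear probing, no resizing or deletion.
    """

    def __init__(self, capacity: int = 64) -> None:
        self.keys: list = [None] * capacity
        self.vals = [0] * capacity
        self.dist = [0] * capacity  # how far each stored key sits from its home slot
        self.size = 0

    def _find(self, key):
        """Return the slot holding key, or None if it is absent."""
        cap = len(self.keys)
        i, d = hash(key) % cap, 0
        # Stop at an empty slot or at a resident closer to home than we have probed:
        # the Robin Hood invariant guarantees key cannot appear beyond that point.
        while self.keys[i] is not None and self.dist[i] >= d:
            if self.keys[i] == key:
                return i
            i, d = (i + 1) % cap, d + 1
        return None

    def _insert(self, key, val) -> None:
        cap = len(self.keys)
        i, d = hash(key) % cap, 0
        while self.keys[i] is not None:
            if self.dist[i] < d:
                # The resident is "richer" (closer to home): it gives up its slot,
                # and we carry it forward to find a new place for it.
                self.keys[i], key = key, self.keys[i]
                self.vals[i], val = val, self.vals[i]
                self.dist[i], d = d, self.dist[i]
            i, d = (i + 1) % cap, d + 1
        self.keys[i], self.vals[i], self.dist[i] = key, val, d

    def increment(self, key) -> None:
        slot = self._find(key)
        if slot is not None:
            self.vals[slot] += 1
            return
        if self.size == len(self.keys):
            raise RuntimeError("table is full (this sketch does not resize)")
        self._insert(key, 1)
        self.size += 1

    def get(self, key) -> int:
        slot = self._find(key)
        return 0 if slot is None else self.vals[slot]


counter = RobinHoodCounter(capacity=8)
for word in "the cat and the hat and the end".split():
    counter.increment(word)
print(counter.get("the"), counter.get("and"), counter.get("dog"))  # 3 2 0
```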

  8. countwords

    Playing with counting word frequencies (and performance) in various languages. (by BurntSushi)

    $ git clone -b ag/test-kimono https://github.com/BurntSushi/countwords



Related posts

  • Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust

    2 projects | /r/programming | 15 Mar 2021
  • CatBench Vector Search Playground on Postgres

    1 project | news.ycombinator.com | 2 Mar 2025
  • JavaScript Benchmarking Is a Mess

    8 projects | news.ycombinator.com | 24 Dec 2024
  • CodSpeed – integrated CI tool for performance testing

    4 projects | news.ycombinator.com | 31 Oct 2024
  • I've been loving Benchmarking lately, but the Framework does this one quirky thing with the first result of a set. Specifically, the first return is always unusually high.

    1 project | /r/laravel | 6 Dec 2023