Performance comparison: counting words in Python, Go, C++, C, AWK, Forth, and Rust

This page summarizes the projects mentioned and recommended in the original post on /r/rust.

  • countwords

    Discontinued. Playing with counting word frequencies (and performance) in various languages.

  • My favorite one is the "bonus" submission. It intentionally ignores the constraints of the benchmark and tries to be a bit more "correct" by using Unicode's word segmentation. The code is still almost as simple as the other "simple" variants and nearly as fast! https://github.com/benhoyt/countwords/blob/8553c8f600c40a4626e966bc7e7e804097e6e2f4/rust/bonus/main.rs

  • coreutils

    Upstream mirror (by coreutils)

  • You only had to look at the code (https://github.com/coreutils/coreutils/blob/master/src/wc.c) to know whether or not that was really true.

  • vowpal_wabbit

    Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

  • You're likely correct, but I do recall attending a lecture by John Langford of https://vowpalwabbit.org/ in which he ran some form of N-gram, C++-based NLP model, including summary statistics on performance, in less time than wc -l took on the same data. It must use some neat hashing tricks, but it was still cool.
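For readers who have not looked at the repository, the core task in the "simple" variants is: read text, lowercase it, split it into words, count each word, and print the words by descending frequency. The sketch below shows that idea in Rust under stated assumptions: it treats any run of ASCII whitespace as a word separator and lowercases only ASCII letters (one reading of the "constraints of the benchmark" the bonus submission ignores). The function name count_words and the alphabetical tie-breaking are choices made for this sketch, not taken from countwords itself.

```rust
use std::collections::HashMap;
use std::io::{self, BufWriter, Read, Write};

/// Count word frequencies in `text`.
/// Assumption of this sketch: words are runs of non-ASCII-whitespace
/// characters, and only ASCII letters are lowercased.
/// Returns (word, count) pairs sorted by descending count; ties are broken
/// alphabetically so the output is deterministic.
fn count_words(text: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_ascii_whitespace() {
        // The entry API inserts 0 on first sight, then increments.
        *counts.entry(word.to_ascii_lowercase()).or_insert(0) += 1;
    }
    let mut ordered: Vec<(String, usize)> = counts.into_iter().collect();
    ordered.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ordered
}

fn main() -> io::Result<()> {
    // Read all of stdin; this fails on input that is not valid UTF-8.
    let mut text = String::new();
    io::stdin().read_to_string(&mut text)?;

    // Buffer stdout so each line does not trigger a separate write.
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    for (word, count) in count_words(&text) {
        writeln!(out, "{} {}", word, count)?;
    }
    Ok(())
}
```

Because this sketch splits only on whitespace, "word," and "word" are counted as different words. Handling that is what the bonus submission's Unicode word segmentation does, at the cost of stepping outside the benchmark's rules.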
