-
countwords
Discontinued. Playing with counting word frequencies (and performance) in various languages.
-
vowpal_wabbit
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
My favorite one is the "bonus" submission. It intentionally ignores the constraints of the benchmark and tries to be a bit more "correct" by using Unicode's word segmentation. The code is still almost as simple as the other "simple" variants and nearly as fast! https://github.com/benhoyt/countwords/blob/8553c8f600c40a4626e966bc7e7e804097e6e2f4/rust/bonus/main.rs
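For context, here is a minimal Python sketch of what a "simple" variant roughly does, assuming the basic approach is to lowercase the input, split on whitespace, and list words by descending count. The name `count_words` and the sample string are made up for this example, and this is not code from the linked repository. Unlike the bonus submission, it does no Unicode word segmentation, so punctuation stays attached to words.

```python
from collections import Counter


def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase, then split on runs of whitespace (no Unicode segmentation).
    counts = Counter(text.lower().split())
    # Sort by descending count; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample input, just to show the call.
sample = "The foo the bar THE foo"
for word, n in count_words(sample):
    print(word, n)
```

Swapping the `split()` call for a real Unicode segmenter is the only structural change the "more correct" approach would need in a sketch like this.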
You only had to look at the code (https://github.com/coreutils/coreutils/blob/master/src/wc.c) to know whether or not that was really true.
You're likely correct, but I do recall attending a lecture by John Langford of https://vowpalwabbit.org/ where he ran some form of N-gram, C++-based NLP model, including summary statistics on performance, in less time than wc -l took on the same data. It must have used some neat hashing tricks, but it was still cool.
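The thread does not spell out which hashing tricks were involved. One common technique, usually called feature hashing, maps each n-gram straight to a slot in a fixed-size table, so no dictionary of strings has to be built. Below is a minimal Python sketch, assuming this is the kind of trick meant. The function name `hashed_ngram_counts`, the bucket count, and the use of `zlib.crc32` are choices made for this example and are not Vowpal Wabbit's actual code.

```python
import zlib


def hashed_ngram_counts(tokens, n=2, num_buckets=2**18):
    """Count n-grams in a fixed-size table indexed by a hash of each n-gram."""
    counts = [0] * num_buckets
    for i in range(len(tokens) - n + 1):
        ngram = " ".join(tokens[i:i + n])
        # crc32 is a stand-in hash; it is deterministic across runs,
        # unlike Python's built-in hash() for strings.
        bucket = zlib.crc32(ngram.encode("utf-8")) % num_buckets
        counts[bucket] += 1  # colliding n-grams share a bucket
    return counts


# Hypothetical usage: count bigrams in a tiny token stream.
tokens = "a b a b".split()
table = hashed_ngram_counts(tokens)
print(sum(table))  # total number of bigrams seen
```

The trade-off is that memory stays constant no matter how many distinct n-grams appear, at the cost of occasional collisions between unrelated n-grams.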