countwords
coreutils
| countwords | coreutils |
---|---|---|
Mentions | 43 | 112 |
Stars | 209 | 4,002 |
Growth | - | 2.1% |
Activity | 5.9 | 9.3 |
Last commit | about 2 years ago | 10 days ago |
Language | Rust | C |
License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
countwords
-
How fast is really ASP.NET Core?
"dang, I didn't know that was 50x faster than the idiomatic way" or "hey, I didn't know that this implementation in the stdlib prioritized this over that and made this so slow, that's interesting" -- e.g., there are some kinda neat language details to be found in something like Ben Hoyt's community word count benchmarks repo and 'simple' vs 'optimal' code: https://github.com/benhoyt/countwords
-
Correct name for word matching problem
It benchmarks programs that count the total number of unique words in some input. It's not exactly equivalent to your problem, but it's similar-ish. All of the programs used some kind of hash map for lookups, but I contributed a program that used a trie. Interestingly enough, its performance in my experience varies depending on the CPU. On my old CPU (i7-6900K) it was a little slower, but on my new CPU (i9-12900KS) it was faster.
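The hash map versus trie contrast in that comment can be sketched roughly as follows. This is only an illustration in Python, not the commenter's program: the names `count_with_hash_map`, `TrieNode`, `count_with_trie`, and `trie_to_dict` are hypothetical, words are assumed to be lowercase and whitespace-separated, and a dict-backed trie in Python says nothing about the CPU-dependent performance described above.

```python
from collections import Counter


def count_with_hash_map(text):
    """Count each lowercase, whitespace-separated word with a hash map."""
    return Counter(text.lower().split())


class TrieNode:
    """One trie node (hypothetical helper, not from the benchmark repo)."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # maps a character to the child TrieNode
        self.count = 0      # how many times a word ended at this node


def count_with_trie(text):
    """Count the same words by walking a trie one character at a time."""
    root = TrieNode()
    for word in text.lower().split():
        node = root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = TrieNode()
                node.children[ch] = child
            node = child
        node.count += 1
    return root


def trie_to_dict(node, prefix=""):
    """Flatten the trie into a {word: count} dict so both methods can be compared."""
    result = {}
    if node.count:
        result[prefix] = node.count
    for ch, child in node.children.items():
        result.update(trie_to_dict(child, prefix + ch))
    return result
```

Both functions should produce the same counts for the same input; the difference is that the hash map hashes each whole word once, while the trie does one small lookup per character.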
-
Performance comparison: counting words in Python, C/C++, Awk, Rust, and more
Are you looking at the "simple" or the "optimized" versions? For the optimized, yes, the Go one is very similar to the C. For the simple, idiomatic version, the Go version [1] is much simpler than the C one [2]: 40 very straight-forward LoC vs 93 rather more complex ones including pointer arithmetic, tricky manual memory management, and so on.
[1] https://github.com/benhoyt/countwords/blob/c66dd01d868aa83dc...
I don't think the performance is due to start up time at all. I actually cloned the repo, and ran the benchmark and found that Swift's execution time scales drastically with the size of the input.
The benchmark tests each executable by piping in the full King James Bible duplicated 10 times[1] (each copy is 4.13 MB[2]). When I ran it using just a single copy of the input text, the execution time dropped to 58-59 milliseconds, but when I ran the benchmark without modifications it jumped up to over 4 seconds. A hello world script for comparison runs in about 13 milliseconds. The Swift team actually boasts about its quick start up time on the official website [3].
[1] https://github.com/benhoyt/countwords/blob/master/test.sh#L5
[2] https://github.com/benhoyt/countwords/blob/master/kjvbible.t...
Re: the Rust performance implementation, I was able to get ~25% better performance by rewriting the for loops as iterators and by using a buffered writer, which seems crazy but it's true.[0] I chalked it up to some crazy ILP/SIMD tricks the compiler is doing.
I even submitted a PR, but Ben decided he was tired of maintaining and decided to archive the project (which is fair enough!).
Why not read the source code? :-)
I wrote comments explaining things: https://github.com/benhoyt/countwords/blob/8553c8f600c40a462...
-
The difference between Go and Rust
And yet Go was faster than Rust in a simple app that counts words: https://benhoyt.com/writings/count-words/
-
How to Rapidly Improve at Any Programming Language
> but the performance profiles & characteristics that we must know about in order to make a choice on which tool to use. And it shouldn't be that each user has to figure it out on their own, dig into PR's or whatever.
That's an interesting take – I like the idea of a catalog of standard tasks with implementations in several languages as well as their performance characteristics. I suppose Rosetta Code gets the ball rolling with this, but it's missing some performance metrics. It reminds me of [Ben Hoyt's piece](https://benhoyt.com/writings/count-words/) on counting unique words in the KJV Bible in different languages.
-
Faster string keyed maps in Go
This article shows that map lookups can be optimized by using the (unintuitive) pattern:
coreutils
-
Show HN: Usr/bin/env Docker run
The -S / --split-string option[1] of /usr/bin/env is a relatively recent addition to GNU Coreutils. It's available starting from GNU Coreutils 8.30[2], released on 2018-07-01.
Beware of portability: it relies on non-standard behavior in some operating systems. It only works on OSes that treat all the text after the first space as argument(s) to the shebanged executable, rather than treating the whole string as an executable path (which can happen to contain spaces).
Fortunately this non-standard behavior is more the norm than the exception: it works at least on modern GNU/Linux, BSDs, and macOS.
[1] https://www.gnu.org/software/coreutils/manual/html_node/env-...
[2] https://github.com/coreutils/coreutils/blob/b09dc6306e7affaf...
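To make the role of -S concrete, here is a rough model in Python of what happens to a shebang line. It assumes Linux-style handling, where the text after the interpreter path is passed as a single argument; other systems may split differently. The function names `kernel_shebang_argv` and `env_split_string` are hypothetical, and `shlex.split` is only a stand-in for env's own splitting rules, which have their own quoting and escape handling.

```python
import shlex


def kernel_shebang_argv(shebang_line, script_path):
    """Model (assumed, Linux-style) of the argv built from a shebang line:
    the interpreter path, then the rest of the line as ONE optional
    argument, then the path of the script itself."""
    body = shebang_line[2:].strip()   # drop the leading "#!"
    parts = body.split(None, 1)       # split only at the first run of whitespace
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1])         # everything else stays unsplit
    argv.append(script_path)
    return argv


def env_split_string(argv):
    """Approximate env's -S handling: split the single argument into words.
    shlex.split is a stand-in here, not env's actual parser."""
    if len(argv) >= 2 and argv[1].startswith("-S"):
        rest = argv[1][2:]
        return [argv[0]] + shlex.split(rest) + argv[2:]
    return argv
```

Under this model, `#!/usr/bin/env python3 -u` hands env the single argument `python3 -u`, which it would look up as a program name, whereas `#!/usr/bin/env -S python3 -u` lets env split that string into `python3` and `-u` itself.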
-
From Nand to Tetris: Building a Modern Computer from First Principles
> building a cat from scratch
> That would be an interesting project.
Here is the source code of the OpenBSD implementation of cat:
> https://github.com/openbsd/src/blob/master/bin/cat/cat.c
and here of the GNU coreutils implementation:
> https://github.com/coreutils/coreutils/blob/master/src/cat.c
Thus: I don't think building a cat from scratch or creating a tutorial about that topic is particularly hard (even though the HN audience would likely be interested in it). :-)
-
The Linux Scheduler: A Decade of Wasted Cores (2016) [pdf]
The yes command, writing to /dev/null, is making IO calls, which interfere with predictable scheduling.
If you look at the source code for yes, https://github.com/coreutils/coreutils/blob/master/src/yes.c
it builds a buffer of output and then writes that in a loop:
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
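The buffer-then-loop pattern described in that comment can be sketched in Python as below. This is not a port of yes.c: `BUFFER_SIZE`, `build_buffer`, and `write_forever` are hypothetical names, the 8192-byte size is an assumption chosen for illustration, and the loop simply stops on the first short or failed write instead of reproducing the behavior of `full_write`.

```python
import os

BUFFER_SIZE = 8192  # assumed size for illustration; not taken from yes.c


def build_buffer(line=b"y\n", size=BUFFER_SIZE):
    """Repeat `line` as many whole times as fit in `size` bytes (at least once)."""
    copies = max(1, size // len(line))
    return line * copies


def write_forever(fd, buf):
    """Write the prebuilt buffer repeatedly until a write fails or is short."""
    while True:
        try:
            written = os.write(fd, buf)
        except OSError:
            break  # e.g. the reader went away
        if written != len(buf):
            break
```

Calling `write_forever(1, build_buffer())` would emit `y` lines on standard output until the consumer closes the pipe. The point of prebuilding the buffer is that each system call moves many lines at once rather than one.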
-
Decoded: GNU Coreutils
Even an empty file? Yes. So now it was a file with a copyright disclaimer and nothing else. And the koan-like question that comes to mind is "Can you copyright nothing?" Well, AT&T sure tried.
Then somebody said our programs should be well defined and not depend on a fluke of Unix, which at this point was probably a good idea. So it became "exit 0".
Then somebody said we should write our system utilities in C instead of shell so they run faster. OpenBSD still has a good example of how this would look.
http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/usr....
At some point GNU bureaucracy got involved and said all programs must support the '-h' flag, so that got added. Then they said all programs must support locale, so that got added. Nowadays GNU true is an astonishing 80 lines long.
https://github.com/coreutils/coreutils/blob/master/src/true....
I do like that /bin/true can actually fail and return false, which technically makes a "Not /bin/false" invocation more resilient: https://github.com/coreutils/coreutils/blob/master/src/true.... (and yes, I know it's the most unlikely thing, I just found it funny)
-
Exa Is Deprecated
> Yes, ls is maintained. Although, maintained is a very strong word. It exists.
Why would it be a strong word? Here it is, in src/ls.c: https://github.com/coreutils/coreutils
It is then packaged by tens of operating system distributions, who themselves maintain extra patchsets, some of which are then upstreamed.
It is installed and used on millions (billions?) of devices, for 3 decades.
It's a very reliable and trusty "sharp stick of metal" :)
- stupid Linux tricks - cd one shell to the current dir of another, without using the clipboard, mouse, or even the pwd command
- What's the most efficient way to get the line count in a file?
- GitHub - dcantrell/bsdutils: Alternative to GNU coreutils using software from FreeBSD
-
GNOME’s horrid coding practices
GIMP dates to 1995 and didn't get any public releases until 1996. Your post makes it seem like GIMP is at least a decade older than it is. It isn't "one of the earliest GNU works". Many GNU projects predate it, but most of those aren't graphical so you might not be aware of them. Those older projects include clones of UNIX utilities, such as the ones in GNU Core Utilities or "coreutils" (although technically coreutils itself is a 2002 merger of a bunch of really old utils). For instance, cat from coreutils has a copyright date of 1988.
What are some alternatives?
util-linux
madaidans-insecurities
busybox - BusyBox mirror
src - Read-only git conversion of OpenBSD's official CVS src repository. Pull requests not accepted - send diffs to the tech@ mailing list.
linux - Linux kernel source tree
gnulib - upstream mirror
coreutils - Cross-platform Rust rewrite of the GNU coreutils
CPython - The Python programming language
freebsd-src - The FreeBSD src tree publish-only repository. Experimenting with 'simple' pull requests....
WSL - Issues found on WSL
llfio - P1031 low level file i/o and filesystem library for the C++ standard
comic-mono-font - A legible monospace font... the very typeface you’ve been trained to recognize since childhood