cligen
smhasher
cligen | smhasher | |
---|---|---|
32 | 30 | |
489 | 1,695 | |
- | - | |
8.4 | 7.1 | |
26 days ago | 2 months ago | |
Nim | C++ | |
ISC License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
cligen
-
CLI user experience case study
There is also generating the whole thing from a function signature (e.g. https://github.com/c-blake/cligen ) since then CLauthors need not learn a new spec language, but then CLauthors must add back in helpful usage metadata/semantics and still need to learn a library API (but I like how those two things can be "gradual"). It's a hard space in which to find perfection, but I wish you luck in your attempt!
-
Things I've learned about building CLI tools in Python
cligen also allows End-CL-users to adjust colorization of --help output like https://github.com/c-blake/cligen/blob/master/screenshots/di... using something like https://github.com/c-blake/cligen/wiki/Dark-BG-Config-File
Last I knew, the argparse backing most Py CLI solutions did not support such easier (for many) to read help text, but the PyUniverse is too vast to be sure without much related work searching.
-
Removing Garbage Collection from the Rust Language (2013)
20 milliseconds? On my 7 year old Linux box, this little Nim program https://github.com/c-blake/bu/blob/main/wsz.nim runs to completion in 275 microseconds when fully statically linked with musl libc on Linux. That's with a stripped environment (with `env -i`). It takes more like 318 microseconds with my usual 54 environment variables. The program only does about 17 system calls, though.
Additionally, https://github.com/c-blake/cligen makes decent CLI tools a real breeze. If you like some of Go's qualities but the language seems too limited, you might like Nim: https://nim-lang.org. I generally find getting good performance much less of a challenge with Nim, but Nim is undeniably less well known with a smaller ecosystem and less corporate backing.
-
Writing Small CLI Programs in Common Lisp (2021)
If you find this article interesting and are curious about Nim then you would probably also be curious about https://github.com/c-blake/cligen
That allows adding just 1-line to a module to add a pretty complete CLI and then a string per parameter to properly document options (assuming an existing API using keyword arguments).
It's also not hard to compile & link a static ELF binary with Nim.. I do it with MUSL libc on Linux all the time. I just toss into my ~/.config/nim/nim.cfg:
@if musl: # make nim c -d:musl .. foo static-link `foo` with musl
-
GNU Parallel, where have you been all my life?
Sure. No problem.
Even Windows has popen these days. There are some tiny popenr/popenw wrappers in https://github.com/c-blake/cligen/blob/master/cligen/osUt.ni...
Depending upon how balanced work is on either side of the pipe, you usually can even get parallel speed-up on multicore with almost no work. For example, there is no need to use quote-escaped CSV parsing libraries when you just read from a popen()d translator program producing an easier format: https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim
-
The Bipolar Lisp Programmer
Nim is terse yet general and can be made even more so with effort. E.g., You can gin up a little framework that is even more terse than awk yet statically typed and trivially convertible to run much faster like https://github.com/c-blake/bu/blob/main/doc/rp.md
You can statically introspect code to then generate related/translated ASTs to create nearly frictionless helper facilities like https://github.com/c-blake/cligen .
You can do all of this without any real run-time speed sacrifices, depending upon the level of effort you put in / your expertise. Since it generates C/C++ or Javascript you get all the abilities of backend compilers almost out of the box, like profile-guided-optimization or for JS JIT compilation.
-
Ask HN: Why did Nim not catch-on like wild fire as Rust did?
It's more that those tools were what come to mind when I specifically think of my exposure to the existence of rust. Its perhaps not that the tools were there, but that they were well known (and known for being written in rust).
Anecdatapoint - I've never heard of literally a single one of the utilities listed on the bu page.
Regarding cligen, right from the start clap wins on producing idiomatic output. Compare: https://github.com/c-blake/cligen#cligen-a-native-api-inferr...
Usage:
-
Newbie looking at nim
cool example would be this which is a CLI generation library. It lets you describe command line interfacs simply using function signatures
-
Zig and Rust
>Does nim have anything as polished and performant as clap and serde?
"Polished" and "high quality" are more subjective/implicitly about adoption, IMO. "Performant" has many dimensions. I just tested the Nim https://github.com/c-blake/cligen vs clap: cligen used 5X less object file space (with all size optimization tweaks enabled in both), 20% less run-time memory for large argument lists, and the same run-time per argument (with march=native equivalents on both, within statistical noise). cligen has many features - "did you mean?/suggestions", color generated help and all that - I do not see obvious feature in clap docs missing in cligen. The Nim binary serde showing is unlikely as good but there are like 10 JSON packages and that seems maybe your primary concern.
More to add color your point than disagree (and follow up on my "adoption") - your ideas about polish, quality, docs, etc. are part of feedback loop(s) you mentioned. More users => Users complain (What is confusing? What is missing? etc.) => things get fixed/cleaned up/improved => More users. Besides "performant" being multi-dimensional, the feedback loop is more of a "cyclic graph". :-) While I probably prefer Nim as much or more as @netbioserror, I am not too shocked by the mindshare capture. It seems to happen every 5..10 years or so in prog.langs.
While many of your points are not invalid, tech is also a highly hype-driven & fad-driven realm. In my experience, the more experience with this meta-feature that someone has, the more skeptical they are of the latest thing (more rounds of regret, etc.). Also, that feedback graph is not a pure good. Things can get too popular too quickly with near permanent consequences. ipv4 got popular so quickly that we are still mostly stuck on it 40 years later as ipv6 struggles for penetration. Whatever your favorite PL is, it may also grow features too fast.
-
Self Hosted SaaS Alternatives
You are welcome. Thanks are too rarely offered. :-)
You may also be interested in word stemming ( such as used by snowball stemmer in https://github.com/c-blake/nimsearch ) or other NLP techniques, but I don't know how internationalized/multi-lingual that stuff is, but conceptually you might want "series of stemmed words" to be the content fragments of interest.
Similarity scores have many applications. Weights on graph of cancelled downloads ranked by size might be one. :)
Of course, for your specific "truncation" problem, you might also be able to just do an edit distance against the much smaller filenames and compare data prefixes in files or use a SHA256 of a content-based first slice. ( There are edit distance algos in Nim in https://github.com/c-blake/cligen/blob/master/cligen/textUt.... as well as in https://github.com/c-blake/suggest ).
Or, you could do a little program like ndup/sh/ndup to create a "mirrored file tree" of such content-based slices then you could use any true duplicate-file finder (like https://github.com/c-blake/bu/blob/main/dups.nim) on the little signature system to identify duplicates and go from path suffixes in those clusters back to the main filesystem. Of course, a single KV store within one or two files would be more efficient than thousands of tiny files. There are many possibilities.
smhasher
-
GxHash - A new (extremely) fast and robust hashing algorithm 🚀
The algorithm passes all SMHasher quality tests and uses rounds of AES block cipher internally, so it is quite robust! For comparison XxH3, t1ha0 and many others don't pass SMHasher (while being slower).
-
The PolymurHash universal hash function
Confirmed, I tested it. https://github.com/rurban/smhasher
-
Show HN: Discohash – simply, quality, fast hash
There's lots of great hash functions out there: some are super fast, like xxhash and highly optimized, others are also super fast umash and based on interesting math ideas from finite fields^1, while maintaining high quality (according to SMHasher). Others are also fast and interesting (tabulation hash, that may sometimes be seemingly universal), one of the main originators of those ideas are Mikkel Thorup^2. Anyway, a couple of years ago I also tried my hand at building hashes and created a few that passed SMHasher (tifuhash ~ a floating point hash, beamsplitter - a seemingly-universal tabulation style hash, and this one discohash - a "more traditional" ARX-based design (addition rotation xor)^3 ).
0: https://github.com/rurban/smhasher/blob/master/xxh3.h
1: https://pvk.ca/Blog/2022/12/29/fixing-hashing-modulo-alpha-e...
2: https://arxiv.org/abs/1505.01523
3: https://eprint.iacr.org/2018/898.pdf https://crypto.polito.it/content/download/480/2850/file/docu...
4: https://en.wikipedia.org/wiki/BLAKE_(hash_function)
Discohash (posted here) is the fastest one I made, it's simple and doesn't rely on any arch-specific optimizations or vector instructions (AVX etc ~ tho I suppose...they could be added? I'm definitely no expert in them, I barely get away with doing the C/C++ implementations!)
The main mixing round function is:
mix(const int A) {
-
A Vulnerability in Implementations of SHA-3, Shake, EdDSA
ubsan, asan, valgrind tests are missing. some do offer symbolic verification of the algo, but not the implementations.
See my https://github.com/rurban/smhasher#crypto paragraph, and
-
Academic Urban Legends
The spinach story reminds me a lot on the false recommendation of siphash for hash table DDOS prevention. https://github.com/rurban/smhasher#security
The authors came up in their widely cited paper with a proper solution to spread the random hash seed into the inner loop, vastly enhancing its security by avoiding trivial hash collision attacks. But a secure, slow hash function can never prevent from normal hash seed attacks, when the random seed is known somehow. esp. with dynamic languages it's trivial to get the seed externally.
Other trivial countermeasures must be used then, which also don't make hash tables 10x slower, keeping them practical.
- SHA-1 is out. NIST recommends switching to the SHA-2 and SHA-3 groups of hash algorithms as soon as possible, with an official deadline of Dec. 31, 2030.
- Adventures in Advent of Code
-
New ScyllaDB Go Driver: Faster Than GoCQL and Its Rust Counterpart
This is the best, most comprehensive hash test suite I know of: https://github.com/rurban/smhasher/
you might want to particularly look into murmur, spooky, and metrohash. I'm not exactly sure of what the tradeoffs involved are, or what your need is, but that site should serve as a good starting point at least.
-
What do you typically use for non-cryptographic hash functions?
Here is a good comparison table, as you can see, BLAKE can perform in secure way much faster than crc32, so my original point, - to use non weak hashes unless you really have a reason/requirement not to do so
-
What hash function you use for hash maps / hash tables?
smhasher is a great place to testing results for a massive number of hash algorithms.
What are some alternatives?
httpbeast - A highly performant, multi-threaded HTTP 1.1 server written in Nim.
xxHash - Extremely fast non-cryptographic hash algorithm
bioawk - BWK awk modified for biological data
wyhash - The FASTEST QUALITY hash function, random number generators (PRNG) and hash map.
nimforum - Lightweight alternative to Discourse written in Nim
BLAKE3 - the official Rust and C implementations of the BLAKE3 cryptographic hash function
loggedfs - LoggedFS - Filesystem monitoring with Fuse
Hashids.java - Hashids algorithm v1.0.0 implementation in Java
lobster - The Lobster Programming Language
png-decoder - A pure-Rust, no_std compatible PNG decoder
walkdir - Rust library for walking directories recursively.
rustls - A modern TLS library in Rust