-
smhasher does not report any failure for blake3. The one failure for blake2b 256 is not ideal for a hash function, but it is not necessarily evidence that the output doesn't look random: `Sparse` generated 50643 16-bit values, hashed them, and found 2 collisions in the high 32 bits of the output. I'm not sure what kind of flaw in the test harness you think can explain that.
There could definitely be issues in the integration code that lets the harness call into all these functions. For example, smhasher finds issues with SHA3 for the "PerlinNoise" input sets. That input set hashes small integers in [0, 4096), with seeds in [0, 4096); I'm not convinced the SHA3 wrapper does anything useful with the seed here: https://github.com/rurban/smhasher/blob/37cffd7b9cdaa2140c53... I expect something similar is happening with SHA1 and SHA2.
The MD5 row shows no failure; only the function that truncates to the low 32 bits has failures.
You can read the test harness or the test log (e.g., https://github.com/rurban/smhasher/blob/master/doc/blake2b-2...) and apply your own significance threshold. The statistical tests are nothing special or novel (counting collisions in bit ranges of the input, and bias in individual bits, mostly); the interesting part is how the various tests generate interesting sets of inputs. In the end, it's a bit like the PRNG wars: you can always come up with a test that makes a function look bad, but a ton of failures is definitely a bad sign.
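To make the "count collisions in a bit range" idea concrete, here is a minimal sketch in Python. It is not smhasher's code. It assumes the key set is every 16-bit integer with at most 9 bits set (which happens to give 50643 keys), encodes keys as 2 little-endian bytes, uses no seed, and treats the first 4 digest bytes as the window; smhasher may differ on every one of those choices, so the count printed here need not match its log.

```python
import hashlib
from collections import Counter

def sparse_keys(key_bits: int = 16, max_set_bits: int = 9):
    """Yield every key_bits-wide integer with at most max_set_bits bits set (assumed key set)."""
    for value in range(1 << key_bits):
        if bin(value).count("1") <= max_set_bits:
            yield value

def count_collisions(keys, window_bytes: int = 4) -> int:
    """Count keys whose truncated digest was already produced by an earlier key."""
    seen = Counter()
    for key in keys:
        # blake2b with a 256-bit digest, standing in for the "blake2b 256" row.
        digest = hashlib.blake2b(key.to_bytes(2, "little"), digest_size=32).digest()
        seen[digest[:window_bytes]] += 1
    return sum(count - 1 for count in seen.values())

keys = list(sparse_keys())
print(len(keys), "keys,", count_collisions(keys), "collisions in the first 4 digest bytes")
```

Changing `window_bytes`, or slicing a different part of the digest, is the quickest way to apply your own threshold to a different bit range.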
-
The README for xxhash has benchmarks covering fast hashes including Meow:
https://github.com/Cyan4973/xxHash/wiki/Performance-comparis...
-
Google made faster SipHash variants, and also HighwayHash, which is much faster.
-
umash (https://github.com/backtrace-labs/umash) has a similar PH block structure, but was designed for decent bit mixing (enough to satisfy smhasher, unlike CLHASH, which needs an additional finalizer) with a lower fixed time cost: 22 cycles for a one-byte hash.
I'm not sure how one would use that linear regression. What kind of hardware offers 675 GB/s of memory bandwidth? 140 bytes/cycle is easily more than twice the L2 read bandwidth offered by any COTS chip I'm aware of. There are also warm-up effects past the fixed cost of setup and finalizers that slow down hashing for short inputs. For what range of input sizes (and hot/cold cache state) would you say the regression is a useful model?
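A quick sanity check on those numbers, as a sketch. It assumes 675 GB/s and 140 bytes/cycle describe the same regression slope, and that the model has the form `fixed + n / slope`; the `fixed_cycles` value below is a hypothetical placeholder, not a figure from the thread.

```python
def implied_clock_ghz(bytes_per_second: float, bytes_per_cycle: float) -> float:
    """Clock rate at which the two throughput figures agree."""
    return bytes_per_second / bytes_per_cycle / 1e9

def model_cycles(n_bytes: int, fixed_cycles: float, bytes_per_cycle: float) -> float:
    """Linear cost model: fixed setup/finalizer cost plus a streaming term."""
    return fixed_cycles + n_bytes / bytes_per_cycle

def crossover_bytes(fixed_cycles: float, bytes_per_cycle: float) -> float:
    """Input size at which the streaming term equals the fixed cost."""
    return fixed_cycles * bytes_per_cycle

clock = implied_clock_ghz(675e9, 140)
print(f"implied clock: {clock:.2f} GHz")

fixed_cycles = 50.0  # hypothetical placeholder, NOT from the thread
print(f"fixed cost dominates below ~{crossover_bytes(fixed_cycles, 140):.0f} bytes")
print(f"modelled cost of a 64-byte input: {model_cycles(64, fixed_cycles, 140):.1f} cycles")
```

With a slope that steep, the fixed term dominates the model for anything short of multi-kilobyte inputs, which is exactly the range where warm-up and cache effects make a straight line least trustworthy.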
-
This is great! Many applications (like dedupe) don't need full crypto guarantees.
If you need something fast and cryptographically secure, I recommend checking out Blake3/b3sum. I'm just learning about XXH3 in this thread so I cannot comment on how it stacks up, but I love b3sum for fast file hashing.