https://github.com/BLAKE3-team/BLAKE3/blob/master/media/spee... argues BLAKE2s is twice as fast as SHA-256.
One thing the switch from SHA-1 to BLAKE2s does is increase the amount of entropy a single compression operation adds to ChaCha20. That speeds things up: folded BLAKE2s adds 128 bits per operation, whereas folded SHA-1 adds 80 bits, so that's two calls instead of four (I'm assuming they kept the folding). Another speedup comes from the fact that the hash function constants are no longer being filled with RDRAND inputs on every call.
Finally, I'm not completely sure whether increasing the hash size itself adds computational security against an attack where the internal state is compromised once and the attacker then tries to brute-force the new state from new output. My conjecture is that the reseeding operation is atomic, i.e. that ChaCha20 won't yield anything until the reseed is complete, so there shouldn't be any difference in this regard. I'd appreciate clarification on this.
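For anyone who wants to check the "two calls instead of four" arithmetic, here is a rough Python sketch. It assumes folding means XORing the two halves of the digest and that the target is a 256-bit ChaCha20 key; the `fold` and `calls_needed` helpers and the sample input are made up for illustration and are not the kernel's actual code.

```python
import hashlib
import math


def fold(digest: bytes) -> bytes:
    """XOR the first half of a digest with the second half (simplified folding)."""
    half = len(digest) // 2
    return bytes(a ^ b for a, b in zip(digest[:half], digest[half:]))


def calls_needed(target_bits: int, folded_bits: int) -> int:
    """Number of folded hash outputs required to cover target_bits."""
    return math.ceil(target_bits / folded_bits)


sample = b"example pool contents"  # hypothetical input, illustration only

# SHA-1 gives 160 bits, BLAKE2s gives 256 bits; folding halves each.
sha1_bits = len(fold(hashlib.sha1(sample).digest())) * 8
blake2s_bits = len(fold(hashlib.blake2s(sample).digest())) * 8

KEY_BITS = 256  # assumed target: one ChaCha20 key

print("folded SHA-1:  ", sha1_bits, "bits ->", calls_needed(KEY_BITS, sha1_bits), "calls")
print("folded BLAKE2s:", blake2s_bits, "bits ->", calls_needed(KEY_BITS, blake2s_bits), "calls")
```

Under those assumptions the numbers match the comment: 80 bits per call needs four calls to cover 256 bits, and 128 bits per call needs two.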
> That's for 16KiB inputs.
BLAKE3 needs 16 KiB of input to hit the numbers in that bar chart, but BLAKE2s doesn't. It'll maintain its advantage over SHA-256 all the way down to the empty string. You can see this in Figure 3 of https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak.... (BLAKE3 is also faster than SHA-256 all the way down to the empty string, but not by as large a margin as the 16 KiB figures suggest.)
On the other hand, these measurements were done on machines without SHA-256 hardware acceleration. If you have that (and Intel chips from the past year do), then SHA-256 does a lot better of course.
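If you want to see how this plays out on your own machine, a quick-and-dirty comparison along these lines might help. It only uses Python's standard `hashlib` (so BLAKE2s and SHA-256, no BLAKE3), the `throughput_mib_s` and `compare` helpers are invented for this sketch, and at small input sizes the interpreter overhead will dominate, so treat the numbers as a rough indication rather than a proper benchmark.

```python
import hashlib
import time


def throughput_mib_s(name: str, size: int, min_seconds: float = 0.05) -> float:
    """Hash a zero-filled buffer of `size` bytes repeatedly; return MiB/s."""
    data = bytes(size)
    hasher = getattr(hashlib, name)  # e.g. hashlib.sha256, hashlib.blake2s
    count = 0
    start = time.perf_counter()
    while True:
        hasher(data).digest()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            break
    return (count * size) / elapsed / (1024 * 1024)


def compare(sizes=(64, 1024, 16 * 1024), names=("sha256", "blake2s")):
    """Return {size: {name: MiB/s}} for each input size and hash name."""
    return {
        size: {name: throughput_mib_s(name, size) for name in names}
        for size in sizes
    }


for size, row in compare().items():
    cells = ", ".join(f"{name}: {rate:.1f} MiB/s" for name, rate in row.items())
    print(f"{size:>6} bytes  {cells}")
```

Whether SHA-256 or BLAKE2s comes out ahead will depend on whether your CPU and your OpenSSL build use SHA hardware instructions, which is exactly the caveat above.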
With the parameters specified by SHA-3, Keccak is a lot slower than BLAKE3.
Keccak (SHA-3) is actually a good deal faster than BLAKE(1) in hardware. That’s the reason why they chose it: It has acceptable performance in software, and very good performance in hardware.
KangarooTwelve / MarsupilamiFourteen are Keccak variants with fewer rounds; they should smoke BLAKE2 and probably even BLAKE3 in dedicated hardware. They also have tree hashing modes of operation, like the later BLAKE designs.
The BLAKE family is best in situations where you want the best possible software performance; indeed, there are cases where you do not want hardware to outperform software (e.g. key derivation functions), and there some Salsa20/ChaCha20/BLAKE variant makes the most sense. The Keccak family makes sense when one already has dedicated hardware instructions (e.g. ARM already has a hardware-level Keccak engine; Intel is dragging its feet, but it is only a matter of time) or is willing to trade software performance for more hardware performance.
Keccak code is here: https://github.com/XKCP/XKCP