Use Fast Data Algorithms

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • makeself

    A self-extracting archiving tool for Unix systems, in 100% shell script.

  • Why not try a self-extracting archive?

    See https://makeself.io

  • BLAKE3

    the official Rust and C implementations of the BLAKE3 cryptographic hash function

  • > However, it must be kept in mind that BLAKE3 is much faster than any other cryptographic hash only because it distributes the computation on all CPU cores.

    Surprisingly, this is incorrect. The red bar chart above the fold at https://github.com/BLAKE3-team/BLAKE3 is a single-threaded measurement. What you see there is that BLAKE3 can take better advantage of SIMD parallelism than other hashes, and the C and Rust library implementations do this by default. Multithreading isn't enabled by default, but if you do use it (and you have enough input to feed it) the benefits are multiplicative.

    > only 1 cryptographic hash is faster: BLAKE3

    SIMD implementations of KangarooTwelve are also about as fast as BLAKE3, given enough input.

  • xxHash

    Extremely fast non-cryptographic hash algorithm

  • Agree with everything you say, except that the post didn't mention non-cryptographic hashing algos that can be driven that hard. xxHash[1] (and especially XXH3) is almost always the best choice for fast hashing, as it is both fast and widely supported across languages.

    Sure, there are some other fast ones out there like cityhash[2], but I'm not aware of good Java/Python bindings for it, and I wouldn't recommend using it in production given its lack of widespread use. xxHash, by contrast, is used internally by LZ4 and in databases all over the place.

    [1] https://github.com/Cyan4973/xxHash

  • cityhash

    Automatically exported from code.google.com/p/cityhash

  • amazon-corretto-crypto-provider

    The Amazon Corretto Crypto Provider is a collection of high-performance cryptographic implementations exposed via standard JCA/JCE interfaces.

  • I don't fully agree for two reasons.

    First, I am not sure the data on most in-use hardware (e.g. EC2 m5/c5/i3en etc ...) supports your conclusions. xxHash is always faster than cryptographic hashes, and single-threaded BLAKE3 is faster on every Intel machine I've come across in wide deployment. I hear similar arguments about CRC-32, and to be frank, they just aren't true on most computers that most people run things on.

    Second, many languages don't use the hardware instructions at all, and those that do often don't use them correctly. For example, Java 8 has bog-slow SHA-1, AES-GCM and MD5 implementations, and switching to the Amazon Corretto Crypto Provider[1] sped SHA/MD5 up by 50% and AES-GCM by ~90% on a reasonably large deployment (the JDK wasn't using proper hardware instructions for AES-GCM until Java 9, I think, and it is still slower even after that).

    That being said, as I noted in the disclaimer at the top of the benchmark, your particular hardware and your particular language matter a lot.

    [1] https://github.com/corretto/amazon-corretto-crypto-provider/...
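
The point about hardware and language mattering can be checked locally. Below is a minimal sketch, assuming only Python's standard library, that measures single-threaded throughput of CRC-32 and a few `hashlib` algorithms. BLAKE3 and xxHash are not in the standard library, so they are left out here; the names `throughput_mb_s` and `run_benchmark` are illustrative and do not come from the original post.

```python
import hashlib
import os
import time
import zlib

# Algorithms to try from hashlib; any that this build disables are skipped.
ALGORITHMS = ["md5", "sha1", "sha256", "blake2b"]


def throughput_mb_s(fn, data, repeats=5):
    """Return the best observed throughput of fn(data) in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from the timer on very small inputs.
    return len(data) / max(best, 1e-9) / 1e6


def run_benchmark(size=8 * 1024 * 1024, repeats=5):
    """Hash one random buffer with each algorithm; return {name: MB/s}."""
    data = os.urandom(size)
    results = {"crc32": throughput_mb_s(zlib.crc32, data, repeats)}
    for name in ALGORITHMS:
        try:
            hashlib.new(name, b"")
        except ValueError:
            continue  # algorithm unavailable in this build (e.g. FIPS mode)
        results[name] = throughput_mb_s(
            lambda d, n=name: hashlib.new(n, d).digest(), data, repeats
        )
    return results


if __name__ == "__main__":
    for name, speed in sorted(run_benchmark().items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {speed:10.1f} MB/s")
```

The numbers will vary with the CPU, the Python build, and the crypto library it is linked against, so they are best read as a relative ranking on one machine rather than as absolute figures.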

