Multiplications and 2 additions are faster than 2 additions

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • blake2-avx2

    BLAKE2 AVX2 implementations

  • The part about data dependencies across loop iterations is fascinating to me, because it's mostly invisible even when you look at the generated assembly. There's a related optimization that comes up in implementations of ChaCha/BLAKE, where we permute columns around in a kind of weird order, because it breaks a data dependency for an operation that's about to happen: https://github.com/sneves/blake2-avx2/pull/4#issuecomment-50...

  • Halide

    a language for fast, portable data-parallel computation

  • I think it's worth pointing out that these two examples execute at different speeds both because of how the compiler translated the code AND because the CPU was able to parallelize the work. Compilers use knowledge about the target platform (e.g. the instruction set) to translate source code into executable code. A compiler CAN (but doesn't have to) rewrite code, but only if the rewritten code ALWAYS produces the same result as the input code.

    I feel like in the last 10-15 years most people have stopped thinking about the specific CPU and think only about the ISA. That works for a lot of workloads, but in recent years I have observed more and more interest in how a specific CPU can execute code as efficiently as possible.

    If you're interested in the kind of optimizations performed in the example, you should check out polyhedral compilation (https://polyhedral.info/) and Halide (https://halide-lang.org/). Both can be used to speed up certain workloads significantly.
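
The two comments above both turn on loop-carried data dependencies. As a rough illustration, here is a minimal C sketch that assumes the example under discussion evaluates a quadratic a*i*i + b*i + c for every index. The function names and the exact formula are illustrative assumptions, not code taken from the original post. The first version spends more arithmetic per element, but every iteration is independent, so the compiler can vectorize it and the CPU can overlap iterations. The second version needs only two additions per element, but each iteration has to wait for the previous one.

```c
#include <stddef.h>

/* Direct evaluation: out[i] depends only on i, so no iteration has to
   wait for an earlier one. Hypothetical name, for illustration only. */
void fill_direct(double *out, size_t n, double a, double b, double c) {
    for (size_t i = 0; i < n; i++) {
        double x = (double)i;
        out[i] = a * x * x + b * x + c;
    }
}

/* Strength-reduced evaluation: two additions per element, but y and z
   are carried from one iteration to the next, forming a serial
   dependency chain. Hypothetical name, for illustration only. */
void fill_incremental(double *out, size_t n, double a, double b, double c) {
    double y = c;         /* f(0) */
    double z = a + b;     /* first difference: f(1) - f(0) */
    double dz = 2.0 * a;  /* constant second difference */
    for (size_t i = 0; i < n; i++) {
        out[i] = y;
        y += z;  /* needs the y and z produced by the previous iteration */
        z += dz; /* needs the z produced by the previous iteration */
    }
}
```

With small integer coefficients both functions produce identical arrays, but in general floating-point rounding can make the results differ slightly. That is why a compiler following strict floating-point rules cannot turn one form into the other on its own; options such as GCC's -ffast-math relax that restriction.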

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Show HN: Flash Attention in ~100 lines of CUDA

    2 projects | news.ycombinator.com | 16 Mar 2024
  • Halide v17.0.0

    1 project | news.ycombinator.com | 1 Feb 2024
  • Implementing Mario's Stack Blur 15 times in C++ (with tests and benchmarks)

    1 project | news.ycombinator.com | 10 Nov 2023
  • Blog Post: Can You Trust a Compiler to Optimize Your Code?

    1 project | /r/rust | 9 Apr 2023
  • Halide – a language for fast, portable computation on images and tensors

    1 project | news.ycombinator.com | 16 Jan 2023