Multiplications and 2 additions are faster than 2 additions

blake2-avx2

Mentions: 2 | Stars: 22 | Activity: 0.0 | Language: Objective-C

BLAKE2 AVX2 implementations

The part about data dependencies across loop iterations is fascinating to me, because it's mostly invisible even when you look at the generated assembly. There's a related optimization that comes up in implementations of ChaCha/BLAKE, where we permute the columns in a somewhat odd order, because doing so breaks a data dependency for an operation that's about to happen: https://github.com/sneves/blake2-avx2/pull/4#issuecomment-50...
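To make the idea of a dependency across loop iterations concrete, here is a minimal C sketch. It assumes the comparison is between a loop that evaluates a quadratic directly and a strength-reduced loop that keeps running sums. The function names, the polynomial, and the loop shapes are illustrative assumptions, not code from the original post or from blake2-avx2.

```c
#include <stddef.h>

/* Direct form: more arithmetic per element, but each iteration depends
   only on i, so iterations are independent of one another. */
void poly_direct(double *out, size_t n, double a, double b, double c)
{
    for (size_t i = 0; i < n; i++) {
        double x = (double)i;
        out[i] = a * x * x + b * x + c;
    }
}

/* Strength-reduced form: only two additions per element, but y and z
   are carried from one iteration to the next, forming a serial chain. */
void poly_incremental(double *out, size_t n, double a, double b, double c)
{
    double y = c;       /* p(0) */
    double z = a + b;   /* first difference, p(1) - p(0) */
    for (size_t i = 0; i < n; i++) {
        out[i] = y;
        y += z;         /* needs y from the previous iteration */
        z += 2.0 * a;   /* the second difference of a quadratic is 2a */
    }
}

/* Hypothetical helper: returns 1 if both forms agree on the first n
   values (n is clamped to 64), 0 otherwise. Exact equality is only
   expected when every intermediate value is exactly representable,
   e.g. small integer coefficients. */
int polys_agree(size_t n, double a, double b, double c)
{
    double direct[64], incremental[64];
    if (n > 64) {
        n = 64;
    }
    poly_direct(direct, n, a, b, c);
    poly_incremental(incremental, n, a, b, c);
    for (size_t i = 0; i < n; i++) {
        if (direct[i] != incremental[i]) {
            return 0;
        }
    }
    return 1;
}
```

In the second loop, no iteration can start its additions until the previous one has finished, and that chain is hard to spot in the assembly. Which form is actually faster depends on the compiler, its flags, and the CPU, so it is worth measuring rather than assuming.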

Halide

43 5,703 9.5 C++

a language for fast, portable data-parallel computation

I think it's worth pointing out that these two examples execute at different speeds both because of how the compiler translated the code AND because the CPU was able to parallelize the work. Compilers take knowledge about the target platform (e.g. the instruction set) together with the source code and translate it into executable code. A compiler CAN (but doesn't have to) rewrite code, but only if the rewritten code ALWAYS produces the same result as the input code.
I feel like for the last 10-15 years (the majority of) people have stopped thinking about the specific CPU and only think about the ISA. That works for a lot of workloads, but in recent years I have observed more and more interest in how a specific CPU can execute code as efficiently as possible.
If you're interested in the kind of optimizations performed in the example, you should check out polyhedral compilation (https://polyhedral.info/) and Halide (https://halide-lang.org/). Both can be used to speed up certain workloads significantly.
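The rule that a compiler may rewrite code only if the result is always the same has a practical consequence for floating-point code: addition of doubles is not associative, so regrouping a sum can change the answer. A minimal C sketch, with hypothetical helper names and values picked purely for illustration:

```c
/* Groups the sum as (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Groups the same three values as a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With IEEE 754 doubles, `sum_left(1e20, -1e20, 1.0)` gives 1.0, while `sum_right(1e20, -1e20, 1.0)` gives 0.0, because the 1.0 is lost when it is added to -1e20 first. This is why a compiler in its default mode generally has to keep the order of floating-point additions as written; options such as `-ffast-math` in GCC and Clang relax that requirement at the cost of exact reproducibility.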

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Show HN: Flash Attention in ~100 lines of CUDA

2 projects | news.ycombinator.com | 16 Mar 2024

Halide v17.0.0

1 project | news.ycombinator.com | 1 Feb 2024

Implementing Mario's Stack Blur 15 times in C++ (with tests and benchmarks)

1 project | news.ycombinator.com | 10 Nov 2023

Blog Post: Can You Trust a Compiler to Optimize Your Code?

1 project | /r/rust | 9 Apr 2023

Halide – a language for fast, portable computation on images and tensors

1 project | news.ycombinator.com | 16 Jan 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
halide hexagon Compiler DSL GPU
Post date: 29 May 2022
