sagarm has posted one result in another thread. I'll also look into adding their code to our benchmark :)
It's great to see more vector code, but a caveat for anyone using this: the pivot sampling is quite basic, just the median of 16 evenly spaced samples. This will perform poorly on skewed distributions, including all-equal inputs and inputs with very few unique values. Yes, in the worst case it can resort to std::sort, but that's a >10x speed hit and, until recently, also potentially O(N^2)!
We have drawn larger samples (nine vectors, not one), and subsequently extended the vqsort algorithm beyond what is described in our paper, e.g. special handling for 1..3 unique keys, see https://github.com/google/highway/blob/master/hwy/contrib/so....
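To make the pivot concern concrete, here is a minimal scalar sketch of a "median of 16 evenly spaced samples" pivot. It is not taken from either library discussed here; `SamplePivot16` and `FractionBelow` are hypothetical names made up for illustration, and real implementations do this with vector instructions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not any library's API):
// returns the (upper) median of 16 evenly spaced samples of `keys`.
inline int SamplePivot16(const std::vector<int>& keys) {
  constexpr std::size_t kSamples = 16;
  assert(keys.size() >= kSamples);  // assumption: at least 16 keys

  std::array<int, kSamples> samples;
  const std::size_t stride = keys.size() / kSamples;
  for (std::size_t i = 0; i < kSamples; ++i) {
    samples[i] = keys[i * stride];
  }

  // Partial sort so the element at index 8 is the one a full sort would put there.
  std::nth_element(samples.begin(), samples.begin() + kSamples / 2,
                   samples.end());
  return samples[kSamples / 2];
}

// Hypothetical helper: fraction of keys strictly less than `pivot`.
// About 0.5 is a balanced split; 0.0 means the "less than" side is empty.
inline double FractionBelow(const std::vector<int>& keys, int pivot) {
  const auto below = std::count_if(keys.begin(), keys.end(),
                                   [pivot](int k) { return k < pivot; });
  return static_cast<double>(below) / static_cast<double>(keys.size());
}
```

On 1600 distinct ascending keys this splits the input in half, but on an all-equal input the pivot equals every key and nothing falls strictly below it, so a plain two-way partition makes no progress. That degenerate case is the kind of input that larger samples and special handling for a few unique keys are meant to address.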
https://github.com/funrollloops/parallel-sort-bench