Sorting with SIMD

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • zort

    Sorting algorithms in zig

  • Please do comment in situations like this.

    It's tough to find a good overview of this topic that includes recent work or that is aimed at empirical performance. That's part of why TFA is great!

    A version of MSB radix sort for sorting ints, doubles, or floats, or sorting structs by a single key of those types is here[0] and a version for sorting strings is here[1]. These are based on the MSB radix sort from this repository[2] which is associated with a technical report about parallel string sorts. I was only interested in sequential sorts, but this repository turned out to be a great resource anyway.

    [0]: https://github.com/alichraghi/zort/blob/main/src/radix.zig

    [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

    [2]: https://github.com/bingmann/parallel-string-sorting

  • perf-challenge6

  • Please do comment in situations like this.

    It's tough to find a good overview of this topic that includes recent work or that is aimed at empirical performance. That's part of why TFA is great!

    A version of MSB radix sort for sorting ints, doubles, or floats, or sorting structs by a single key of those types is here[0] and a version for sorting strings is here[1]. These are based on the MSB radix sort from this repository[2] which is associated with a technical report about parallel string sorts. I was only interested in sequential sorts, but this repository turned out to be a great resource anyway.

    [0]: https://github.com/alichraghi/zort/blob/main/src/radix.zig

    [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

    [2]: https://github.com/bingmann/parallel-string-sorting

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • parallel-string-sorting

    Collection of Parallel String Sorting Algorithms including Parallel Super Scalar String Sample Sort and Parallel Multiway LCP-Mergesort

  • Please do comment in situations like this.

    It's tough to find a good overview of this topic that includes recent work or that is aimed at empirical performance. That's part of why TFA is great!

    A version of MSB radix sort for sorting ints, doubles, or floats, or sorting structs by a single key of those types is here[0] and a version for sorting strings is here[1]. These are based on the MSB radix sort from this repository[2] which is associated with a technical report about parallel string sorts. I was only interested in sequential sorts, but this repository turned out to be a great resource anyway.

    [0]: https://github.com/alichraghi/zort/blob/main/src/radix.zig

    [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

    [2]: https://github.com/bingmann/parallel-string-sorting

  • short-simd-sorter

  • I played a bit with finding a sequence of instructions to sort 8 floats in 2 SSE registers with a sorting network:

    https://github.com/0xf00ff00f/short-simd-sorter

    I pretty much brute-forced it, so I don't know if you could use the same idea to sort 16 values in 2 AVX registers. There's probably a better way.

  • fTetWild

    Fast Tetrahedral Meshing in the Wild

  • I recently tried to do that as well, but failed. Specifically, I have implemented AA sort [1] but for my use case the performance was about the same as std::sort in C++, for the substantial code complexity cost. I reverted to std::sort. The code is on github [2]

    Still, in that particular project the vectors being sorted are relatively small, typically under than 100kb, so I have only implemented their “inner” algorithm which works on a single CPU core. The complete AA sort algorithm was apparently designed for large vectors, and uses both SIMD and multithreading. Might be still useful for very long vectors.

    [1] https://ieeexplore.ieee.org/document/4336211

    [2] https://github.com/Const-me/fTetWild/blob/master/MeshRepair/...

  • zerovm-samples

    Sample code and libraries built for ZeroVM

  • I wonder why no one mentions bitonic sort? If you want to do anything in SIMD you better avoid branching as much as possible... and ideally altogether. Here is an implementation I co-authored some 10 years ago: https://github.com/zerovm/zerovm-samples/blob/master/disort/...

    Sorting-networks which were already mentioned seems similar but a bit too abstract.

    My code above doesn't contains values but those are easy to add I think. Of course it is better to permute fixed size pointers / offsets and not the entire blobs which can be of variable size and then it will complicate everything beyond feasible for SIMD

  • rune

    Rune is a programming language developed to test ideas for improving security and efficiency.

  • Maybe Google's new "Rune" language will become prevalent https://github.com/google/rune, which supports SoA.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

  • We also experimented with 4x4 networks but it's very helpful to use 256 or 512-bit SIMD. You might give our vqsort a try, it's quite a bit faster than std::sort: https://github.com/google/highway/blob/master/hwy/contrib/so...

  • avx_qsort

    Quick sort code using AVX2 instructions

  • The original (AFAICT) work on SIMD quick sort, also mentioned in the google post also implemented pointer sort by loading a pointed key using gather instructions and the method can be used for an array of structs. https://github.com/vkrasnov/avx_qsort/blob/master/qsort_AVX2...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts