Sorting with SIMD

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • Onboard AI - Learn any GitHub repo in 59 seconds
  • SonarCloud - Analyze your C and C++ projects with just one click.
  • Revelo Payroll - Free Global Payroll designed for tech teams
  • InfluxDB - Collect and Analyze Billions of Data Points in Real Time
  • zort

    Sorting algorithms in zig

    Please do comment in situations like this.

    It's tough to find a good overview of this topic that includes recent work or that is aimed at empirical performance. That's part of why TFA is great!

    A version of MSB radix sort for sorting ints, doubles, or floats, or sorting structs by a single key of those types is here[0] and a version for sorting strings is here[1]. These are based on the MSB radix sort from this repository[2] which is associated with a technical report about parallel string sorts. I was only interested in sequential sorts, but this repository turned out to be a great resource anyway.

    [0]: https://github.com/alichraghi/zort/blob/main/src/radix.zig

    [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

    [2]: https://github.com/bingmann/parallel-string-sorting

  • perf-challenge6

    Please do comment in situations like this.

    It's tough to find a good overview of this topic that includes recent work or that is aimed at empirical performance. That's part of why TFA is great!

    A version of MSB radix sort for sorting ints, doubles, or floats, or sorting structs by a single key of those types is here[0] and a version for sorting strings is here[1]. These are based on the MSB radix sort from this repository[2] which is associated with a technical report about parallel string sorts. I was only interested in sequential sorts, but this repository turned out to be a great resource anyway.

    [0]: https://github.com/alichraghi/zort/blob/main/src/radix.zig

    [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

    [2]: https://github.com/bingmann/parallel-string-sorting

  • Onboard AI

    Learn any GitHub repo in 59 seconds. Onboard AI learns any GitHub repo in minutes and lets you chat with it to locate functionality, understand different parts, and generate new code. Use it for free at www.getonboard.dev.

  • parallel-string-sorting

    Collection of Parallel String Sorting Algorithms including Parallel Super Scalar String Sample Sort and Parallel Multiway LCP-Mergesort

    Please do comment in situations like this.

    It's tough to find a good overview of this topic that includes recent work or that is aimed at empirical performance. That's part of why TFA is great!

    A version of MSB radix sort for sorting ints, doubles, or floats, or sorting structs by a single key of those types is here[0] and a version for sorting strings is here[1]. These are based on the MSB radix sort from this repository[2] which is associated with a technical report about parallel string sorts. I was only interested in sequential sorts, but this repository turned out to be a great resource anyway.

    [0]: https://github.com/alichraghi/zort/blob/main/src/radix.zig

    [1]: https://github.com/dendibakh/perf-challenge6/blob/Solution_R...

    [2]: https://github.com/bingmann/parallel-string-sorting

  • short-simd-sorter

    I played a bit with finding a sequence of instructions to sort 8 floats in 2 SSE registers with a sorting network:

    https://github.com/0xf00ff00f/short-simd-sorter

    I pretty much brute-forced it, so I don't know if you could use the same idea to sort 16 values in 2 AVX registers. There's probably a better way.

  • fTetWild

    Fast Tetrahedral Meshing in the Wild

    I recently tried to do that as well, but failed. Specifically, I have implemented AA sort [1] but for my use case the performance was about the same as std::sort in C++, for the substantial code complexity cost. I reverted to std::sort. The code is on github [2]

    Still, in that particular project the vectors being sorted are relatively small, typically under than 100kb, so I have only implemented their “inner” algorithm which works on a single CPU core. The complete AA sort algorithm was apparently designed for large vectors, and uses both SIMD and multithreading. Might be still useful for very long vectors.

    [1] https://ieeexplore.ieee.org/document/4336211

    [2] https://github.com/Const-me/fTetWild/blob/master/MeshRepair/...

  • zerovm-samples

    Sample code and libraries built for ZeroVM

    I wonder why no one mentions bitonic sort? If you want to do anything in SIMD you better avoid branching as much as possible... and ideally altogether. Here is an implementation I co-authored some 10 years ago: https://github.com/zerovm/zerovm-samples/blob/master/disort/...

    Sorting-networks which were already mentioned seems similar but a bit too abstract.

    My code above doesn't contains values but those are easy to add I think. Of course it is better to permute fixed size pointers / offsets and not the entire blobs which can be of variable size and then it will complicate everything beyond feasible for SIMD

  • rune

    Rune is a programming language developed to test ideas for improving security and efficiency.

    Maybe Google's new "Rune" language will become prevalent https://github.com/google/rune, which supports SoA.

  • SonarCloud

    Analyze your C and C++ projects with just one click.. SonarCloud, a cloud-based static analysis tool for your CI/CD workflows, offers a one-click automatic analysis of C and C++ projects hosted on GitHub. Zero configuration and free for open-source projects! Analyze free.

  • highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

    We also experimented with 4x4 networks but it's very helpful to use 256 or 512-bit SIMD. You might give our vqsort a try, it's quite a bit faster than std::sort: https://github.com/google/highway/blob/master/hwy/contrib/so...

  • avx_qsort

    Quick sort code using AVX2 instructions

    The original (AFAICT) work on SIMD quick sort, also mentioned in the google post also implemented pointer sort by loading a pointed key using gather instructions and the method can be used for an array of structs. https://github.com/vkrasnov/avx_qsort/blob/master/qsort_AVX2...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts