10~17x faster than what? A performance analysis of Intel x86-SIMD-sort (AVX-512)

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

CodeRabbit: AI Code Reviews for Developers
Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
coderabbit.ai
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  1. sort-research-rs

    Test and benchmark suite for sort implementations.

    Ok so both the things you linked are not my ideas. The first one is AFAIK originally from the BlockQuicksort paper https://arxiv.org/pdf/1604.06697.pdf section 3.2. Which Orson Peters used in his pdqsort, which I know base ipnsort on. The second one is from Igor van den Hoven in his work on quadsort as I mention in the function description. Really a lot of this stuff is building on top of other peoples work and refining it. Seemingly I'm the first to write neat little ASCII graphics for them in the code making them easier to approach. There are some novel ideas by me in there too though, for example https://github.com/Voultapher/sort-research-rs/blob/42bf339d... the way I marry a limited number of sorting-networks, insertion sort and the bi-directional merge into one, very fast but binary size and compile time efficient package is novel. Generally I'd say my work has been more about figuring out novel ways to combine existing ideas into one good cohesive package fit for a standard library sort implementation.

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

    CodeRabbit logo
  3. fluxsort

    A fast branchless stable quicksort / mergesort hybrid that is highly adaptive.

    Steps to build a fast, highly adaptive AVX-512 sorting algorithm:

    - Clone fluxsort (https://github.com/scandum/fluxsort)

    - Replace the partitioning code in flux_default_partition and flux_reverse_partition with the obvious AVX-512 version using a compare and two compress instructions

    - If you're feeling ambitious, swap out the small array sorting, or incorporate crumsort's fulcrum partition for larger arrays.

    I know why I haven't done this: my computer doesn't have AVX-512, and hardly anyone else's seems to. Maybe a couple Zen 4 owners. I'm less clear on why the tech giants are reinventing the wheel to make these sorting alrogithms that don't even handle pre-sorted data rather than working with some of the very high-quality open source stuff out there. Is adaptivity really considered that worthless?

    Fluxsort makes this particularly simple because it gets great performance out of a stable out-of-place partition. It's a bit newer; maybe the authors weren't aware of this work. But these algorithms both use (fairly difficult) in-place partitioning code; why not slot that into the well-known pdqsort?

  4. Presentations

    Collection of personal presentations (by Voultapher)

    If you look at the history of the author's own sorting implementation (ipnsort), it started as an attempt to port fluxsort to Rust:

    https://github.com/Voultapher/Presentations/blob/master/rust...

    It starts there, eventually the author abandons the linked PR because of a better approach found with ipnsort:

    https://github.com/rust-lang/rust/pull/100856#issuecomment-1...

  5. rust

    Empowering everyone to build reliable and efficient software.

    If you look at the history of the author's own sorting implementation (ipnsort), it started as an attempt to port fluxsort to Rust:

    https://github.com/Voultapher/Presentations/blob/master/rust...

    It starts there, eventually the author abandons the linked PR because of a better approach found with ipnsort:

    https://github.com/rust-lang/rust/pull/100856#issuecomment-1...

  6. mountain-sort

    The best algorithm to sort mountains

    It depends on the size of the structs. For struct pointers you're likely better off sorting keys and pointers simultaneously. It doesn't matter much until you get to large sizes (millions), but sorting indices and then selecting with them is random access. If the original ordering is messy, selecting can be slower than the sorting step. For structs a few words long, the unit you're moving is a larger portion of a cache line, and I'd expect the data movement to drown out any SIMD advantage. A radix sort might be all right because it moves less, but I'd probably go with sorting indices as the first thing to try unless I knew the arrays were huge. For very large structs there's an interesting effort called mountain sort[0], "probably the best sorting algorithm if you need to sort actual mountains by height". Given that it minimizes number of moves it's ignoring cache entirely. I haven't benchmarked so I can't say much about how practical it is.

    [0] https://github.com/Morwenn/mountain-sort

  7. highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

    The vqsort README says it is a non-issue on that generation CPU based on their benchmarks: https://github.com/google/highway/tree/master/hwy/contrib/so...

  8. quadsort

    Quadsort is a branchless stable adaptive mergesort faster than quicksort.

    https://github.com/scandum/quadsort/blob/f171a0b26cf6bd6f6dc...

    As you can see, quadsort 1.1.4.1 used 2 instead of 4 writes in the bi-directional parity merges. This was in June 2021, and would have compiled as branchless with clang, but as branched with gcc.

    When I added a compile time check to use ternary operations for clang I was not adapting your work. I was well aware that clang compiled ternary operations as branchless, but I wasn't aware that rust did as well. I added the compile time check to use ternary operations for a fair performance comparison against glidesort.

    https://raw.githubusercontent.com/scandum/fluxsort/main/imag...

    As for ipnsort's small sort, it is very similar to quadsort's small sort, which uses stable sorting networks, instead of unstable sorting networks. From my perspective it's not exactly novel. I didn't go for unstable sorting networks in crumsort to increase code reuse, and to not reduce adaptivity.

  9. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  10. rotate

    A collection of array rotation algorithms.

    quadsort/fluxsort/crumsort author here.

    For me there's a strong visual component, perhaps most obvious for my work on array rotation algorithms.

    https://github.com/scandum/rotate

    There's also the ability to notice strange/curious/discordant things, and either connect the dots through trying semi-random things, as well as sudden insights which seem to be partially subconscious.

    One of my (many) theories is that I have the ability to use long-term memory in a quasi-similar manner to short-term memory for problem solving. My IQ is in the 120-130 range, I suffer from hypervigilance, so it's generally on the lower end due to lack of sleep.

    I'd say there's a strong creative aspect. If I could redo life I might try my hand at music.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Fluxsort: A stable quicksort, now faster than Timsort for both random and ordered data

    1 project | /r/dataengineering | 11 Jul 2023
  • GitHub - scandum/fluxsort: A branchless stable quicksort / mergesort hybrid.

    1 project | /r/tech_article | 5 Feb 2023
  • Reinforcement learned branchless sorting functions for sort3, sort4 and sort5 were landed in LLVM

    3 projects | /r/rust | 4 Feb 2023
  • Need a Quick Favor: Help a Student with a Simple Click!

    1 project | /r/github | 19 Nov 2023
  • Doubt regarding counting sort

    2 projects | /r/algorithms | 12 Jun 2023

Did you know that C++ is
the 7th most popular programming language
based on number of references?