| | quadsort | fluxsort |
|---|---|---|
| | 9 | 12 |
| Stars | 2,148 | 703 |
| Growth | 0.2% | 0.7% |
| Activity | 3.3 | 5.1 |
| Last commit | 8 months ago | 8 months ago |
| Language | C | C |
| License | The Unlicense | The Unlicense |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
quadsort
- 10~17x faster than what? A performance analysis of Intel x86-SIMD-sort (AVX-512)
https://github.com/scandum/quadsort/blob/f171a0b26cf6bd6f6dc...
As you can see, quadsort 1.1.4.1 used 2 instead of 4 writes in the bi-directional parity merges. This was in June 2021, and would have compiled as branchless with clang, but as branched with gcc.
When I added a compile-time check to use ternary operations for clang, I was not adapting your work. I was well aware that clang compiled ternary operations as branchless, but I wasn't aware that Rust did as well. I added the compile-time check to use ternary operations for a fair performance comparison against glidesort.
https://raw.githubusercontent.com/scandum/fluxsort/main/imag...
As for ipnsort's small sort, it is very similar to quadsort's small sort, which uses stable sorting networks, instead of unstable sorting networks. From my perspective it's not exactly novel. I didn't go for unstable sorting networks in crumsort to increase code reuse, and to not reduce adaptivity.
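To illustrate the ternary-based merge step discussed above, here is a minimal C sketch of a bi-directional parity merge over two equal-length sorted runs. It is not quadsort's actual code: the name `parity_merge_sketch`, the `int` element type, and the index-based layout are assumptions made for illustration. Each iteration performs one write at the head and one at the tail, which is one reading of the "2 writes" mentioned in the comment. Whether the ternaries compile to conditional moves depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: merge the sorted runs src[0..n) and src[n..2n)
 * into dst[0..2n). The head cursor emits the n smallest elements while
 * the tail cursor emits the n largest, so no bounds checks are needed
 * inside the loop. dst must not overlap src. */
static void parity_merge_sketch(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    ptrdiff_t hl = 0;                         /* head of left run   */
    ptrdiff_t hr = (ptrdiff_t)n;              /* head of right run  */
    ptrdiff_t tl = (ptrdiff_t)n - 1;          /* tail of left run   */
    ptrdiff_t tr = 2 * (ptrdiff_t)n - 1;      /* tail of right run  */
    ptrdiff_t dh = 0;                         /* head output cursor */
    ptrdiff_t dt = 2 * (ptrdiff_t)n - 1;      /* tail output cursor */

    for (size_t i = 0; i < n; i++) {
        /* Head: on ties take the left element to stay stable. */
        int take_left = (src[hl] <= src[hr]);
        dst[dh++] = take_left ? src[hl] : src[hr];
        hl += take_left;
        hr += !take_left;

        /* Tail: on ties take the right element to stay stable. */
        int take_right = (src[tl] <= src[tr]);
        dst[dt--] = take_right ? src[tr] : src[tl];
        tr -= take_right;
        tl -= !take_right;
    }
}
```

The two cursors are independent of each other, so a wide out-of-order core can overlap the head and tail comparisons.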
- Show HN: QuadSort, Esoteric Fast Sort
In the code it looks like the seed to the benchmark can be provided as the 4th command line argument: https://github.com/scandum/quadsort/blob/master/src/bench.c#...
- When does big-oh notation become not helpful when comparing algorithms?
If you look at sorting for example, it's been proven that a comparison-based sort can't do better than O(n log n) comparisons in the worst case. You may then think that we've already found the fastest possible sorting algorithms, since Quicksort (on average) and Mergesort are already O(n log n). However, new sorting algorithms keep being invented, for example Quadsort. They're all still O(n log n), but they do offer a considerable speed improvement over more traditional algorithms.
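For reference, the bound follows from the standard decision-tree argument (a textbook sketch, not part of the quoted comment). A comparison sort must distinguish all $n!$ input orderings, and a binary decision tree of height $h$ has at most $2^h$ leaves, so

$$
2^h \ge n! \quad\Longrightarrow\quad h \ge \log_2 n! \ge \log_2\!\left(\frac{n}{2}\right)^{n/2} = \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n).
$$

The middle step uses the fact that the largest $\lceil n/2 \rceil$ factors of $n!$ are each at least $n/2$. The bound only counts comparisons, which is why two algorithms in the same complexity class can still differ noticeably in constant factors.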
- quadsort 1.1.5.1: Up to 2.5x faster than qsort() on random data
- Quadsort 1.1.5.1: Introducing cost effective branchless merging
- I tried creating a sorting algorithm in C language.
fluxsort
- Fluxsort: A stable quicksort, now faster than Timsort for both random and ordered data
- 10~17x faster than what? A performance analysis of Intel x86-SIMD-sort (AVX-512)
Steps to build a fast, highly adaptive AVX-512 sorting algorithm:
- Clone fluxsort (https://github.com/scandum/fluxsort)
- Replace the partitioning code in flux_default_partition and flux_reverse_partition with the obvious AVX-512 version using a compare and two compress instructions
- If you're feeling ambitious, swap out the small array sorting, or incorporate crumsort's fulcrum partition for larger arrays.
I know why I haven't done this: my computer doesn't have AVX-512, and hardly anyone else's seems to. Maybe a couple Zen 4 owners. I'm less clear on why the tech giants are reinventing the wheel to make these sorting algorithms that don't even handle pre-sorted data rather than working with some of the very high-quality open source stuff out there. Is adaptivity really considered that worthless?
Fluxsort makes this particularly simple because it gets great performance out of a stable out-of-place partition. It's a bit newer; maybe the authors weren't aware of this work. But these algorithms both use (fairly difficult) in-place partitioning code; why not slot that into the well-known pdqsort?
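To make the "stable out-of-place partition" idea concrete, here is a scalar C sketch. It is not fluxsort's `flux_default_partition`; the function name `stable_partition_sketch`, the `int` element type, and the caller-supplied `swap` buffer of `n` elements are assumptions for illustration. Every element is written to both destinations and only one cursor advances, which avoids a data-dependent branch. An AVX-512 variant along the lines of the steps above would presumably replace the loop body with one vector compare and two compress stores, but that is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: stably partition array[0..n) around pivot.
 * Elements <= pivot end up first, in their original order, followed
 * by elements > pivot, also in their original order.
 * swap must have room for n elements. Returns the count of elements
 * that are <= pivot. */
static size_t stable_partition_sketch(int *array, int *swap, size_t n, int pivot)
{
    size_t lo = 0;  /* next slot for elements <= pivot (in array) */
    size_t hi = 0;  /* next slot for elements >  pivot (in swap)  */

    for (size_t i = 0; i < n; i++) {
        int v = array[i];
        int le = (v <= pivot);

        /* Write to both sides; only the matching cursor advances.
         * lo <= i always holds, so array[lo] was already read. */
        array[lo] = v;
        swap[hi] = v;
        lo += le;
        hi += !le;
    }

    /* Append the larger elements after the smaller ones. */
    memcpy(array + lo, swap, hi * sizeof *swap);
    return lo;
}
```

Because both halves keep their input order, the same routine can be applied recursively without losing stability.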
- A Rust port of crumsort, up to 75% faster than pdqsort
- GitHub - scandum/fluxsort: A branchless stable quicksort / mergesort hybrid.
- Reinforcement learned branchless sorting functions for sort3, sort4 and sort5 were landed in LLVM
With the right code and code-gen (https://github.com/scandum/fluxsort/issues/5), these can be excellent at exploiting wide super-scalar architectures. I use them in both ipn_stable and, to a larger extent, in ipn_unstable, which even uses the lesser-known median networks for pivot selection. That said, I've done a lot of experiments with smaller sorting networks, sort3/4/5 as used in libcxx, and I found that they only look good in synthetic micro-benchmarks. If all your program does is sort inputs of one specific size (3, 4 or 5) in a hot loop and nothing else, then yes, they are faster than insertion sort. But as soon as your application does some non-trivial amount of other work before calling sort again, the code complexity and additional branching required to get to that sort network are not worth it anymore. Depending on your architecture, my findings suggest they only start pulling ahead beginning at sizes 8-12.
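For readers unfamiliar with the term, a small sorting network is a fixed sequence of compare-exchange steps. The C sketch below shows a five-comparator network for four elements. It is not the code from libcxx, ipnsort, or fluxsort; the names `cswap_sketch` and `sort4_sketch` and the `int` element type are assumptions, and this particular network is not stable.

```c
/* Hypothetical sketch: order *a and *b without a data-dependent
 * branch in the source. Compilers often, but not always, turn the
 * ternaries into conditional moves. */
static inline void cswap_sketch(int *a, int *b)
{
    int x = *a, y = *b;
    int lt = (y < x);
    *a = lt ? y : x;   /* smaller value */
    *b = lt ? x : y;   /* larger value  */
}

/* Five-comparator sorting network for four elements:
 * (0,1) (2,3) (0,2) (1,3) (1,2). */
static void sort4_sketch(int v[4])
{
    cswap_sketch(&v[0], &v[1]);
    cswap_sketch(&v[2], &v[3]);
    cswap_sketch(&v[0], &v[2]);
    cswap_sketch(&v[1], &v[3]);
    cswap_sketch(&v[1], &v[2]);
}
```

The first two and the middle two compare-exchange steps have no data dependency on each other, which is the property that lets such networks exploit wide super-scalar cores.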
- Changing std::sort at Google’s Scale and Beyond
Any chance you could comment on fluxsort[0], another fast quicksort? It's stable and uses a buffer about the size of the original array, which sounds like it puts it in a similar category as glidesort. Benchmarks against pdqsort at the end of that README; I can verify that it's faster on random data by 30% or so, and the stable partitioning should mean it's at least as adaptive (but the current implementation uses an initial analysis pass followed by adaptive mergesort rather than optimistic insertion sort to deal with nearly-sorted data, which IMO is fragile). There's an in-place effort called crumsort[1] along similar lines, but it's not stable.
I've been doing a lot of work on sorting[2], in particular working to hybridize various approaches better. Very much looking forward to seeing how glidesort works.
[0] https://github.com/scandum/fluxsort
[1] https://github.com/scandum/crumsort
[2] https://mlochbaum.github.io/BQN/implementation/primitive/sor...
- I tried creating a sorting algorithm in C language.
- Fluxsort: A stable adaptive partitioning comparison sort
- Hacker News top posts: Jul 25, 2021
Fluxsort: A stable adaptive partitioning comparison sort (0 comments)
What are some alternatives?
blitsort - Blitsort is an in-place stable adaptive rotate mergesort / quicksort.
pdqsort - Pattern-defeating quicksort.
HSL - HSL to RGB and RGB to HSL
crumsort - A branchless unstable quicksort / mergesort that is highly adaptive.
awesome-algorithms - A curated list of awesome places to learn and/or practice algorithms.