rotate
sort-research-rs
rotate | sort-research-rs | |
---|---|---|
4 | 47 | |
143 | 292 | |
- | - | |
10.0 | 9.0 | |
over 1 year ago | 11 days ago | |
C | Rust | |
GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rotate
-
10~17x faster than what? A performance analysis of Intel x86-SIMD-sort (AVX-512)
quadsort/fluxsort/crumsort author here.
For me there's a strong visual component, perhaps most obvious for my work on array rotation algorithms.
https://github.com/scandum/rotate
There's also the ability to notice strange/curious/discordant things, and either connect the dots through trying semi-random things, as well as sudden insights which seem to be partially subconscious.
One of my (many) theories is that I have the ability to use long-term memory in a quasi-similar manner to short-term memory for problem solving. My IQ is in the 120-130 range, I suffer from hypervigilance, so it's generally on the lower end due to lack of sleep.
I'd say there's a strong creative aspect. If I could redo life I might try my hand at music.
-
Is there a more efficient way to write this C program?
This is essentially just a rotation of a subrange of your original array. A variety of different algorithms for this operation can be found here.
-
Building the Perfect Memory Bandwidth Beast
Memory bandwidth is 1000x lower than CPU bandwidth, so as a rule of thumb any algorithm whose work scales linearly in the amount of data being processed will be memory bandwidth bound, and also any algorithm which can't be structured to do a lot of work on one memory region at once before moving onto the next one.
Examples (for large enough inputs that it's relevant) include shuffling, sorting, kmeans clustering, branch and bound sudoku solving, vector addition, dot products, and so on.
Moreover, writing a particular piece of code is often easier if you ignore memory bandwidth as a constraint. The classic example is matrix multiplication -- it can be structured such that even disk bandwidth isn't relevant compared to CPU bandwidth, but doing so is a little fiddly compared to the naive n^2 dot products approach, so writing it yourself usually results in a memory bandwidth bound solution for large matrices.
Similarly, writing two passes over your data rather than doing a mega-loop, the choice to use classic kmeans rather than one of its approximations (when it would be appropriate to do so), or not enforcing sortedness at some reasonable boundary and having to do additional passes over your data. It's easy to write code that hoovers up way more bandwidth than it needs to, and often faster algorithms that come out don't do anything different than access the right data at the right time to reduce that pressure, like a trinity rotation [0].
Caveat: Benchmark everything, especially as you're building intuition. Trying to fix what you think is a memory bandwidth issue can result in pipeline stalls and all sorts of fun things, especially when your server has more faster caches than your dev machine, when data in prod doesn't match your micro benchmark, ....
[0] https://github.com/scandum/rotate
- A collection of array rotation algorithms
sort-research-rs
-
The Rust Calling Convention We Deserve
If you want a particularity cursed example, I've recently called Go code from Rust via C in the middle, including passing a Rust closure with state into the Go code as callback into a Go stdlib function, including panic unwinding from inside the Rust closure https://github.com/Voultapher/sort-research-rs/commit/df6c91....
- Driftsort: An efficient, generic and robust stable sort implementation
-
Out-of-bounds read and write in the glibc's qsort()
See also https://github.com/Voultapher/sort-research-rs/blob/main/wri.... Discussion at https://news.ycombinator.com/item?id=37781612
- Fast, small, robust: pick three. Introducing a novel branchless partition impl
- A performance analysis of Intel's x86-simd-sort
- sort-research-rs/writeup/intel_avx512/text.md at main ยท Voultapher/sort-research-rs
- Fast, small, robust: pick three. Introducing a novel branchless partition implementation.
-
Branchless Lomuto Partitioning
There was a recent post by Voultapher from the sort-research-rs project on Branchless Lomuto Partitioning
https://github.com/Voultapher/sort-research-rs/blob/main/wri...
Discussion here:
https://news.ycombinator.com/item?id=38528452
This post by orlp (creator of Pattern-defeating Quicksort and Glidesort) was linked to in the above post, and I found both to be interesting.
- A novel branchless partition implementation
- Fast, small, robust: Introducing a novel branchless partition implementation
What are some alternatives?
stb - stb single-file public domain libraries for C/C++
tock - A secure embedded operating system for microcontrollers
quadsort - Quadsort is a branchless stable adaptive mergesort faster than quicksort.
ruduino - Reusable components for the Arduino Uno.
mountain-sort - The best algorithm to sort mountains
fluxsort - A fast branchless stable quicksort / mergesort hybrid that is highly adaptive.
buddy_alloc - A single header buddy memory allocator for C & C++
glidesort - A Rust implementation of Glidesort, my stable adaptive quicksort/mergesort hybrid sorting algorithm.
microui - A tiny immediate-mode UI library
Presentations - Collection of personal presentations
x86-simd-sort - C++ template library for high performance SIMD based sorting algorithms