rotate
buddy_alloc
rotate | buddy_alloc | |
---|---|---|
4 | 7 | |
143 | 121 | |
- | - | |
10.0 | 7.3 | |
over 1 year ago | about 1 month ago | |
C | C | |
GNU General Public License v3.0 or later | BSD Zero Clause License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rotate
-
10~17x faster than what? A performance analysis of Intel x86-SIMD-sort (AVX-512)
quadsort/fluxsort/crumsort author here.
For me there's a strong visual component, perhaps most obvious for my work on array rotation algorithms.
https://github.com/scandum/rotate
There's also the ability to notice strange/curious/discordant things, and either connect the dots through trying semi-random things, as well as sudden insights which seem to be partially subconscious.
One of my (many) theories is that I have the ability to use long-term memory in a quasi-similar manner to short-term memory for problem solving. My IQ is in the 120-130 range, I suffer from hypervigilance, so it's generally on the lower end due to lack of sleep.
I'd say there's a strong creative aspect. If I could redo life I might try my hand at music.
-
Is there a more efficient way to write this C program?
This is essentially just a rotation of a subrange of your original array. A variety of different algorithms for this operation can be found here.
-
Building the Perfect Memory Bandwidth Beast
Memory bandwidth is 1000x lower than CPU bandwidth, so as a rule of thumb any algorithm whose work scales linearly in the amount of data being processed will be memory bandwidth bound, and also any algorithm which can't be structured to do a lot of work on one memory region at once before moving onto the next one.
Examples (for large enough inputs that it's relevant) include shuffling, sorting, kmeans clustering, branch and bound sudoku solving, vector addition, dot products, and so on.
Moreover, writing a particular piece of code is often easier if you ignore memory bandwidth as a constraint. The classic example is matrix multiplication -- it can be structured such that even disk bandwidth isn't relevant compared to CPU bandwidth, but doing so is a little fiddly compared to the naive n^2 dot products approach, so writing it yourself usually results in a memory bandwidth bound solution for large matrices.
Similarly, writing two passes over your data rather than doing a mega-loop, the choice to use classic kmeans rather than one of its approximations (when it would be appropriate to do so), or not enforcing sortedness at some reasonable boundary and having to do additional passes over your data. It's easy to write code that hoovers up way more bandwidth than it needs to, and often faster algorithms that come out don't do anything different than access the right data at the right time to reduce that pressure, like a trinity rotation [0].
Caveat: Benchmark everything, especially as you're building intuition. Trying to fix what you think is a memory bandwidth issue can result in pipeline stalls and all sorts of fun things, especially when your server has more faster caches than your dev machine, when data in prod doesn't match your micro benchmark, ....
[0] https://github.com/scandum/rotate
- A collection of array rotation algorithms
buddy_alloc
-
buddy memory allocator - project update (2 years)
If you need a sub-allocator with predictable performance feel free to give it a try. The code is here and it is licensed under the 0BSD license, making it as lax and as close to public domain as possible. Comments, issues and PRs are always welcomed and appreciated. Thanks!
-
Open-source MISRA-compliant projects
I maintain a project that's not technically MISRA compliant (due to being a memory allocator and MISRA disallowing the very idea) and I keep it at 100% test coverage with support for multiple compilers and operating systems. Over time I had a few users reporting back - since it's working for them I count that as success. Is it wildly popular ? Of course not, but it doesn't have to be.
-
One year ago I wrote a buddy memory allocator - project update
You are right about the tests - they are written with 64-bit in mind. I ought to rework them to switch sizes based on arch but that will take a weekend. I've filed https://github.com/spaskalev/buddy_alloc/issues/19 to track this.
-
is there some good tutorial about how malloc or mcheck works?
I also maintain a application-based malloc (that doesn't do obtaining and releasing memory through the OS, just managing sub-diving a larger memory block into smaller requests) at https://github.com/spaskalev/buddy_alloc - feel free to ping me with any questions about it.
-
I'm giving out microgrants to open source projects for the third year in a row! Brag about your projects here so I can see them, big or small!
I'm the author of https://github.com/spaskalev/buddy_alloc - a custom memory allocator for C (modern C11, works with C++ as well) designed for predictable and repeatable performance. It is suitable for use in embedded, games and any other system with soft or hard real-time demands. It has 100% line and branch test coverage and uses a fixed amount of space on the call stack when called. Recently the project had its first external contribution as well. Cheers!
-
What is your own favorite C project?
I made a memory allocator that turned out rather neat - https://github.com/spaskalev/buddy_alloc
What are some alternatives?
stb - stb single-file public domain libraries for C/C++
rpmalloc - Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C
quadsort - Quadsort is a branchless stable adaptive mergesort faster than quicksort.
VulkanMemoryAllocator - Easy to integrate Vulkan memory allocation library
sort-research-rs - Test and benchmark suite for sort implementations.
isoalloc - A general purpose memory allocator that implements an isolation security strategy to mitigate memory safety issues while maintaining good performance
mountain-sort - The best algorithm to sort mountains
gunslinger - C99, header-only framework for games and multimedia applications
microui - A tiny immediate-mode UI library
rotate - [WIP] static typed programming language that compiles to vm bytecode
Presentations - Collection of personal presentations