memset_benchmark
fancy-memset
| | memset_benchmark | fancy-memset |
|---|---|---|
| Mentions | 11 | 3 |
| Stars | 296 | 9 |
| Growth | - | - |
| Activity | 1.8 | 0.0 |
| Last commit | over 2 years ago | about 2 years ago |
| Language | Assembly | Assembly |
| License | - | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
memset_benchmark
- Function multi-versioning in GCC 6 (2016)
mem* do not need to be called via ifunc. That is a toolchain decision. See e.g. https://github.com/nadavrot/memset_benchmark for recent data about the cost of PLT indirection for small copies.
- Optimising Memset and Memcpy
- Fast Memset and Memcpy implementations
- A 100LOC C impl of memset, that is faster than glibc's
Probably poorly. It is a violation to cast an unaligned pointer to an aligned type. And the code looks like it does just that right here: https://github.com/nadavrot/memset_benchmark/blob/main/src/l...
This is undefined behavior under C99 §6.3.2.3, paragraph 7.
- A faster implementation of memset in 100 LOC
I was impressed by the notion until I saw the code...
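The alignment concern raised in the comment above (casting an unaligned pointer to a more strictly aligned type and storing through it) can be illustrated with a minimal sketch. This is not code from memset_benchmark; the names `store_u64_unaligned`, `broadcast_byte`, and `sketch_memset` are hypothetical and exist only for illustration. It assumes the usual portable workaround: do the wide store through `memcpy`, which compilers typically lower to a single unaligned store instruction on targets that support one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write 8 bytes at p without assuming that p is
 * 8-byte aligned. Writing `*(uint64_t *)p = v;` instead would be
 * undefined behavior whenever p is not suitably aligned. */
static inline void store_u64_unaligned(void *p, uint64_t v) {
    memcpy(p, &v, sizeof v);
}

/* Replicate one byte into all eight bytes of a 64-bit word. */
static inline uint64_t broadcast_byte(unsigned char c) {
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/* Illustrative memset-like routine (a sketch, not tuned for speed):
 * fill in 8-byte chunks, then finish the tail one byte at a time. */
void *sketch_memset(void *dst, int c, size_t n) {
    unsigned char *p = dst;
    uint64_t word = broadcast_byte((unsigned char)c);

    while (n >= 8) {
        store_u64_unaligned(p, word);
        p += 8;
        n -= 8;
    }
    while (n--) {
        *p++ = (unsigned char)c;
    }
    return dst;
}
```

Calling `sketch_memset(buf + 1, 0xAB, 19)` on a byte buffer deliberately starts at an odd address; the `memcpy`-based store keeps that well defined, whereas a direct cast to `uint64_t *` would not be.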
fancy-memset
- LLVM's Libc Gets Much Faster memcpy For RISC-V
I only have experience with their amd64 code.
> What problems do they have?
Nothing in particular, just not particularly amazing performance. They work fine. One thing they have going for them is that they typically have separate versions for every interesting architecture feature level/set, whereas e.g. bionic only has SSE code. I guess I can point at my own implementations of memset and memcmp (https://github.com/moon-chilled/fancy-memset and https://github.com/moon-chilled/fancy-memcmp), both of which employ novel techniques not used by glibc; but I've not yet gotten around to doing proper benchmarks on either.
- Fast Memset and Memcpy implementations
Going to plug my own implementation of the same ideas. Nearly half the branches and code size (gotta save that BTB and icache!), and similar speed except in two size ranges: under 16 bytes (I am much slower, but these sizes are empirically very rare) and 128~180 bytes (I am appreciably faster for some reason; this was weird).
- The Art of Picking Intel Registers (2003)
Not sure what you mean by 'part'. I reproduced this across two Intel and two AMD CPUs, comparing my own[0] memset implementation, glibc's, and bionic's.
0. https://github.com/moon-chilled/fancy-memset
What are some alternatives?
safeclib - safec libc extension with all C11 Annex K functions
gcc
fancy-memcmp - small, fast memcmp
libnbd
qemu
llvm-project - The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
nbdkit