optimization-manual
AvxMath
| | optimization-manual | AvxMath |
|---|---|---|
| Mentions | 3 | 2 |
| Stars | 738 | 2 |
| Growth | 1.9% | - |
| Activity | 3.8 | 10.0 |
| Last commit | 2 months ago | almost 2 years ago |
| Language | Assembly | C++ |
| License | BSD Zero Clause License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
optimization-manual
-
Fastest Branchless Binary Search
There are two ways I vectorized linear and binary search (in practice you often want a combination; always benchmark on your real-world datasets!):
- Do N binary searches simultaneously, so each lane is essentially doing one bsearch. Obviously, this only works if you are doing multiple searches.
- Use the VPCONFLICT instruction for the linear search parts. There's even code from the Intel optimization manual samples doing it: https://github.com/intel/optimization-manual/blob/main/chap18/ex20/avx512_vector_dp.asm
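To make the "N binary searches simultaneously" idea concrete, here is a minimal portable C++ sketch. It is not the commenter's code: `lower_bound_lanes` and `kLanes` are hypothetical names, and the inner loop over lanes stands in for what a real SIMD version would do with gather and compare instructions. The point it shows is that all lanes share the same loop trip count and each step is a branchless select.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical lane count; a 512-bit register holds 16 x 32-bit keys, 8 is used here for brevity.
constexpr std::size_t kLanes = 8;

// Runs kLanes lower_bound searches in lockstep over one sorted array.
// Returns, per lane, the index of the first element that is not less than the key.
std::array<std::size_t, kLanes> lower_bound_lanes(const std::vector<int>& sorted,
                                                  const std::array<int, kLanes>& keys) {
    std::array<std::size_t, kLanes> base{};  // per-lane start of the remaining range
    std::size_t len = sorted.size();         // remaining length, identical for every lane
    if (len == 0) return base;

    while (len > 1) {
        const std::size_t half = len / 2;
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            // Branchless step: move the lane's base forward when the probe is below its key.
            const bool go_right = sorted[base[lane] + half - 1] < keys[lane];
            base[lane] += go_right ? half : 0;
        }
        len -= half;
    }
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        // One element is left per lane; step past it if it is still below the key.
        base[lane] += (sorted[base[lane]] < keys[lane]) ? 1 : 0;
    }
    return base;
}
```

Because the trip count depends only on the array size, every lane finishes at the same time, which is what makes the lockstep formulation a natural fit for SIMD.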
-
Zen4's AVX512 Teardown
The Intel optimization manual has a fun example where they use vpconflict for vectorizing sparse dot products: https://github.com/intel/optimization-manual/blob/main/chap1...
I benchmarked it on Intel, and it was indeed quite fast, a good improvement over the scalar version. It will be interesting to try that on AMD.
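For readers unfamiliar with the instruction, the following is a scalar C++ model of what a 16-lane, 32-bit conflict detection computes: for each lane, a bitmask of the earlier lanes that hold the same value. It is only an illustration of the semantics, not the Intel sample's code, and `conflict_model` is a hypothetical name. In a sparse accumulation, a nonzero mask marks lanes whose index duplicates an earlier lane, so their updates cannot simply be scattered in parallel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of per-lane conflict detection over 16 x 32-bit values.
// Bit j of out[i] is set when lane j < i holds the same value as lane i.
std::array<std::uint32_t, 16> conflict_model(const std::array<std::uint32_t, 16>& idx) {
    std::array<std::uint32_t, 16> out{};
    for (std::size_t i = 0; i < idx.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            if (idx[j] == idx[i]) {
                out[i] |= (std::uint32_t{1} << j);
            }
        }
    }
    return out;
}
```

A lane with a zero mask is the first occurrence of its index and can be processed without waiting on any other lane.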
- Intel Optimization Reference Manual Code Samples
AvxMath
-
Implementing Cosine in C from Scratch
I once did that as well: https://github.com/Const-me/AvxMath/blob/master/AvxMath/AvxM...
The method is different from the ones the OP mentions: a high-degree minimax polynomial approximation.
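The general shape of a polynomial cosine can be sketched as follows. This is not the AvxMath implementation: `cos_poly` is a hypothetical name, and the coefficients are plain Taylor terms rather than minimax-fitted ones, because fitted values depend on the chosen interval and degree. A real minimax fit would keep the same structure (range reduction, then Horner evaluation of an even polynomial in x²) but use slightly different coefficients to get a smaller maximum error.

```cpp
#include <cmath>

// Sketch: cos(x) via range reduction plus an even polynomial evaluated in r^2.
// Coefficients are Taylor terms up to r^10, NOT minimax-fitted (assumption for illustration).
double cos_poly(double x) {
    constexpr double kPi = 3.14159265358979323846;

    // Reduce to r in [-pi/2, pi/2] using cos(x) = (-1)^k * cos(x - k*pi).
    const double k = std::nearbyint(x / kPi);
    const double r = x - k * kPi;
    const double r2 = r * r;

    // Horner evaluation of 1 - r^2/2! + r^4/4! - r^6/6! + r^8/8! - r^10/10!.
    double p = -1.0 / 3628800.0;
    p = p * r2 + 1.0 / 40320.0;
    p = p * r2 - 1.0 / 720.0;
    p = p * r2 + 1.0 / 24.0;
    p = p * r2 - 0.5;
    p = p * r2 + 1.0;

    // Odd k flips the sign.
    const bool odd = std::fmod(k, 2.0) != 0.0;
    return odd ? -p : p;
}
```

The simple reduction above loses accuracy for very large arguments; it is kept short on purpose.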
-
Zen4's AVX512 Teardown
BTW, since you're apparently working on stuff like that, check out this repository:
https://github.com/Const-me/AvxMath/blob/master/AvxMath/AvxM...
The license is MIT, copy-paste friendly. It doesn’t use AVX512 though, only AVX1 and optionally AVX2.
What are some alternatives?
sb_lower_bound - Fastest Branchless Binary Search
sse_mathfun - an extended version of Julien Pommier's sse_mathfun
llvm-project - The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
scheme - An R7RS Scheme implemented in WebAssembly
sectorc - A C Compiler that fits in the 512 byte boot sector of an x86 machine
FindMinimaxPolynomial.jl
src - Read-only git conversion of OpenBSD's official CVS src repository. Pull requests not accepted - send diffs to the tech@ mailing list.