libCat
highway
Metric | libCat | highway |
---|---|---|
Mentions | 21 | 51 |
Stars | 51 | 2,543 |
Growth | - | 4.1% |
Activity | 8.8 | 9.0 |
Latest commit | about 1 month ago | 8 days ago |
Language | C++ | C++ |
License | GNU Affero General Public License v3.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
libCat
-
I hate almost all software
That's awesome! I'm working on something that sounds similar. https://github.com/cons-cat/libcat
I'd love to see your work if you're willing to share it here!
- Manticore 6.0.0 – a faster alternative to Elasticsearch in C++
-
Chromium accepting Rust in a clear move to copy what Mozilla have done, replace C++ source code
It's worse in the standard library than it has to be. When I refactored my traits to minimize template instantiations and lean on concepts as much as possible, I measured over 30% improvement to clean build compile times. It's not possible for the standard to do this, because it would subtly change the API. For instance, you can't instantiate or take the address of a concept, but you can for a type-trait class. No reason you'd want to do that, but you can, so they can't "break" the standard library by optimizing this.
-
C++'s smaller cleaner language
This doesn't have to be true. Over the past year I've made progress towards demonstrating how even non-freestanding C++ can be written without any C or C++ standard library headers or DLLs (with large benefits). There are a few names which compilers require to be in the std:: namespace, though; they're very special features like source_location and construct_at with semantics that can't be expressed otherwise.
-
Is bloat in std::unexpected expected?
It isn't that hard to put a predicate into a type. We have lambdas in an unevaluated context, CTAD, and templated type aliases. https://github.com/Cons-Cat/libCat/blob/main/src/libraries/scaredy/cat/scaredy https://github.com/Cons-Cat/libCat/blob/main/src/global_includes.hpp#L70 https://github.com/Cons-Cat/libCat/blob/main/src/libraries/linux/cat/linux#L289 You do it like this.
- Software disenchantment - why does modern programming seem to lack of care for efficiency, simplicity, and excellence
- tiny::optional – a C++ optional that does not waste memory
-
Rust analyzer/clippy alternative for C++?
I use clang-tidy. These are my current linting rules.
-
John "God" Carmack: C++ with a C flavor is still the best (also: Python performance "keeps hitting me in the face")
I'm working on this! https://github.com/Cons-Cat/LibCat
- “Hello world” is slower in C++ than in C (Linux)
highway
-
AVX512 intrinsics for JDK’s Arrays.sort methods
Disclosure: I am the main author; happy to discuss.
0: https://github.com/google/highway/blob/master/hwy/contrib/so...
-
A Rust port of crumsort, up to 75% faster than pdqsort
If all you care about is integers I recommend taking a look at vqsort, it can even be parallelized with ips4o as seen in the whitepaper.
-
SIMD with Zig
Implementing Arm semantics on x86, or x86 semantics on Arm, requires ~5 instructions, but if we generalize the definition to allow reordering (e.g. Highway's ReorderWidenMulAccumulate [1]), it's only 2 instructions.
1: https://github.com/google/highway/blob/master/g3doc/quick_re...
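To make the "allow reordering" idea concrete, here is a scalar sketch of a widening multiply-accumulate under two different lane assignments. It is only a model: it does not use Highway, the lane counts and function names are hypothetical, and the excerpt does not say which ordering any particular instruction set uses. The point is that callers who only need the final total can accept either ordering.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

constexpr std::size_t kLanes16 = 8;             // hypothetical: 8 x int16 inputs
constexpr std::size_t kLanes32 = kLanes16 / 2;  // 4 x int32 accumulators

using Vec16 = std::array<std::int16_t, kLanes16>;
using Vec32 = std::array<std::int32_t, kLanes32>;

// Ordering A: accumulator i receives the products of adjacent inputs 2i, 2i+1.
inline Vec32 mul_acc_adjacent_pairs(const Vec16& a, const Vec16& b, Vec32 sum) {
    for (std::size_t i = 0; i < kLanes32; ++i) {
        sum[i] += std::int32_t{a[2 * i]} * b[2 * i] +
                  std::int32_t{a[2 * i + 1]} * b[2 * i + 1];
    }
    return sum;
}

// Ordering B: accumulator i receives the products of inputs i and i + 4,
// i.e. one product from the lower half and one from the upper half.
inline Vec32 mul_acc_low_high_halves(const Vec16& a, const Vec16& b, Vec32 sum) {
    for (std::size_t i = 0; i < kLanes32; ++i) {
        sum[i] += std::int32_t{a[i]} * b[i] +
                  std::int32_t{a[i + kLanes32]} * b[i + kLanes32];
    }
    return sum;
}

// Reduces the accumulator lanes to one total, as a dot product would at the end.
inline std::int64_t horizontal_sum(const Vec32& v) {
    return std::accumulate(v.begin(), v.end(), std::int64_t{0});
}
```

The individual accumulator lanes differ between the two orderings, but the horizontal sum is the same, so an operation that leaves the lane assignment unspecified can map to whichever form is cheapest on the target.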
-
Similarity Measures on Arm SVE and NEON, x86 AVX2 and AVX-512
Also, are you familiar with https://github.com/google/highway? That gives you portable intrinsics so you can write your code only once (but still specialize per arch if it's helpful). Disclosure: I am the main author of this library.
-
Intel Publishes Blazing Fast AVX-512 Sorting Library, Numpy Switching To It For 10~17x Faster Sorts
Nice. Would you like to add our vqsort to your benchmark? (Note: we haven't yet implemented a workaround specifically for AMD's compressstoreu, but we do not use it for 64- or 128-bit keys.)
-
Blazing Fast AVX-512 Sorting Library by Intel, 10~17x Faster Sorts in NumPy
Benchmark results using vqsort's bench_sort on Skylake workstation (patch: https://github.com/google/highway/pull/1140)
vqsort is about 1.9x and 1.8x as fast on 1M keys and 100M uniform random keys, respectively.
Note that this code did not pass the benchmark's verification.
AVX3: vq: i32: uniform32: 1.00E+06 1051 MB/s ( 1 threads)
It would be interesting to see it benchmarked against the highway qsort[1] Google published last year.
[1] https://github.com/google/highway/tree/master/hwy/contrib/so...
sagarm has posted one result in another thread. I'll also look into adding their code to our benchmark :)
It's great to see more vector code, but caveat for anyone using this: the pivot sampling is quite basic, just median of 16 evenly spaced samples. This will perform poorly on skewed distributions including all-equal and very few unique values. Yes, in the worst case it can resort to std::sort but that's a >10x speed hit and until recently also potentially O(N^2)!
We have drawn larger samples (nine vectors, not one), and subsequently extended the vqsort algorithm beyond what is described in our paper, e.g. special handling for 1..3 unique keys, see https://github.com/google/highway/blob/master/hwy/contrib/so....
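The concern about a median of 16 evenly spaced samples can be sketched in a few lines. This is not code from either library discussed above; `median_of_16_evenly_spaced` and `left_partition_size` are hypothetical helpers, and the partition rule `key < pivot` is an assumption made for the illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: choose a pivot as the (upper) median of 16 evenly
// spaced samples. Requires at least 16 keys.
inline int median_of_16_evenly_spaced(const std::vector<int>& keys) {
    constexpr std::size_t kSamples = 16;
    assert(keys.size() >= kSamples);
    std::array<int, kSamples> samples{};
    const std::size_t stride = keys.size() / kSamples;
    for (std::size_t i = 0; i < kSamples; ++i) {
        samples[i] = keys[i * stride];
    }
    // Place the element of rank 8 (0-based) at index 8.
    std::nth_element(samples.begin(), samples.begin() + kSamples / 2,
                     samples.end());
    return samples[kSamples / 2];
}

// Number of keys that a `key < pivot` partition step would move to the left.
inline std::size_t left_partition_size(const std::vector<int>& keys, int pivot) {
    return static_cast<std::size_t>(
        std::count_if(keys.begin(), keys.end(),
                      [pivot](int key) { return key < pivot; }));
}
```

On well-spread keys the pivot lands near the middle, but on an all-equal input every sample is the same value and the left partition is empty, so a partition step under this rule makes no progress. That is the degenerate case that special handling for a handful of unique keys is meant to avoid.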
-
Library that could generate vectorized code for different instruction sets?
I haven't tried it, but Google's Highway is supposed to do this with `HWY_DYNAMIC_DISPATCH`.
-
Advice on porting glibc trig functions to SIMD
Google Highway "Performance-portable, length-agnostic SIMD with runtime dispatch" from last October
What are some alternatives?
Vc - SIMD Vector Classes for C++
swup - Complete, flexible, extensible, and easy-to-use page transition library for your server-side rendered website.
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
xsimd - C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
riscv-v-spec - Working draft of the proposed RISC-V V vector extension
jpeg-xl
ispc - Intel® Implicit SPMD Program Compiler
Vrmac - Vrmac Graphics, a cross-platform graphics library for .NET. Supports 3D, 2D, and accelerated video playback. Works on Windows 10 and Raspberry Pi4.
simd_utils - A header only library implementing common mathematical functions using SIMD intrinsics
CppSPMD_Fast - Optimized CppSPMD test project: macro control flow, SSE4.1/AVX1/AVX2/AVX2 FMA support
DtsDecoder
nsimd - Agenium Scale vectorization library for CPUs and GPUs