Top 17 C++ Avx512 Projects
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (Altivec) and VSX (Power7) for PowerPC, NEON for ARM. (by ermig1979)
-
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
-
VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
Project mention: Tips on adding JSON output to your command line utility. (2021) | news.ycombinator.com | 2024-04-20
It's also supported by simdjson [0] (which has a lot of language bindings [1]):
> Multithreaded processing of gigantic Newline-Delimited JSON (ndjson) and related formats at 3.5 GB/s
[0] https://simdjson.org/
[1] https://github.com/simdjson/simdjson?tab=readme-ov-file#bind...
Project mention: Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4 | news.ycombinator.com | 2024-03-31
The bf16 dot instruction replaces 6 instructions: https://github.com/google/highway/blob/master/hwy/ops/x86_12...
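To make the bf16 dot remark concrete, here is a minimal scalar sketch of what one 32-bit lane of such an instruction computes: two adjacent bf16 products added into a float accumulator. This is an illustration only, not the Highway code linked above. The names `bf16`, `float_to_bf16_truncate`, `bf16_to_float`, and `bf16_pair_dot_accumulate` are hypothetical, the conversion truncates rather than rounds, and the exact rounding and denormal behavior of the hardware instruction is not modeled.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit IEEE-754 float");

// Raw bfloat16 bits: the upper 16 bits of an IEEE-754 single-precision float.
using bf16 = std::uint16_t;

// Hypothetical helper: convert by truncation (drops the low 16 mantissa bits).
bf16 float_to_bf16_truncate(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<bf16>(bits >> 16);
}

// Hypothetical helper: widen bf16 back to float by zero-filling the low bits.
float bf16_to_float(bf16 h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// One lane: acc + a0*b0 + a1*b1, with all arithmetic done in float.
float bf16_pair_dot_accumulate(float acc, bf16 a0, bf16 a1, bf16 b0, bf16 b1) {
    return acc + bf16_to_float(a0) * bf16_to_float(b0)
               + bf16_to_float(a1) * bf16_to_float(b1);
}
```

Without a fused instruction, each lane needs separate widening, multiply, and add steps, which is plausibly where a multi-instruction sequence like the one mentioned in the comment comes from.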
If you are talking about non-small matrix multiplication in MKL, it is now open source as part of oneDNN. It has literally the same code as MKL (you can see this by inspecting constants or running high-precision benchmarks).
For small matmul there is libxsmm. It may take tremendous effort to make something faster than oneDNN and libxsmm, as the JIT-based approach of https://github.com/oneapi-src/oneDNN/blob/main/src/gpu/jit/g... is very flexible: if someone finds a better sequence, oneDNN can reuse it without a major change of design.
But MKL is not limited to matmul, I understand it...
https://github.com/xtensor-stack/xsimd
GH topics > HashMap:
I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
[4] https://gitlab.com/libeigen/eigen
[5] https://github.com/shibatch/sleef
Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19
The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
A few years ago I compared the Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with a Core i9, and AVX-512 capable CPUs on substring search benchmarks.
On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
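As a rough illustration of the byte-level processing this comment refers to, the sketch below counts occurrences of one byte by handling eight bytes per 64-bit word (SWAR, "SIMD within a register"). It is not StringZilla's implementation; the function name `swar_count_byte` is hypothetical, and real libraries typically add wider SIMD paths on top of this idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: count bytes equal to `needle`, eight bytes per step.
std::size_t swar_count_byte(const char* data, std::size_t size, char needle) {
    const std::uint64_t ones = 0x0101010101010101ULL;
    const std::uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    // Broadcast the needle into every byte of a 64-bit word.
    const std::uint64_t pattern = ones * static_cast<unsigned char>(needle);

    std::size_t count = 0;
    std::size_t i = 0;
    for (; i + 8 <= size; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, 8);           // safe unaligned load
        const std::uint64_t x = word ^ pattern;    // matching bytes become 0x00
        // Exact zero-byte test: yields 0x80 in each byte of x that is zero.
        const std::uint64_t hits = ~(((x & low7) + low7) | x | low7);
        // Sum the per-byte 0/1 flags into the top byte of the product.
        count += static_cast<std::size_t>(((hits >> 7) * ones) >> 56);
    }
    // Scalar tail for the last (size % 8) bytes.
    for (; i < size; ++i) {
        count += (data[i] == needle) ? 1u : 0u;
    }
    return count;
}
```

The same structure (broadcast, compare, reduce) carries over to NEON or AVX-512 with wider registers, which is why byte-oriented workloads tend to benefit more from vectorization than float-oriented ones.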
Project mention: SIMD based custom object and key-value pair sorting in C++ | news.ycombinator.com | 2024-02-14
neither proposing nor taking a position on this possible addition)
> ... For completeness we would also like to add that a serious issue is that C still lacks vector operations.
Those are good points. The authors don't take a stance on it, but I do think that syntax for packed structs should be standardized. IMO, so should syntax for inline assembly (both as optional features). These are already common extensions; this is exactly what they should standardize. The additions of "typeof" and #embed are also good examples of this (they had been talking about adding #embed since 1995 [1]).
As for vector instructions, I'm unsure how it could be implemented in a standard way, but I'm not against it. Maybe something like this [2], but with the syntax changed for C instead of C++.
[1]: https://groups.google.com/g/comp.std.c/c/zWFEXDvyTwM
[2]: https://github.com/VcDevel/std-simd
I think all of these techniques check whether the input string is correct. For example see here https://github.com/WojciechMula/toys/blob/master/lookup-in-s...
Project mention: The least interesting part about AVX-512 is the 512 bits vector width | news.ycombinator.com | 2023-06-19
Very useful. In fact, it speeds up a single instance (i.e. not taking advantage of SIMD) of MD5 by 20%: https://github.com/animetosho/md5-optimisation#x86-avx512-vl...
> In this case std::lower_bound is very slightly but consistently faster than sb_lower_bound. To always get the best performance it is possible for libraries to use sb_lower_bound whenever directly working on primitive types and std::lower_bound otherwise.
I will say that if this is the case, there are probably much better versions of binary search out there for primitive types. I made one just screwing around with SIMD that's 3x faster than std::lower_bound until becoming memory bound:
https://github.com/matthewkolbe/ThinkingInSimd/tree/main/alg...
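For readers unfamiliar with the branchless style of binary search that this discussion compares against `std::lower_bound`, here is a minimal sketch over a sorted `std::vector<int>`. It is neither the quoted `sb_lower_bound` nor the SIMD version linked above; the name `branchless_lower_bound` is hypothetical, and whether the compiler emits a conditional move for the ternary is an assumption that depends on the toolchain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first element >= key in a sorted vector.
std::size_t branchless_lower_bound(const std::vector<int>& sorted, int key) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    const int* base = sorted.data();
    std::size_t len = sorted.size();
    while (len > 1) {
        const std::size_t half = len / 2;
        // No data-dependent branch: the comparison only selects an offset.
        base += (base[half - 1] < key) ? half : 0;
        len -= half;
    }
    std::size_t idx = static_cast<std::size_t>(base - sorted.data());
    // One candidate remains; step past it if it is still smaller than key.
    if (len == 1 && *base < key) {
        ++idx;
    }
    return idx;
}
```

The loop runs a fixed number of iterations for a given size, so its cost is dominated by memory access rather than branch misprediction, which matches the comment's observation that such searches eventually become memory bound.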
C++ Avx512 related posts
- Measuring energy usage: regular code vs. SIMD code
- SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
- Stringzilla: 10x Faster SIMD-accelerated String Class
- Stringzilla: 10x faster SIMD-accelerated Python `str` class
- The Case of the Missing SIMD Code
- Modern Perfect Hashing for Strings
- SIMD intrinsics and the possibility of a standard library solution
Index
What are some of the best open-source Avx512 projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | simdjson | 18,362 |
2 | highway | 3,623 |
3 | oneDNN | 3,456 |
4 | xsimd | 2,036 |
5 | Simd | 1,971 |
6 | StringZilla | 1,776 |
7 | kfr | 1,582 |
8 | Vc | 1,417 |
9 | libsimdpp | 1,188 |
10 | x86-simd-sort | 794 |
11 | std-simd | 544 |
12 | toys | 311 |
13 | sse-popcount | 309 |
14 | md5-optimisation | 96 |
15 | std_find_simd | 18 |
16 | VectorizedKernel | 7 |
17 | ThinkingInSimd | 3 |