SaaSHub helps you find the best software and product alternatives Learn more →
Top 23 Avx512 Open-Source Projects
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Project mention: 1BRC Merykitty's Magic SWAR: 8 Lines of Code Explained in 3k Words | news.ycombinator.com | 2024-03-09
-
highway
Project mention: Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4 | news.ycombinator.com | 2024-03-31
The bf16 dot instruction replaces 6 instructions: https://github.com/google/highway/blob/master/hwy/ops/x86_12...
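A scalar C++ sketch of what a bf16 dot-product instruction computes helps explain the claim. On x86, AVX512_BF16's `VDPBF16PS` multiplies adjacent pairs of bf16 values and adds both products into one f32 accumulator per 32-bit lane; without that instruction, the bf16 inputs have to be widened, multiplied and added in separate steps. The helper names below are hypothetical, and the sketch ignores the instruction's exact rounding and denormal handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen a bf16 value (the upper 16 bits of an IEEE-754 float) to float.
inline float bf16_to_f32(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Truncating float -> bf16 conversion (drops the low 16 mantissa bits;
// hardware conversions round to nearest even instead).
inline std::uint16_t f32_to_bf16_truncate(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Per-lane model of a bf16 pair dot product: accumulator i receives
// a[2i]*b[2i] + a[2i+1]*b[2i+1]. A 512-bit register holds 16 such lanes.
std::vector<float> dot_bf16_pairs(std::vector<float> acc,
                                  const std::vector<std::uint16_t>& a,
                                  const std::vector<std::uint16_t>& b) {
    assert(a.size() == b.size() && a.size() == 2 * acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        acc[i] += bf16_to_f32(a[2 * i]) * bf16_to_f32(b[2 * i]) +
                  bf16_to_f32(a[2 * i + 1]) * bf16_to_f32(b[2 * i + 1]);
    }
    return acc;
}
```

Pairing two products per 32-bit lane is what lets twice as many 16-bit inputs feed the same number of f32 accumulators.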
-
I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
https://github.com/xtensor-stack/xsimd
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (Altivec) and VSX (Power7) for PowerPC, NEON for ARM. (by ermig1979)
-
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
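SWAR ("SIMD within a register") treats an ordinary 64-bit integer as eight byte lanes. The following is a minimal C++ sketch of byte search using the classic zero-byte bit trick; it is a generic illustration, not StringZilla's code, and it assumes a little-endian target and the GCC/Clang builtin `__builtin_ctzll`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Return the index of the first occurrence of `c` in `s`, or s.size() if absent.
// Processes 8 bytes per step inside a uint64_t. Assumes little-endian byte order.
std::size_t swar_find_byte(std::string_view s, char c) {
    constexpr std::uint64_t kOnes = 0x0101010101010101ULL;
    constexpr std::uint64_t kHighs = 0x8080808080808080ULL;
    const std::uint64_t pattern = kOnes * static_cast<unsigned char>(c);

    std::size_t i = 0;
    for (; i + 8 <= s.size(); i += 8) {
        std::uint64_t word;
        std::memcpy(&word, s.data() + i, 8);   // unaligned-safe 8-byte load
        const std::uint64_t x = word ^ pattern; // matching bytes become 0x00
        // Flags zero bytes in their high bit. The lowest flag is always exact;
        // flags above it may be false positives caused by borrows.
        const std::uint64_t zeros = (x - kOnes) & ~x & kHighs;
        if (zeros != 0) {
            return i + static_cast<std::size_t>(__builtin_ctzll(zeros)) / 8;
        }
    }
    for (; i < s.size(); ++i) {                 // scalar tail (< 8 bytes)
        if (s[i] == c) return i;
    }
    return s.size();
}
```

Only the lowest flagged byte is used, which is why the possible false positives above it do not affect the result.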
Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19
The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
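A rough way to see point A: one 512-bit AVX-512 register holds four times as many 8-bit lanes as 32-bit float lanes, so a single vector instruction covers proportionally more elements in byte-level work. This lane count is only part of the picture; the actual energy gap also depends on which instructions are used.

```latex
\frac{512\ \text{bits}}{8\ \text{bits/lane}} = 64\ \text{byte lanes},
\qquad
\frac{512\ \text{bits}}{32\ \text{bits/lane}} = 16\ \text{float lanes},
\qquad
\frac{64}{16} = 4
```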
A few years ago I compared an Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with a Core i9, and AVX-512-capable CPUs on substring search benchmarks.
On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
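The GB/J unit follows from dividing throughput by power, since a watt is a joule per second. Using only the two figures quoted above, and assuming the 8.3x ratio is measured against LibC on the same device, the implied LibC baseline works out to roughly 0.5 GB/J:

```latex
\text{efficiency} = \frac{\text{throughput}\ [\mathrm{GB/s}]}{\text{power}\ [\mathrm{J/s}]}
\;\Rightarrow\; [\mathrm{GB/J}],
\qquad
\text{LibC baseline} \approx \frac{4.2\ \mathrm{GB/J}}{8.3} \approx 0.51\ \mathrm{GB/J}
```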
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
-
sha256-simd
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides up to an 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
BLAKE3 is faster than hardware accelerated SHA-2 because the tree mode used in BLAKE3 allows hashing parts of a single message in parallel (with SHA-2, parts of a single message have to be hashed one after another, and parallelism is only used in workloads where you process multiple messages at the same time).
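The structural difference can be sketched with a toy hash. Below, 64-bit FNV-1a stands in for a real compression function (it is neither SHA-2 nor BLAKE3 and is not cryptographic), and the chunk size and single parent level are hypothetical simplifications. In chained mode each chunk needs the state left by the previous one; in tree mode every leaf starts from the same initial state, so the leaf loop could be split across cores or SIMD lanes before a parent hash combines the results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

constexpr std::uint64_t kFnvOffset = 0xcbf29ce484222325ULL;
constexpr std::uint64_t kFnvPrime = 0x100000001b3ULL;

// Toy compression step: 64-bit FNV-1a over `bytes`, continuing from `state`.
std::uint64_t fnv1a(std::string_view bytes, std::uint64_t state = kFnvOffset) {
    for (unsigned char b : bytes) {
        state ^= b;
        state *= kFnvPrime;
    }
    return state;
}

// Chained (SHA-2-like) mode: chunk k depends on the state from chunk k-1,
// so a single message is processed strictly in order.
std::uint64_t chained_hash(std::string_view msg, std::size_t chunk) {
    assert(chunk > 0);
    std::uint64_t state = kFnvOffset;
    for (std::size_t off = 0; off < msg.size(); off += chunk) {
        state = fnv1a(msg.substr(off, chunk), state);
    }
    return state;
}

// Tree mode, leaf level: each chunk is hashed independently, so these
// iterations have no data dependency on one another.
std::vector<std::uint64_t> leaf_hashes(std::string_view msg, std::size_t chunk) {
    assert(chunk > 0);
    std::vector<std::uint64_t> leaves;
    for (std::size_t off = 0; off < msg.size(); off += chunk) {
        leaves.push_back(fnv1a(msg.substr(off, chunk)));
    }
    return leaves;
}

// Tree mode, root: hash the concatenated leaf digests (one parent level only;
// a real tree hash builds a full tree and domain-separates node types).
std::uint64_t tree_hash(std::string_view msg, std::size_t chunk) {
    std::string parent_input;
    for (std::uint64_t leaf : leaf_hashes(msg, chunk)) {
        for (int i = 0; i < 8; ++i) {
            parent_input.push_back(static_cast<char>((leaf >> (8 * i)) & 0xFF));
        }
    }
    return fnv1a(parent_input);
}
```

Because FNV-1a simply continues from the carried state, `chained_hash` here equals hashing the whole message at once, which makes the sequential dependency explicit.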
-
Project mention: SIMD based custom object and key-value pair sorting in C++ | news.ycombinator.com | 2024-02-14
-
SimSIMD
Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
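For reference, the scalar kernel that such libraries vectorize is short. This C++ sketch computes cosine similarity over `float` vectors in a single pass that accumulates the dot product and both squared norms. It is an illustrative baseline, not SimSIMD's API, and accumulating in `double` is one possible choice to limit rounding error.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    // One fused pass over both inputs; SIMD kernels evaluate several terms
    // of each of these three sums per instruction.
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```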
-
I'm the main author of Highway, so I have some opinions :D The number of supported operations and platforms are important criteria.
A hopefully unbiased commentary:
Simde allows you to take existing nonportable intrinsics and get them to run on another platform. This is useful when you have a bunch of existing code and tight deadlines. The downside is less than optimal performance - a portable abstraction can be more efficient than forcing one platform to exactly match the semantics of another. Although a ton of effort has gone into Simde, sometimes it also resorts to autovectorization which may or may not work.
Eigen and SLEEF are mostly math-focused projects that also have a portability layer. SLEEF is designed for C and thus has type suffixes which are rather verbose, see https://github.com/shibatch/sleef/blob/master/src/libm/sleef... But it offers a complete (more so than Highway's) libm.
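To illustrate the Simde trade-off described above: SSE2's `_mm_movemask_epi8` gathers the top bit of each of 16 bytes into an integer in a single x86 instruction, while targets such as Arm NEON have no direct equivalent, so a layer that reproduces x86 semantics exactly has to build it from several operations. The `u8x16` type and the scalar function below are hypothetical illustrations of those semantics, not Simde's implementation.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of 16 unsigned bytes.
using u8x16 = std::array<std::uint8_t, 16>;

// Portable model of _mm_movemask_epi8: bit i of the result is the most
// significant bit of byte lane i. One instruction on x86; here a loop.
int movemask_epi8_portable(const u8x16& v) {
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        mask |= ((v[i] >> 7) & 1) << i;
    }
    return mask;
}
```

Code written against the portable interface can instead ask "is any lane set?" or "which lane is first?", which each target can answer in its own cheapest way; that is the efficiency argument made above for a portable abstraction.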
-
neither proposing nor taking a position on this possible addition)
> ... For completeness we would also like to add that a serious issue is that C still lacks vector operations.
Those are good points. The authors don't take a stance on it, but I do think that syntax for packed structs should be standardized. IMO, so should syntax for inline assembly (both as optional features). These are already common extensions; this is exactly what they should standardize. The additions of "typeof" and #embed are also good examples of this (they had been talking about adding #embed since 1995 [1]).
As for vector instructions, I'm unsure how it could be implemented in a standard way, but I'm not against it. Maybe something like this [2], but with the syntax changed for C instead of C++.
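For context, this is what the existing packed-struct and vector-type extensions look like in GCC and Clang. `__attribute__((packed))` removes struct padding, and `__attribute__((vector_size(N)))` declares a fixed-width vector type with element-wise operators, which the compiler maps to the target's SIMD instructions where available and to scalar code otherwise. Neither is ISO C or C++; the sketch uses C++ for consistency with the other examples, and `PackedRecord`, `v4sf` and `axpy4` are illustrative names.

```cpp
#include <cstdint>

// Packed: no padding between the 1-byte tag and the 4-byte value.
struct __attribute__((packed)) PackedRecord {
    std::uint8_t tag;
    std::uint32_t value;
};

// Same fields without the attribute: typically padded to 8 bytes.
struct PlainRecord {
    std::uint8_t tag;
    std::uint32_t value;
};

// Four 32-bit floats in one 16-byte vector; + and * act element-wise.
typedef float v4sf __attribute__((vector_size(16)));

// a * x + y on all four lanes.
v4sf axpy4(float a, v4sf x, v4sf y) {
    v4sf va = {a, a, a, a};  // broadcast the scalar to every lane
    return va * x + y;
}

static_assert(sizeof(PackedRecord) == 5, "packed struct has no padding");
static_assert(sizeof(v4sf) == 16, "vector type spans 16 bytes");
```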
-
I think all of these techniques check whether the input string is correct. For example see here https://github.com/WojciechMula/toys/blob/master/lookup-in-s...
-
Project mention: Show HN: The fastest Turbo-Base64 now for Python | news.ycombinator.com | 2023-08-24
** Cython bindings for Turbo Base64 [1] **
- 20-30x faster than the standard library
- Benchmarks faster than any other C base64 library
- Fastest implementation of AVX, AVX2, and AVX512 base64 encoding
- No other dependencies
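For reference, here is the scalar algorithm such kernels accelerate: base64 maps every 3 input bytes (24 bits) to 4 output characters of 6 bits each, padding a final partial group with `=`. This straightforward C++ version uses the RFC 4648 standard alphabet; it is a baseline for comparison, not Turbo-Base64's code. SIMD encoders typically perform the same 3-to-4 regrouping on many bytes at once using shuffles and table lookups.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Scalar base64 encoder (RFC 4648 standard alphabet, with '=' padding).
std::string base64_encode(std::string_view in) {
    static constexpr char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Full 3-byte groups -> 4 characters.
    for (; i + 3 <= in.size(); i += 3) {
        const unsigned b0 = static_cast<unsigned char>(in[i]);
        const unsigned b1 = static_cast<unsigned char>(in[i + 1]);
        const unsigned b2 = static_cast<unsigned char>(in[i + 2]);
        const unsigned triple = (b0 << 16) | (b1 << 8) | b2;
        out += kAlphabet[(triple >> 18) & 0x3F];
        out += kAlphabet[(triple >> 12) & 0x3F];
        out += kAlphabet[(triple >> 6) & 0x3F];
        out += kAlphabet[triple & 0x3F];
    }
    // 1 or 2 leftover bytes -> one padded final group.
    const std::size_t rest = in.size() - i;
    if (rest > 0) {
        const unsigned b0 = static_cast<unsigned char>(in[i]);
        const unsigned b1 = rest == 2 ? static_cast<unsigned char>(in[i + 1]) : 0u;
        const unsigned triple = (b0 << 16) | (b1 << 8);
        out += kAlphabet[(triple >> 18) & 0x3F];
        out += kAlphabet[(triple >> 12) & 0x3F];
        out += rest == 2 ? kAlphabet[(triple >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```

For example, `base64_encode("foobar")` yields `Zm9vYmFy`, one of the RFC 4648 test vectors.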
-
Project mention: The least interesting part about AVX-512 is the 512 bits vector width | news.ycombinator.com | 2023-06-19
Very useful. In fact, it speeds up a single instance (i.e. not taking advantage of SIMD) of MD5 by 20%: https://github.com/animetosho/md5-optimisation#x86-avx512-vl...
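One AVX-512VL feature that plausibly contributes to a single-stream speedup like this (an assumption; the linked README section is not quoted here) is `VPTERNLOGD`, which evaluates any three-input bitwise function in one instruction, selected by an 8-bit truth-table immediate. MD5's round functions F, G, H and I (RFC 1321) are exactly such three-input functions. The immediate for f(a, b, c) is f(0xF0, 0xCC, 0xAA); the C++ sketch below derives it and checks it with a scalar model of the instruction. All function and constant names are illustrative.

```cpp
#include <cstdint>

// Truth-table immediate of a ternary bitwise function: bit i of f(0xF0, 0xCC, 0xAA)
// is f evaluated with a = bit 2 of i, b = bit 1 of i, c = bit 0 of i.
template <typename F>
constexpr std::uint8_t ternlog_imm(F f) {
    return static_cast<std::uint8_t>(f(0xF0u, 0xCCu, 0xAAu));
}

// Scalar model of VPTERNLOGD on one 32-bit lane.
constexpr std::uint32_t ternlog32(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c, std::uint8_t imm) {
    std::uint32_t r = 0;
    for (int bit = 0; bit < 32; ++bit) {
        const unsigned idx = (((a >> bit) & 1u) << 2) |
                             (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u);
        r |= static_cast<std::uint32_t>((imm >> idx) & 1u) << bit;
    }
    return r;
}

// MD5 round functions (RFC 1321).
constexpr std::uint32_t md5_F(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return (x & y) | (~x & z); }
constexpr std::uint32_t md5_G(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return (x & z) | (y & ~z); }
constexpr std::uint32_t md5_H(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return x ^ y ^ z; }
constexpr std::uint32_t md5_I(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return y ^ (x | ~z); }

// Each round function collapses to a single ternary-logic operation.
constexpr std::uint8_t kImmF = ternlog_imm(md5_F);
constexpr std::uint8_t kImmG = ternlog_imm(md5_G);
constexpr std::uint8_t kImmH = ternlog_imm(md5_H);
constexpr std::uint8_t kImmI = ternlog_imm(md5_I);

static_assert(kImmF == 0xCA && kImmG == 0xE4, "F and G immediates");
static_assert(kImmH == 0x96 && kImmI == 0x39, "H and I immediates");
```

Without this instruction, F, G and I each take two to four separate AND/OR/XOR/NOT operations per step.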
Avx512 related posts
- OSS: Relicense to Apache 2 Globally
- Measuring energy usage: regular code vs. SIMD code
- Show HN: StringZilla v3 with C++, Rust, and Swift bindings, and AVX-512 and NEON
- How fast is rolling Karp-Rabin hashing?
- SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
- Stringzilla: 10x Faster SIMD-accelerated String Class
- Stringzilla: 10x faster SIMD-accelerated Python `str` class
-
A note from our sponsor - SaaSHub
www.saashub.com | 17 Apr 2024
Index
What are some of the best open-source Avx512 projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | simdjson | 18,337
2 | highway | 3,608
3 | oneDNN | 3,446
4 | simde | 2,157
5 | xsimd | 2,024
6 | Simd | 1,966
7 | StringZilla | 1,749
8 | kfr | 1,578
9 | Vc | 1,413
10 | libsimdpp | 1,186
11 | sneller | 967
12 | sha256-simd | 930
13 | x86-simd-sort | 793
14 | SimSIMD | 707
15 | sleef | 583
16 | std-simd | 544
17 | toys | 311
18 | nsimd | 310
19 | sse-popcount | 309
20 | Turbo-Base64 | 251
21 | Hybridizer | 229
22 | md5-optimisation | 96
23 | argminmax | 51