Top 23 Avx512 Open-Source Projects

simdjson

63 18,337 9.2 C++

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

Project mention: 1BRC Merykitty's Magic SWAR: 8 Lines of Code Explained in 3k Words | news.ycombinator.com | 2024-03-09
highway

66 3,608 9.8 C++

Performance-portable, length-agnostic SIMD with runtime dispatch

Project mention: Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4 | news.ycombinator.com | 2024-03-31

The bf16 dot instruction replaces 6 instructions: https://github.com/google/highway/blob/master/hwy/ops/x86_12...
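To make the quoted remark more concrete, here is a minimal scalar sketch of what a bf16 dot-product step computes: bfloat16 values are widened to float32, multiplied pairwise, and accumulated. It is an emulation for illustration only, not Highway's code and not the hardware instruction. The helper names (`f32_to_bf16`, `bf16_to_f32`, `bf16_dot`) are hypothetical, the conversion truncates instead of rounding, and Python accumulates in double precision rather than float32.

```python
import struct


def f32_to_bf16(x: float) -> int:
    """Return a 16-bit bfloat16 pattern: the top half of the float32 encoding.

    Assumption: plain truncation; real conversions usually round to nearest.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_to_f32(b: int) -> float:
    """Widen a bfloat16 pattern back to a float by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return value


def bf16_dot(a, b) -> float:
    """Multiply bf16 values pairwise and accumulate the products."""
    acc = 0.0  # Python float is a double; hardware would accumulate in float32
    for x, y in zip(a, b):
        acc += bf16_to_f32(x) * bf16_to_f32(y)
    return acc


a = [f32_to_bf16(v) for v in (1.0, 2.0, 0.5)]
b = [f32_to_bf16(v) for v in (3.0, 4.0, 8.0)]
result = bf16_dot(a, b)  # 1*3 + 2*4 + 0.5*8
```

All six inputs are exactly representable in bfloat16, so the truncation loses nothing here; values with more than 8 significant bits would be rounded down.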
oneDNN

4 3,446 10.0 C++

oneAPI Deep Neural Network Library (oneDNN)
simde

7 2,157 9.2 C

Implementations of SIMD instruction sets for systems which don't natively support them.

Project mention: The Case of the Missing SIMD Code | news.ycombinator.com | 2023-06-08

I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
[4] https://gitlab.com/libeigen/eigen
[5] https://github.com/shibatch/sleef
xsimd

3 2,024 8.7 C++

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

Project mention: GDlog: A GPU-Accelerated Deductive Engine | news.ycombinator.com | 2023-12-03

https://github.com/xtensor-stack/xsimd
Simd

1 1,966 9.6 C++

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (Altivec) and VSX (Power7) for PowerPC, NEON for ARM. (by ermig1979)

Project mention: The Case of the Missing SIMD Code | news.ycombinator.com | 2023-06-08
StringZilla

14 1,749 9.8 C++

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖

Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19

The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
A few years ago I compared Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with Core i9, and AVX-512 capable CPUs on substring search benchmarks.
On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
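The SWAR ("SIMD within a register") idea named in the StringZilla description can be sketched without any vector instructions: treat a 64-bit integer as eight byte lanes and test all of them with a handful of arithmetic operations. The sketch below uses the classic zero-byte test; it is an illustration of the general technique, not StringZilla's implementation, and the function names are hypothetical.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
ONES = 0x0101_0101_0101_0101   # 0x01 in every byte lane
HIGHS = 0x8080_8080_8080_8080  # 0x80 in every byte lane


def has_zero_byte(word: int) -> bool:
    """Return True if any of the 8 bytes of a 64-bit word is zero."""
    # Subtracting 1 from each lane borrows into the high bit only for lanes
    # that were zero (or sit above a zero lane); "& ~word" discards lanes
    # whose high bit was already set. MASK64 keeps the result to 64 bits.
    return ((word - ONES) & ~word & HIGHS & MASK64) != 0


def contains_byte(chunk: bytes, needle: int) -> bool:
    """Check 8 bytes at once for the presence of `needle`."""
    assert len(chunk) == 8 and 0 <= needle <= 0xFF
    word = int.from_bytes(chunk, "little")
    # XOR with the broadcast needle turns every matching byte into zero.
    return has_zero_byte(word ^ (needle * ONES))
```

A substring search built on this would scan the haystack eight bytes per step for the first character of the needle and fall back to a byte-wise comparison only when `contains_byte` reports a candidate.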
kfr

2 1,578 9.2 C++

Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Vc

6 1,413 6.1 C++

SIMD Vector Classes for C++
libsimdpp

1 1,186 0.0 C++

Portable header-only C++ low level SIMD library
sneller

15 967 9.1 Go

World's fastest log analysis: λ + SQL + JSON + S3

Project mention: OSS: Relicense to Apache 2 Globally | news.ycombinator.com | 2024-03-23
sha256-simd

3 930 1.0 Go

Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides up to an 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.

Project mention: The Curious Case of MD5 | news.ycombinator.com | 2024-01-03

BLAKE3 is faster than hardware accelerated SHA-2 because the tree mode used in BLAKE3 allows hashing parts of a single message in parallel (with SHA-2, parts of a single message have to be hashed one after another, and parallelism is only used in workloads where you process multiple messages at the same time).
https://github.com/minio/sha256-simd
https://github.com/BLAKE3-team/BLAKE3
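The dependency argument in the comment above can be sketched as follows. In a chained hash every block update needs the state left by the previous one, while in a tree hash the leaves are independent and can be processed in parallel. This toy uses SHA-256 as the node function and a simple binary tree; it is not BLAKE3's actual construction, and `CHUNK`, `chained_hash`, and `tree_hash` are hypothetical names chosen for the illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024  # hypothetical leaf size in bytes


def chained_hash(message: bytes) -> bytes:
    """Sequential: each update depends on the state left by the previous one."""
    h = hashlib.sha256()
    for i in range(0, len(message), CHUNK):
        h.update(message[i:i + CHUNK])
    return h.digest()


def hash_leaf(chunk: bytes) -> bytes:
    # The 0x00 / 0x01 prefixes keep leaf and parent inputs distinct.
    return hashlib.sha256(b"\x00" + chunk).digest()


def tree_hash(message: bytes) -> bytes:
    """Tree mode: leaves have no dependency on each other."""
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)] or [b""]
    with ThreadPoolExecutor() as pool:
        level = list(pool.map(hash_leaf, chunks))  # independent work items
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired node at the end of a level is promoted unchanged.
            nxt.append(hashlib.sha256(b"\x01" + b"".join(pair)).digest()
                       if len(pair) == 2 else pair[0])
        level = nxt
    return level[0]
```

The thread pool is there to show where the parallelism is available, not to demonstrate a speedup; a real implementation would map the independent leaves onto SIMD lanes or cores.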
x86-simd-sort

7 793 9.5 C++

C++ template library for high performance SIMD based sorting algorithms

Project mention: SIMD based custom object and key-value pair sorting in C++ | news.ycombinator.com | 2024-02-14
SimSIMD

15 707 9.6 C

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐

Project mention: Deep Learning in JavaScript | news.ycombinator.com | 2024-03-28
sleef

17 583 8.1 C

SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT

Project mention: The Case of the Missing SIMD Code | news.ycombinator.com | 2023-06-08

I'm the main author of Highway, so I have some opinions :D Number of operations/platforms supported are important criteria.
A hopefully unbiased commentary:
Simde allows you to take existing nonportable intrinsics and get them to run on another platform. This is useful when you have a bunch of existing code and tight deadlines. The downside is less than optimal performance - a portable abstraction can be more efficient than forcing one platform to exactly match the semantics of another. Although a ton of effort has gone into Simde, sometimes it also resorts to autovectorization which may or may not work.
Eigen and SLEEF are mostly math-focused projects that also have a portability layer. SLEEF is designed for C and thus has type suffixes which are rather verbose, see https://github.com/shibatch/sleef/blob/master/src/libm/sleef... But it offers a complete (more so than Highway's) libm.
std-simd

9 544 1.1 C++

std::experimental::simd for GCC [ISO/IEC TS 19570:2018]

Project mention: A proposal for the next version of C [pdf] | news.ycombinator.com | 2024-01-20

neither proposing nor taking a position on this possible addition)
> ... For completeness we would also like to add that a serious issue is that C still lacks vector operations.
Those are good points. The authors don't take a stance on it, but I do think that syntax for packed structs should be standardized. IMO, so should syntax for inline assembly (both as optional features). These are already common extensions; this is exactly what they should standardize. The additions of "typeof" and #embed are also good examples of this (they had been talking about adding #embed since 1995 [1]).
As for vector instructions, I'm unsure how it could be implemented in a standard way, but I'm not against it. Maybe something like this [2], but with the syntax changed for C instead of C++.
[1]: https://groups.google.com/g/comp.std.c/c/zWFEXDvyTwM
[2]: https://github.com/VcDevel/std-simd
toys

2 311 5.6 C++

Storage for my snippets, toy programs, etc.

Project mention: Modern Perfect Hashing for Strings | news.ycombinator.com | 2023-04-30

I think all of these techniques check whether the input string is correct. For example see here https://github.com/WojciechMula/toys/blob/master/lookup-in-s...
nsimd

2 310 0.0 C

Agenium Scale vectorization library for CPUs and GPUs
sse-popcount

2 309 5.6 C++

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Turbo-Base64

4 251 8.6 C

Turbo Base64 - Fastest Base64 SIMD:SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!

Project mention: Show HN: The fastest Turbo-Base64 now for Python | news.ycombinator.com | 2023-08-24

** Cython bindings for Turbo Base64 [1] **
- 20-30x faster than the standard library
- Benchmarks faster than any other C base64 library
- Fastest implementation of AVX, AVX2, and AVX512 base64 encoding
- No other dependencies
[1] - https://github.com/powturbo/Turbo-Base64
Hybridizer

1 229 3.8 C#

Examples of C# code compiled to GPU by hybridizer
md5-optimisation

2 96 2.8 C++

The fastest MD5 implementation using x86 assembly

Project mention: The least interesting part about AVX-512 is the 512 bits vector width | news.ycombinator.com | 2023-06-19

Very useful. In fact, it speeds up a single instance (i.e. not taking advantage of SIMD) of MD5 by 20%: https://github.com/animetosho/md5-optimisation#x86-avx512-vl...
argminmax

3 51 5.7 Rust

Efficient argmin & argmax

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-31.

Avx512 related posts

OSS: Relicense to Apache 2 Globally
1 project | news.ycombinator.com | 23 Mar 2024
Measuring energy usage: regular code vs. SIMD code
1 project | news.ycombinator.com | 19 Feb 2024
Show HN: StringZilla v3 with C++, Rust, and Swift bindings, and AVX-512 and NEON
1 project | news.ycombinator.com | 7 Feb 2024
How fast is rolling Karp-Rabin hashing?
1 project | news.ycombinator.com | 4 Feb 2024
SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
6 projects | news.ycombinator.com | 29 Sep 2023
Stringzilla: 10x Faster SIMD-accelerated String Class
1 project | /r/programming | 30 Aug 2023
Stringzilla: 10x faster SIMD-accelerated Python `str` class
2 projects | /r/Python | 30 Aug 2023

Index

What are some of the best open-source Avx512 projects? This list will help you:

	Project	Stars
1	simdjson	18,337
2	highway	3,608
3	oneDNN	3,446
4	simde	2,157
5	xsimd	2,024
6	Simd	1,966
7	StringZilla	1,749
8	kfr	1,578
9	Vc	1,413
10	libsimdpp	1,186
11	sneller	967
12	sha256-simd	930
13	x86-simd-sort	793
14	SimSIMD	707
15	sleef	583
16	std-simd	544
17	toys	311
18	nsimd	310
19	sse-popcount	309
20	Turbo-Base64	251
21	Hybridizer	229
22	md5-optimisation	96
23	argminmax	51