Top 17 C++ Avx512 Projects

simdjson

Mentions: 65 | Stars: 18,362 | Activity: 9.2 | Language: C++

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

Project mention: Tips on adding JSON output to your command line utility. (2021) | news.ycombinator.com | 2024-04-20

It's also supported by simdjson [0] (which has a lot of language bindings [1]):
> Multithreaded processing of gigantic Newline-Delimited JSON (ndjson) and related formats at 3.5 GB/s
[0] https://simdjson.org/
[1] https://github.com/simdjson/simdjson?tab=readme-ov-file#bind...

highway

Mentions: 66 | Stars: 3,623 | Activity: 9.8 | Language: C++

Performance-portable, length-agnostic SIMD with runtime dispatch
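
Runtime dispatch means compiling several variants of a kernel and choosing one when the program runs, based on what the CPU reports. The sketch below is not Highway's API (Highway generates and selects its per-target copies itself); it is a hand-rolled illustration that assumes GCC or Clang on x86, where `__attribute__((target("avx512f")))` and `__builtin_cpu_supports` are available. The function names are hypothetical, and on other platforms the code simply uses the baseline.

```cpp
#include <cstddef>
#include <cstdint>

// Baseline variant: compiled for the default target of the build.
static std::int64_t sum_baseline(const std::int32_t* x, std::size_t n) {
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
// Same source, but the compiler may use AVX-512F instructions for this copy.
// It must only be called after a runtime check confirms the CPU supports it.
__attribute__((target("avx512f")))
static std::int64_t sum_avx512(const std::int32_t* x, std::size_t n) {
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}
#endif

using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Choose an implementation based on what the running CPU reports.
SumFn select_sum() {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx512f")) return sum_avx512;
#endif
    return sum_baseline;
}
```

In practice you would call `select_sum()` once and cache the pointer; both variants return the same result, only the instructions used differ.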

Project mention: Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4 | news.ycombinator.com | 2024-03-31

The bf16 dot instruction replaces 6 instructions: https://github.com/google/highway/blob/master/hwy/ops/x86_12...
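
For context, bf16 (bfloat16) keeps the upper 16 bits of an IEEE-754 float32, and the AVX-512 BF16 instruction VDPBF16PS multiplies pairs of bf16 values and accumulates the products into float32 lanes. A scalar reference of that computation, assuming round-to-nearest-even for the float-to-bf16 conversion, might look like this. The helper names are hypothetical and this is not the Highway code linked above; hardware may differ in details such as denormal handling and summation order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// bf16 stored as raw bits: the top 16 bits of an IEEE-754 float32.
using bf16 = std::uint16_t;

// float32 -> bf16 with round-to-nearest-even; NaNs stay NaN (quiet bit set).
bf16 to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return static_cast<bf16>((bits >> 16) | 0x0040u);
    const std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<bf16>((bits + rounding) >> 16);
}

// bf16 -> float32 is exact: shift the 16 bits back into the high half.
float from_bf16(bf16 h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Scalar reference: widen each bf16 input to float32, accumulate in float32.
float dot_bf16(const bf16* a, const bf16* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += from_bf16(a[i]) * from_bf16(b[i]);
    return acc;
}
```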

oneDNN

Mentions: 5 | Stars: 3,456 | Activity: 10.0 | Language: C++

oneAPI Deep Neural Network Library (oneDNN)

Project mention: Blaze: A High Performance C++ Math library | news.ycombinator.com | 2024-04-17

If you are talking about non-small matrix multiplication in MKL, it is now open source as part of oneDNN. It has literally the same code as MKL (you can see this by inspecting constants or doing high-precision benchmarks).
For small matmul there is libxsmm. It may take tremendous effort to make something faster than oneDNN and libxsmm, as the JIT-based approach of https://github.com/oneapi-src/oneDNN/blob/main/src/gpu/jit/g... is so flexible: if someone finds a better instruction sequence, oneDNN can reuse it without a major change of design.
But MKL is not limited to matmul, I understand it...

xsimd

Mentions: 3 | Stars: 2,036 | Activity: 8.7 | Language: C++

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

Project mention: GDlog: A GPU-Accelerated Deductive Engine | news.ycombinator.com | 2023-12-03

https://github.com/xtensor-stack/xsimd
GH topics > HashMap:

Simd

Mentions: 1 | Stars: 1,971 | Activity: 9.6 | Language: C++

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (Altivec) and VSX (Power7) for PowerPC, NEON for ARM. (by ermig1979)

Project mention: The Case of the Missing SIMD Code | news.ycombinator.com | 2023-06-08

I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
[4] https://gitlab.com/libeigen/eigen
[5] https://github.com/shibatch/sleef

StringZilla

Mentions: 14 | Stars: 1,776 | Activity: 9.8 | Language: C++

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖

Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19

The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
A few years ago I compared Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with a Core i9, and AVX-512-capable CPUs on substring search benchmarks.
On Xavier one can easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
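
StringZilla's description mentions SWAR (SIMD within a register). The textbook SWAR byte search below is a sketch, not StringZilla's code: it XORs eight bytes with a broadcast needle so that matches become zero bytes, then uses the `(x - 0x01..01) & ~x & 0x80..80` trick to flag them. False positives can only appear above a genuine zero byte, so the lowest flagged byte is the first match. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble 8 bytes into a word, byte 0 in the lowest bits (endian-independent).
static std::uint64_t load_le64(const char* p) {
    std::uint64_t w = 0;
    for (int k = 0; k < 8; ++k)
        w |= static_cast<std::uint64_t>(static_cast<unsigned char>(p[k])) << (8 * k);
    return w;
}

// Index of the lowest byte whose high bit is set; flags must be nonzero.
static std::size_t lowest_flag_byte(std::uint64_t flags) {
    std::size_t byte = 0;
    while ((flags & 0x80u) == 0) { flags >>= 8; ++byte; }
    return byte;
}

// Returns the index of the first byte equal to needle, or n if absent.
std::size_t swar_find_byte(const char* s, std::size_t n, char needle) {
    constexpr std::uint64_t ones = 0x0101010101010101ull;
    constexpr std::uint64_t highs = 0x8080808080808080ull;
    const std::uint64_t pattern = ones * static_cast<unsigned char>(needle);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const std::uint64_t x = load_le64(s + i) ^ pattern;    // matches -> 0x00
        const std::uint64_t found = (x - ones) & ~x & highs;   // flag zero bytes
        if (found) return i + lowest_flag_byte(found);
    }
    for (; i < n; ++i)  // scalar tail for the last < 8 bytes
        if (s[i] == needle) return i;
    return n;
}
```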

kfr

Mentions: 2 | Stars: 1,582 | Activity: 9.2 | Language: C++

Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
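
As a reference point for what a DSP library like kfr accelerates, here is a direct-form FIR filter, y[n] = sum over k of h[k] * x[n - k], written as a plain scalar loop. This is an illustrative sketch with a hypothetical function name, not kfr's API; a SIMD implementation would compute several outputs or taps per instruction.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR filter; input samples before x[0] are treated as zero.
std::vector<float> fir_filter(const std::vector<float>& x, const std::vector<float>& h) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < h.size() && k <= n; ++k)
            acc += h[k] * x[n - k];  // tap k weights the sample k steps back
        y[n] = acc;
    }
    return y;
}
```

Feeding a unit impulse through the filter returns the taps themselves, which is a quick sanity check.
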
Vc

Mentions: 6 | Stars: 1,417 | Activity: 6.1 | Language: C++

SIMD Vector Classes for C++

libsimdpp

Mentions: 1 | Stars: 1,188 | Activity: 0.0 | Language: C++

Portable header-only C++ low-level SIMD library

x86-simd-sort

Mentions: 7 | Stars: 794 | Activity: 9.5 | Language: C++

C++ template library for high performance SIMD based sorting algorithms

Project mention: SIMD based custom object and key-value pair sorting in C++ | news.ycombinator.com | 2024-02-14
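
SIMD sorting libraries typically rely on sorting networks: fixed, data-independent sequences of compare-exchange steps that map onto vector min/max instructions. The scalar bitonic network below for eight `int`s only illustrates the idea; it is a sketch, not x86-simd-sort's code or API, and `bitonic_sort8` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Sorts 8 ints ascending with a bitonic sorting network. The sequence of
// compare-exchange pairs depends only on the indices, never on the data.
void bitonic_sort8(std::array<int, 8>& a) {
    constexpr std::size_t n = 8;
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of blocks being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance in a block
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t l = i ^ j;            // partner index
                if (l > i) {
                    const int lo = std::min(a[i], a[l]);
                    const int hi = std::max(a[i], a[l]);
                    const bool ascending = (i & k) == 0;
                    a[i] = ascending ? lo : hi;
                    a[l] = ascending ? hi : lo;
                }
            }
        }
    }
}
```

In a vectorized version, each `j` pass becomes a lane shuffle followed by a vector min and max over all pairs at once.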

std-simd

Mentions: 9 | Stars: 544 | Activity: 1.1 | Language: C++

std::experimental::simd for GCC [ISO/IEC TS 19570:2018]

Project mention: A proposal for the next version of C [pdf] | news.ycombinator.com | 2024-01-20

neither proposing nor taking a position on this possible addition)
> ... For completeness we would also like to add that a serious issue is that C still lacks vector operations.
Those are good points. The authors don't take a stance on it, but I do think that syntax for packed structs should be standardized. IMO, so should syntax for inline assembly (both as optional features). These are already common extensions; this is exactly what they should standardize. The additions of "typeof" and #embed are also good examples of this (they had been talking about adding #embed since 1995 [1]).
As for vector instructions, I'm unsure how it could be implemented in a standard way, but I'm not against it. Maybe something like this [2], but with the syntax changed for C instead of C++.
[1]: https://groups.google.com/g/comp.std.c/c/zWFEXDvyTwM
[2]: https://github.com/VcDevel/std-simd
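
As an example of the kind of existing nonstandard extension the comment refers to, GCC and Clang let you declare vector types with `__attribute__((vector_size(N)))` and apply ordinary arithmetic operators element-wise. The sketch assumes one of those compilers; `v4f` and `axpy4` are hypothetical names, and this is neither part of the C proposal nor std-simd's interface.

```cpp
// GCC/Clang vector extension: four floats packed into one 16-byte vector.
// This is a compiler extension, not standard C or C++.
typedef float v4f __attribute__((vector_size(16)));

// Element-wise a * x + y. The compiler emits SIMD instructions where the
// target has them and falls back to scalar code otherwise.
v4f axpy4(float a, v4f x, v4f y) {
    const v4f av = {a, a, a, a};  // broadcast a to all four lanes
    return av * x + y;
}
```

Individual lanes can be read with ordinary subscripts, e.g. `r[0]`, which is also an extension.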

toys

Mentions: 2 | Stars: 311 | Activity: 5.6 | Language: C++

Storage for my snippets, toy programs, etc.

Project mention: Modern Perfect Hashing for Strings | news.ycombinator.com | 2023-04-30

I think all of these techniques check whether the input string is correct. For example see here https://github.com/WojciechMula/toys/blob/master/lookup-in-s...
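
The point about checking the input can be shown with a toy example: a perfect hash maps each keyword to a distinct slot, but it also maps every non-keyword to some slot, so the lookup must finish with a string comparison. The keyword set and hash below are hypothetical, chosen by hand so that `(first char XOR length) & 7` is collision-free for them; they are not taken from the linked code.

```cpp
#include <cstddef>
#include <string_view>

// Slots for the hypothetical keywords, indexed by slot_of(); "" marks empty.
// else -> 1, while -> 2, if -> 3, return -> 4, for -> 5.
const std::string_view kSlots[8] = {"", "else", "while", "if", "return", "for", "", ""};

// Hash: first character XOR length, keep the low 3 bits. Requires !s.empty().
std::size_t slot_of(std::string_view s) {
    return (static_cast<unsigned char>(s[0]) ^ s.size()) & 7u;
}

bool is_keyword(std::string_view s) {
    if (s.empty()) return false;
    // Every input lands on some slot, so this comparison is what rejects
    // non-keywords that collide with an occupied slot.
    return kSlots[slot_of(s)] == s;
}
```

For instance, `"ifx"` hashes to the same slot as `"while"`; without the final comparison it would be misreported as a keyword.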

sse-popcount

Mentions: 2 | Stars: 309 | Activity: 5.6 | Language: C++

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
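
The linked article compares several population-count methods; one SSE approach looks up the bit count of each 4-bit nibble in a 16-entry table held in a register (via `pshufb`). A scalar sketch of that nibble-table idea, plus the classic SWAR popcount for a 64-bit word, might look like this. Both functions are hypothetical illustrations, not code from the repository.

```cpp
#include <cstddef>
#include <cstdint>

// Bit count of every 4-bit value 0..15; the SSE variant keeps this table in a
// register and looks up 16 nibbles per shuffle instruction.
constexpr std::uint8_t kNibblePop[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

// Scalar form of the nibble-lookup technique: two table lookups per byte.
std::uint64_t popcount_lut(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += kNibblePop[data[i] & 0x0F] + kNibblePop[data[i] >> 4];
    return total;
}

// SWAR popcount of one 64-bit word: sum bits in 2-, 4-, then 8-bit fields,
// then add all byte counts into the top byte with one multiplication.
unsigned popcount_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return static_cast<unsigned>((x * 0x0101010101010101ull) >> 56);
}
```
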
md5-optimisation

Mentions: 2 | Stars: 96 | Activity: 2.8 | Language: C++

The fastest MD5 implementation using x86 assembly

Project mention: The least interesting part about AVX-512 is the 512 bits vector width | news.ycombinator.com | 2023-06-19

Very useful. In fact, it speeds up a single instance (i.e. not taking advantage of SIMD) of MD5 by 20%: https://github.com/animetosho/md5-optimisation#x86-avx512-vl...
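
A plausible contributor (an assumption, not stated in the comment) is AVX-512's VPTERNLOGD, which evaluates any three-input bitwise function in one instruction, selected by an 8-bit truth table. MD5's auxiliary functions F, G, H and I from RFC 1321 are exactly such three-input functions. The scalar model below assumes the truth-table convention in which bit `4a + 2b + c` of the immediate gives the result for input bits (a, b, c); the immediates in the test were derived by hand from that convention, and the function names are hypothetical.

```cpp
#include <cstdint>

// Scalar model of a three-input truth-table operation in the style of
// VPTERNLOGD: for each bit position, the input bits (a, b, c) select
// bit (4a + 2b + c) of imm as the output bit.
std::uint32_t ternlog(std::uint32_t a, std::uint32_t b, std::uint32_t c, std::uint8_t imm) {
    std::uint32_t r = 0;
    for (int i = 0; i < 8; ++i) {
        if (imm & (1u << i)) {
            const std::uint32_t ma = (i & 4) ? a : ~a;
            const std::uint32_t mb = (i & 2) ? b : ~b;
            const std::uint32_t mc = (i & 1) ? c : ~c;
            r |= ma & mb & mc;  // positions where (a, b, c) matches minterm i
        }
    }
    return r;
}

// MD5 auxiliary functions as defined in RFC 1321.
std::uint32_t md5_F(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return (x & y) | (~x & z); }
std::uint32_t md5_G(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return (x & z) | (y & ~z); }
std::uint32_t md5_H(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return x ^ y ^ z; }
std::uint32_t md5_I(std::uint32_t x, std::uint32_t y, std::uint32_t z) { return y ^ (x | ~z); }
```

Under this convention F, G, H and I correspond to immediates 0xCA, 0xE4, 0x96 and 0x39, so each becomes a single instruction instead of two or three.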

std_find_simd

Mentions: 2 | Stars: 18 | Activity: 0.0 | Language: C++

SIMD version of std::find
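
A minimal sketch of what a SIMD `std::find` can look like for 32-bit integers with SSE2: compare four elements at once, turn the comparison into a bitmask, use the lowest set bit to locate the first match, and finish with a scalar tail. It assumes GCC or Clang on x86 (`__builtin_ctz`, `<emmintrin.h>`) and falls back to a plain loop elsewhere; `find_i32` is a hypothetical name and this is not the repository's code.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Returns a pointer to the first element equal to value, or last if none.
const std::int32_t* find_i32(const std::int32_t* first, const std::int32_t* last,
                             std::int32_t value) {
#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi32(value);  // value in all 4 lanes
    while (last - first >= 4) {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(first));
        const __m128i eq = _mm_cmpeq_epi32(chunk, needle);  // all-ones where equal
        const int mask = _mm_movemask_ps(_mm_castsi128_ps(eq));  // 1 bit per lane
        if (mask != 0) return first + __builtin_ctz(static_cast<unsigned>(mask));
        first += 4;
    }
#endif
    for (; first != last; ++first)  // scalar tail (or the whole range without SSE2)
        if (*first == value) return first;
    return last;
}
```
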
VectorizedKernel

Mentions: 1 | Stars: 7 | Activity: 2.6 | Language: C++

Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures

ThinkingInSimd

Mentions: 1 | Stars: 3 | Activity: 2.3 | Language: C++

An essay comparing performance implications of ignoring AVX acceleration

Project mention: Fastest Branchless Binary Search | news.ycombinator.com | 2023-08-11

> In this case std::lower_bound is very slightly but consistently faster than sb_lower_bound. To always get the best performance it is possible for libraries to use sb_lower_bound whenever directly working on primitive types and std::lower_bound otherwise.
I will say that if this is the case, there are probably much better versions of binary search out there for primitive types. I made one just screwing around with SIMD that's 3x faster than std::lower_bound until becoming memory bound:
https://github.com/matthewkolbe/ThinkingInSimd/tree/main/alg...
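
For reference, a branchless lower_bound in the spirit of the one the quote discusses can be written so that the only data-dependent step is a conditional add, which compilers can often (not always) turn into a conditional move. This is a sketch under that assumption, with a hypothetical name; it is neither the quoted article's sb_lower_bound verbatim nor the SIMD version linked above.

```cpp
#include <cstddef>

// Returns a pointer to the first element in [first, first + n) that is not
// less than value (same contract as std::lower_bound on a sorted range).
// Invariant: the answer lies in [first, first + n], and first + n never moves.
template <class T>
const T* branchless_lower_bound(const T* first, std::size_t n, const T& value) {
    while (n > 0) {
        const std::size_t half = n / 2;
        // If first[half] < value, the answer lies after first[half].
        first += (first[half] < value) ? (n - half) : 0;
        n = half;
    }
    return first;
}
```

The number of loop iterations depends only on `n`, which keeps the branch predictor out of the picture; what remains is memory latency, consistent with the quote's note about becoming memory bound.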

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

C++ Avx512 related posts

Measuring energy usage: regular code vs. SIMD code
1 project | news.ycombinator.com | 19 Feb 2024
SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
6 projects | news.ycombinator.com | 29 Sep 2023
Stringzilla: 10x Faster SIMD-accelerated String Class
1 project | /r/programming | 30 Aug 2023
Stringzilla: 10x faster SIMD-accelerated Python `str` class
2 projects | /r/Python | 30 Aug 2023
The Case of the Missing SIMD Code
7 projects | news.ycombinator.com | 8 Jun 2023
Modern Perfect Hashing for Strings
1 project | news.ycombinator.com | 30 Apr 2023
SIMD intrinsics and the possibility of a standard library solution
16 projects | /r/cpp | 8 Jan 2023

Index

What are some of the best open-source Avx512 projects in C++? This list will help you:

	Project	Stars
1	simdjson	18,362
2	highway	3,623
3	oneDNN	3,456
4	xsimd	2,036
5	Simd	1,971
6	StringZilla	1,776
7	kfr	1,582
8	Vc	1,417
9	libsimdpp	1,188
10	x86-simd-sort	794
11	std-simd	544
12	toys	311
13	sse-popcount	309
14	md5-optimisation	96
15	std_find_simd	18
16	VectorizedKernel	7
17	ThinkingInSimd	3