Top 23 C++ Avx512 Projects
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
-
People reported challenges building V8 (whether upstream or the Node.js variant) on s390x with z13 support. I don't know if it was discussed on the porters mailing list because it's not public: https://groups.google.com/g/v8-s390-ports
Elsewhere, some people interpreted https://github.com/google/highway/issues/1895 as meaning that Highway code does not work on z13 at all.
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE, WebAssembly, VSX, RISC-V)
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON, SVE for ARM, HVX for Hexagon (by ermig1979)
-
less_slow.cpp
Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
Project mention: Processing Strings 109x Faster Than Nvidia on H100 | news.ycombinator.com | 2025-09-19
Yes, at the scale of 128-bit registers NEON is mostly enough, except for a few categories of instructions missing in that ISA subset, like scatter/gather ops, that can yield 30% boost over serial memory accesses: https://github.com/ashvardanian/less_slow.cpp/releases/tag/v...
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON, RISC-V RVV)
Project mention: Show HN: KFR 7 – major update for C++ DSP library | news.ycombinator.com | 2025-11-17
-
Project mention: Eve: Expressive Vector Engine – SIMD in C++ Goes Brrrr | news.ycombinator.com | 2026-04-09
-
ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
I was asked this a few months back but don’t have the measurements fresh anymore. In general, I think TBB is one of the more thorough and feature-rich parallelism libraries out there. That said, I just found a comparable usage example in my benchmarks, and it doesn’t look like TBB will have the same low-latency profile as Fork Union: https://github.com/ashvardanian/ParallelReductionsBenchmark/...
-
Jsonifier
A few classes for extremely fast json parsing/serializing in modern C++. Possibly the fastest json parser in C++. Possibly the fastest json serializer in C++. (by RealTimeChris)
-
VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
-
C++ Avx512 discussion
C++ Avx512 related posts
-
SIMD Population Count
-
Copilot implemented a ThreadPool to serve as a replacement for OpenMP
-
Show HN: Less Slow C++
-
Expressive Vector Engine – SIMD in C++
-
Intel Releases x86-SIMD-sort 6.0 for 10x faster AVX2/AVX-512 Sorting
-
SIMD-accelerated computer vision on a $2 microcontroller
-
Measuring energy usage: regular code vs. SIMD code
-
A note from our sponsor - SaaSHub
www.saashub.com | 6 Jun 2026
Index
What are some of the best open-source Avx512 projects in C++? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | simdjson | 23,814 |
| 2 | highway | 5,597 |
| 3 | oneDNN | 3,998 |
| 4 | xsimd | 2,701 |
| 5 | Simd | 2,252 |
| 6 | less_slow.cpp | 1,913 |
| 7 | kfr | 1,879 |
| 8 | Vc | 1,533 |
| 9 | eve | 1,340 |
| 10 | libsimdpp | 1,300 |
| 11 | primesieve | 1,093 |
| 12 | x86-simd-sort | 1,012 |
| 13 | std-simd | 639 |
| 14 | toys | 375 |
| 15 | sse-popcount | 355 |
| 16 | primecount | 354 |
| 17 | md5-optimisation | 155 |
| 18 | ParallelReductionsBenchmark | 118 |
| 19 | Jsonifier | 98 |
| 20 | std_find_simd | 21 |
| 21 | modernRX | 17 |
| 22 | VectorizedKernel | 10 |
| 23 | ThinkingInSimd | 5 |