Top 23 C++ Simd Projects
-
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Project mention: Make Ubuntu packages 90% faster by rebuilding them | news.ycombinator.com | 2025-03-18
I think parsing once into a faster format (sqlite3 or parquet) would be more beneficial.
https://simdjson.org/
-
GLM
As for math, that was the easiest choice so far. No doubt, GLM is a "gold standard" at this point. For OpenGL it is, at least. But, as with a lot of the other APIs, I decided to build a wrapper around it rather than reference the library directly in the engine's code. And for physics, well, I had not found an answer yet. I did try to write my own physics logic at some point. And while it was, surprisingly, successful, I wanted more than just a simple physics layer. I wanted something more complex and, more importantly, faster than my implementation. I have not decided on a physics library yet. But I'll cross that bridge when I come to it.
-
highway
I quite like highway.
As mentioned, last time I tried vqsort for RVV it was surprisingly slow.
I tried to replicate it yesterday, but noticed that vqsort is now disabled for RVV: https://github.com/google/highway/blob/400fbf20f2e40b984be12...
Does highway support sorting networks for non-128-bit vector registers?
When I tried to compile it for AVX512, the BaseCase seems to only use xmm registers: https://godbolt.org/z/qr9xoTGKn
-
usearch
Fast Open-Source Search & Clustering engine × for Vectors & Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
I wish I'd had a short answer :)
For years, I've had a hope to build it in the form of an open-core project: open-source SotA solutions for Storage, Compute, and AI Modeling built bottom up. You can imagine the financial & time burden of building something like that with all the weird optimizations and coding practices listed above.
A few years in, with millions spent out of my pocket, without any venture support or revenue, I've decided to change gears and focus on a few niche workloads until some of the Unum tools become industry standards for something. USearch was precisely that, a toy Vector Search engine that would still, hopefully, be 10x better than alternatives, in one way or another: <https://www.unum.cloud/blog/2023-11-07-scaling-vector-search...>.
Now, ScyllaDB (through the Rust SDK) and YugaByte (through the C++ SDK) are the most recent DBMSs to announce features built on USearch, joining the ranks of many other tech products leveraging some of those optimizations. Last year I was playing around with different open-source growth & governance ideas, looking for a way to organize a more collaborative, rather than competitive, environment among our upstream users: no major releases, just occasional patches here and there.
It was an interesting period, but now I'm again deep in the "CUDA-GDB" land, and the next major release to come is precisely around Full-Text Search in StringZilla <https://github.com/ashvardanian/stringzilla>, and will be integrated into both USearch <https://github.com/unum-cloud/usearch> and somewhere else ;)
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
Thanks, that's an important caveat!
> Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects
That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime!
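A minimal sketch of that idea in plain C++, not xsimd's actual API: the tag types `scalar_tag` and `wide4_tag`, the `sum` template, and the `select_sum` helper are all hypothetical names made up for illustration, and the `has_wide` flag stands in for real CPU feature detection. The point is only that one function template, parameterized on a feature-level type, can be instantiated several times and picked through a function pointer at runtime.

```cpp
#include <cstddef>

// Hypothetical feature-level tags (assumption: a real SIMD library
// would supply its own architecture types instead of these).
struct scalar_tag { static constexpr std::size_t lanes = 1; };
struct wide4_tag  { static constexpr std::size_t lanes = 4; };

// One function template; the compiler emits a separate version
// for every feature level it is instantiated with.
template <class Arch>
float sum(const float* data, std::size_t n) {
    float acc[Arch::lanes] = {};  // one partial sum per lane
    std::size_t i = 0;
    // Main loop: consume full blocks of Arch::lanes elements.
    for (; i + Arch::lanes <= n; i += Arch::lanes)
        for (std::size_t l = 0; l < Arch::lanes; ++l)
            acc[l] += data[i + l];
    // Reduce the per-lane partial sums.
    float total = 0.0f;
    for (std::size_t l = 0; l < Arch::lanes; ++l)
        total += acc[l];
    // Tail loop: leftover elements that do not fill a block.
    for (; i < n; ++i)
        total += data[i];
    return total;
}

using sum_fn = float (*)(const float*, std::size_t);

// Runtime selection between the instantiated versions.
// `has_wide` is a stand-in for an actual CPU feature check.
sum_fn select_sum(bool has_wide) {
    return has_wide ? &sum<wide4_tag> : &sum<scalar_tag>;
}
```

A caller would resolve the pointer once, for example `sum_fn f = select_sum(cpu_supports_wide);`, and then call `f(data, n)` in the hot path, so the dispatch cost is paid a single time.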
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM. (by ermig1979)
-
proton
High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka/Pulsar, Iceberg, or ClickHouse, and seamlessly write results back. Supports powerful features like JOIN, CDC, UPSERT, and LOOKUP, enabling real-time analytics and ETL at scale. (by timeplus-io)
Project mention: Show HN: Open-Source C++ Apache Iceberg Client with Write Support | news.ycombinator.com | 2025-03-20
-
fast_float
Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12, Chromium, Redis and WebKit/Safari
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
-
DirectXMath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
-
ada
WHATWG-compliant and fast URL parser written in modern C++, part of Node.js, ClickHouse, Redpanda, Kong, Telegram, Datadog and Cloudflare Workers.
-
Vc
Project mention: Understanding SIMD: Infinite Complexity of Trivial Problems | news.ycombinator.com | 2024-11-30
I'm surprised no one has mentioned Vc. I found ispc clunky and not as performant, and std::simd didn't support some useful math ops like rsqrt. Vc has been around for years, I have no trouble including it in my codes, it has masking and many of the most useful math ops, and I can get over 1 TF/s on a consumer-grade Ryzen and at least 3 TF/s on the big Epyc CPUs.
https://github.com/VcDevel/Vc
-
simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension, LoongArch64, POWER. Part of Node.js, WebKit/Safari, Ladybird, Chromium, Cloudflare Workers and Bun.
Lemire and collaborators often write in C++ intrinsics, or thin platform-specific wrappers on top of them.
I count ~8 different implementations [1], which demonstrates considerable commitment :) Personally, I prefer to write once with portable intrinsics.
[1] https://github.com/simdutf/simdutf/tree/1d5b5cd2b60850954df5...
-
eve
Here is a bunch of simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...
I personally think we have the following strengths:
* Algorithms. Writing SIMD loops is very hard. We give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).
C++ Simd related posts
-
Three Fundamental Flaws of SIMD
-
Show HN: Less Slow C++
-
AWS Graviton 3 > Graviton 4 for Vector Similarity Search
-
Realtime Math: C++11 alternative to glm and DirectX Math
-
Purely Functional Sliding Window Aggregation Algorithm
-
C Is Not Suited to SIMD
-
Expressive Vector Engine - SIMD in C++
-
A note from our sponsor - SaaSHub
www.saashub.com | 14 May 2025
Index
What are some of the best open-source Simd projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | ncnn | 21,437 |
2 | simdjson | 20,284 |
3 | GLM | 9,884 |
4 | highway | 4,615 |
5 | usearch | 2,706 |
6 | ispc | 2,657 |
7 | ozz-animation | 2,571 |
8 | xsimd | 2,364 |
9 | Simd | 2,148 |
10 | proton | 1,788 |
11 | fast_float | 1,764 |
12 | kfr | 1,736 |
13 | DirectXMath | 1,643 |
14 | SatDump | 1,575 |
15 | ada | 1,532 |
16 | Vc | 1,483 |
17 | sse2neon | 1,378 |
18 | simdutf | 1,366 |
19 | libsimdpp | 1,273 |
20 | eve | 1,188 |
21 | FastNoise2 | 1,135 |
22 | hlslpp | 857 |
23 | Fastor | 779 |