Top 23 C++ Simd Projects
-
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Project mention: AMD Funded a Drop-In CUDA Implementation Built on ROCm: It's Open-Source | news.ycombinator.com | 2024-02-12
ncnn uses Vulkan for GPU acceleration; I've seen it used in a few projects to get AMD hardware support.
https://github.com/Tencent/ncnn
-
simdjson
Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Project mention: Wc2: Investigates optimizing 'wc', the Unix word count program | news.ycombinator.com | 2024-06-20
State machines are great for complex situations, but when it comes to performance, it's not at all clear to me that they're the most scalable approach with modern systems.
The data dependency between a loop iteration for each character might be pipelined really well when executed, and we can assume large enough L1/L2 cache for our lookup tables. But we're still using at least one lookup per character.
Projects like https://github.com/simdjson/simdjson?tab=readme-ov-file#abou... are truly fascinating, because they're based on SIMD instructions that can process 64 or more bytes with a single instruction. Very much worth checking out the papers at the link above.
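To make the contrast in this comment concrete, here is a minimal sketch. It is not code from wc2 or simdjson. `count_words_table` is a hypothetical table-driven counter that performs one lookup per character, and `count_newlines_swar` is a hypothetical routine that tests 8 bytes per step using plain 64-bit integer arithmetic (SWAR), standing in for real SIMD instructions that would cover 64 or more bytes at once.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper: 256-entry table marking ASCII whitespace bytes.
inline const std::array<bool, 256>& is_space_table() {
    static const std::array<bool, 256> table = [] {
        std::array<bool, 256> t{};
        for (unsigned char c : std::string_view(" \t\n\v\f\r")) t[c] = true;
        return t;
    }();
    return table;
}

// Table-driven word counter: one lookup per input byte, with a one-bit
// state (in_word) carried from each iteration to the next.
inline std::size_t count_words_table(std::string_view text) {
    const auto& space = is_space_table();
    std::size_t words = 0;
    bool in_word = false;
    for (unsigned char c : text) {
        const bool is_space = space[c];      // one lookup per character
        words += (!in_word && !is_space);    // a word starts on space -> non-space
        in_word = !is_space;
    }
    return words;
}

// SWAR newline counter: examines 8 bytes per loop iteration.
inline std::size_t count_newlines_swar(std::string_view text) {
    constexpr std::uint64_t ones = 0x0101010101010101ULL;
    constexpr std::uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    const std::uint64_t pattern = ones * static_cast<std::uint64_t>('\n');
    std::size_t count = 0;
    std::size_t i = 0;
    for (; i + 8 <= text.size(); i += 8) {
        std::uint64_t chunk;
        std::memcpy(&chunk, text.data() + i, 8);
        const std::uint64_t x = chunk ^ pattern;   // matching bytes become 0x00
        // The high bit of each byte ends up set exactly when that byte is nonzero.
        const std::uint64_t nonzero = ((x & low7) + low7) | x | low7;
        const std::uint64_t match = ~nonzero;      // 0x80 in each matching byte
        // Sum the per-byte flags into the top byte (at most 8, so no overflow).
        count += static_cast<std::size_t>(((match >> 7) * ones) >> 56);
    }
    for (; i < text.size(); ++i) count += (text[i] == '\n');  // scalar tail
    return count;
}
```

The table version has a loop-carried dependency through `in_word`, while the SWAR version has none between chunks, which is the property that lets wider SIMD code scale.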
-
Project mention: Implementing a GPU's Programming Model on a CPU | news.ycombinator.com | 2023-10-14
This so-called GPU programming model existed for many decades before the first GPUs appeared, but at that time compilers were not as good as the CUDA compilers, so the burden on the programmer was greater.
As another poster has already mentioned, there exists a compiler for CPUs which has been inspired by CUDA and which has been available for many years: ISPC (Implicit SPMD Program Compiler), at https://github.com/ispc/ispc .
NVIDIA has the very annoying habit of using a lot of terms that are different from those that have been previously used in computer science for decades. The worst is that NVIDIA has not invented new words, but they have frequently reused words that have been widely used with other meanings.
SIMT (Single-Instruction Multiple Thread) is not the worst term coined by NVIDIA, but there was no need for yet another acronym. For instance they could have used SPMD (Single Program, Multiple Data Stream), which dates from 1988, two decades before CUDA.
Moreover, SIMT is the same thing that was called "array of processes" by C.A.R. Hoare in August 1978 (in "Communicating Sequential Processes"), or "replicated parallel" by Occam in 1985 or "PARALLEL DO" by "OpenMP Fortran" in 1997-10 or "parallel for" by "OpenMP C and C++" in 1998-10.
The only (but extremely important) innovation brought by CUDA is that the compiler is smart enough that the programmer does not need to know the structure of the processor, i.e. how many cores it has and how many SIMD lanes each core has. The CUDA compiler automatically distributes the work over the available SIMD lanes and cores, and in most cases the programmer does not care whether two executions of the per-item function run on two different cores or on two different SIMD lanes of the same core.
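A minimal sketch of the idea in that last paragraph: the programmer writes only a per-item function, and a launcher decides how items map to cores and lanes. Every name here (`spmd_launch`, `kCores`, `kLanes`, `saxpy`) is hypothetical, and the launcher runs sequentially purely to show the index mapping. A real ISPC or CUDA toolchain would execute the lanes and cores in parallel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical machine shape. The per-item function never sees these values.
constexpr std::size_t kCores = 4;
constexpr std::size_t kLanes = 8;

// Hypothetical launcher: calls kernel(i) exactly once for every i in [0, n).
// Items are grouped into blocks of kLanes, and blocks are dealt round-robin
// to cores. The loops run sequentially; only the mapping is illustrated.
template <typename Kernel>
void spmd_launch(std::size_t n, Kernel kernel) {
    const std::size_t blocks = (n + kLanes - 1) / kLanes;
    for (std::size_t core = 0; core < kCores; ++core) {
        for (std::size_t block = core; block < blocks; block += kCores) {
            for (std::size_t lane = 0; lane < kLanes; ++lane) {
                const std::size_t i = block * kLanes + lane;
                if (i < n) kernel(i);  // lanes past the end are masked off
            }
        }
    }
}

// The "program" is written for a single data item: y[i] = a * x[i] + y[i].
// Assumes y has at least as many elements as x.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    spmd_launch(x.size(), [&](std::size_t i) { y[i] = a * x[i] + y[i]; });
}
```

Changing `kCores` or `kLanes` changes the order of the work but not the lambda, which is the separation the comment credits to the compiler.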
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
https://github.com/xtensor-stack/xsimd
-
usearch
Fast Open-Source Search & Clustering engine × for Vectors & Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
-
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc.
Aside from the NULL-termination requirements, there is arguably another big design issue with libc strings. I believe that interfaces that may allocate memory must give you an opportunity to override the allocator. Aside from the SIMD implementation quality and throughput on Arm, that was one of the key reasons to start a new library: https://github.com/ashvardanian/StringZilla/blob/91d0a1a02fa...
Also not a huge fan of locale controls and wchar APIs :)
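One way to express the allocator point in standard C++17 is sketched below. This is not StringZilla's API: `repeat_string` is a hypothetical helper that takes a `std::pmr::memory_resource*`, so the caller decides where any allocation comes from.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>

// Hypothetical helper: builds `count` copies of `piece`, allocating only
// through the memory resource supplied by the caller.
inline std::pmr::string repeat_string(std::string_view piece, std::size_t count,
                                      std::pmr::memory_resource* memory) {
    std::pmr::string out(memory);          // the string remembers this resource
    out.reserve(piece.size() * count);     // any allocation goes through `memory`
    for (std::size_t i = 0; i < count; ++i) out.append(piece);
    return out;
}
```

A caller could, for instance, pass a `std::pmr::monotonic_buffer_resource` backed by a stack buffer to keep the result off the global heap, or `std::pmr::new_delete_resource()` to get the default behavior.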
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM. (by ermig1979)
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
-
DirectXMath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
-
fast_float
Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12, Chromium and WebKit/Safari
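For readers unfamiliar with the interface named in this description, here is a minimal sketch that uses the standard library's own `std::from_chars` rather than fast_float itself. `parse_double` is a hypothetical wrapper, and the floating-point overloads assume a reasonably recent standard library.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical wrapper: parses the whole of `text` as a double.
// Returns std::nullopt on a syntax error, a range error, or trailing characters.
inline std::optional<double> parse_double(std::string_view text) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

Unlike `strtod`, `std::from_chars` takes an explicit end pointer, ignores the locale, and does not skip leading whitespace, so the input does not need to be NUL-terminated.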
-
...
can_ada is just the python bindings, largely generated via pybind11.
The actual project is at https://github.com/ada-url/ada
-
simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js, WebKit/Safari and Bun.
IIRC all of the simdutf implementations use a lot of lookup tables except for the AVX512 and RVV backends.
Here is e.g. the NEON code: https://github.com/simdutf/simdutf/blob/1b8ca3d1072a8e2e1026...
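As a scalar illustration of what a lookup table buys in UTF-8 processing, the sketch below maps the high nibble of a byte to its sequence length with a 16-entry table. It is not taken from simdutf, whose tables are built for vector shuffle instructions; the names `kUtf8LengthByHighNibble`, `utf8_sequence_length`, and `count_code_points` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// 16-entry table indexed by the high nibble of a byte:
// 0x0-0x7 -> ASCII (1 byte), 0x8-0xB -> continuation byte (0),
// 0xC-0xD -> 2-byte lead, 0xE -> 3-byte lead, 0xF -> 4-byte lead.
constexpr std::array<std::uint8_t, 16> kUtf8LengthByHighNibble = {
    1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 3, 4};

// Length of the sequence that starts at `byte`, or 0 for a continuation byte.
constexpr std::size_t utf8_sequence_length(unsigned char byte) {
    return kUtf8LengthByHighNibble[byte >> 4];
}

// Counts code points in text that is assumed to be valid UTF-8.
// No validation is performed: every non-continuation byte counts as one.
inline std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) count += (utf8_sequence_length(byte) != 0);
    return count;
}
```

A 16-entry byte table is the size that fits in a single 128-bit register, which is why nibble-indexed tables pair naturally with byte-shuffle instructions on SSE and NEON.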
C++ Simd discussion
C++ Simd related posts
-
Highway - Portable SIMD Library
-
Usearch: Single-File Similarity Search
-
VOLK: Vector-Optimized Library of Kernels for GNU Radio
-
SIMD-accelerated computer vision on a $2 microcontroller
-
SIMD-accelerated distance functions for SQLite
-
SIMD < SIMT < SMT: Parallelism in Nvidia GPUs (2011)
-
Highway: C++ library that provides portable SIMD/vector intrinsics
Index
What are some of the best open-source Simd projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | ncnn | 20,054 |
2 | simdjson | 19,050 |
3 | GLM | 9,072 |
4 | highway | 4,085 |
5 | ispc | 2,469 |
6 | ozz-animation | 2,402 |
7 | xsimd | 2,146 |
8 | usearch | 2,089 |
9 | StringZilla | 2,035 |
10 | Simd | 2,034 |
11 | kfr | 1,644 |
12 | DirectXMath | 1,532 |
13 | Vc | 1,444 |
14 | fast_float | 1,346 |
15 | ada | 1,315 |
16 | SatDump | 1,293 |
17 | sse2neon | 1,285 |
18 | libsimdpp | 1,216 |
19 | simdutf | 1,086 |
20 | FastNoise2 | 981 |
21 | eve | 927 |
22 | Fastor | 734 |
23 | rtm | 707 |