Top 23 C++ Simd Projects
-
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
-
Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (Altivec) and VSX (Power7) for PowerPC, NEON for ARM. (by ermig1979)
-
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc.
-
usearch
Fast Open-Source Search & Clustering engine × for Vectors & Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
-
kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
-
DirectXMath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
-
fast_float
Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12 and WebKit/Safari
-
simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
Project mention: AMD Funded a Drop-In CUDA Implementation Built on ROCm: It's Open-Source | news.ycombinator.com | 2024-02-12
ncnn uses Vulkan for GPU acceleration; I've seen it used in a few projects to get AMD hardware support.
https://github.com/Tencent/ncnn
Project mention: Tips on adding JSON output to your command line utility. (2021) | news.ycombinator.com | 2024-04-20
It's also supported by simdjson [0] (which has a lot of language bindings [1]):
> Multithreaded processing of gigantic Newline-Delimited JSON (ndjson) and related formats at 3.5 GB/s
[0] https://simdjson.org/
[1] https://github.com/simdjson/simdjson?tab=readme-ov-file#bind...
Project mention: Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4 | news.ycombinator.com | 2024-03-31
The bf16 dot instruction replaces 6 instructions: https://github.com/google/highway/blob/master/hwy/ops/x86_12...
Project mention: Implementing a GPU's Programming Model on a CPU | news.ycombinator.com | 2023-10-14
This so-called GPU programming model existed for many decades before the first GPUs appeared, but at that time the compilers were not as good as the CUDA compilers, so the burden on the programmer was greater.
As another poster has already mentioned, there exists a compiler for CPUs which has been inspired by CUDA and which has been available for many years: ISPC (Implicit SPMD Program Compiler), at https://github.com/ispc/ispc .
NVIDIA has the very annoying habit of using a lot of terms that are different from those that have been previously used in computer science for decades. The worst is that NVIDIA has not invented new words, but they have frequently reused words that have been widely used with other meanings.
SIMT (Single-Instruction Multiple Thread) is not the worst term coined by NVIDIA, but there was no need for yet another acronym. For instance they could have used SPMD (Single Program, Multiple Data Stream), which dates from 1988, two decades before CUDA.
Moreover, SIMT is the same thing that was called "array of processes" by C.A.R. Hoare in August 1978 (in "Communicating Sequential Processes"), or "replicated parallel" by Occam in 1985 or "PARALLEL DO" by "OpenMP Fortran" in 1997-10 or "parallel for" by "OpenMP C and C++" in 1998-10.
The only (but extremely important) innovation brought by CUDA is that the compiler is smart enough that the programmer does not need to know the structure of the processor, i.e. how many cores it has and how many SIMD lanes each core has. The CUDA compiler automatically distributes the work over the available SIMD lanes and cores, and in most cases the programmer does not care whether two executions of the function that must be executed for each data item run on two different cores or on two different SIMD lanes of the same core.
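To make the "function executed for each data item" idea concrete, here is a minimal C++ sketch in the "parallel for" style the comment traces back to OpenMP. It is not ISPC or CUDA code. The names `saxpy_item` and `saxpy` are hypothetical, and the OpenMP pragma is only a hint that takes effect when the compiler is invoked with OpenMP enabled; without it the loop simply runs serially.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-item kernel (hypothetical name): written as if it handled one data item.
inline float saxpy_item(float a, float x, float y) {
    return a * x + y;
}

// Applies the kernel to every item. The source code never states how many
// cores or SIMD lanes exist; that mapping is left to the compiler and runtime.
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    std::vector<float> out(n);
#ifdef _OPENMP
#pragma omp parallel for simd
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = saxpy_item(a, x[i], y[i]);
    }
    return out;
}
```

Calling `saxpy(2.0f, x, y)` looks the same whether the iterations end up on different cores, on different lanes of one core, or in a plain serial loop, which is the property the comment attributes to the SPMD model.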
https://github.com/xtensor-stack/xsimd
GH topics > HashMap:
I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
[4] https://gitlab.com/libeigen/eigen
[5] https://github.com/shibatch/sleef
Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19
The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
A few years ago I compared Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with Core i9, and AVX-512 capable CPUs on substring search benchmarks.
On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
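As a rough illustration of why byte-level work benefits so much from wide operations, the sketch below uses SWAR (SIMD within a register, the technique named in the StringZilla description above) to count occurrences of one byte, eight bytes per step, in portable C++. It is not StringZilla's implementation, and `swar_count_byte` is a hypothetical name chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Counts bytes equal to `needle` in data[0..len), eight bytes per iteration.
std::size_t swar_count_byte(const char* data, std::size_t len, char needle) {
    const std::uint64_t ones = 0x0101010101010101ULL;
    const std::uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    // Broadcast the needle into all eight byte positions of a 64-bit word.
    const std::uint64_t pattern = ones * static_cast<unsigned char>(needle);

    std::size_t count = 0;
    std::size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, 8);         // safe unaligned load
        const std::uint64_t v = word ^ pattern;  // matching bytes become 0x00
        // Exact zero-byte test: leaves 0x80 in every byte of v that is zero.
        const std::uint64_t zero_mask = ~(((v & low7) + low7) | v | low7);
        // Turn each 0x80 flag into 0x01, then sum the flags into the top byte.
        count += static_cast<std::size_t>(((zero_mask >> 7) * ones) >> 56);
    }
    // Scalar tail for the remaining 0..7 bytes.
    for (; i < len; ++i) {
        count += (data[i] == needle) ? 1u : 0u;
    }
    return count;
}
```

A call such as `swar_count_byte(text, length, 'a')` performs one load, a handful of arithmetic operations, and one multiply per eight input bytes, instead of eight separate compare-and-branch steps; real SIMD registers widen the same idea to 16, 32, or 64 bytes.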
Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
...
can_ada is just the python bindings, largely generated via pybind11.
The actual project is at https://github.com/ada-url/ada
A note on using GLRPT - currently there aren't any satellites up with functioning LRPT transmissions. Hopefully this year Russia will get another Meteor satellite up and we can start receiving LRPT signals again. Also you may find it more convenient to use SatDump for working some satellites as it connects directly to the SDR without needing GQRX and a virtual audio cable.
C++ Simd related posts
- Glibc Buffer Overflow in Iconv
- Tips on adding JSON output to your command line utility. (2021)
- Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4
- Training great LLMs from ground zero in the wilderness as a startup
- JPEG XL and the Pareto Front
- Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models
- USearch SQLite Extensions for Vector and Text Search
Index
What are some of the best open-source Simd projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | ncnn | 19,176 |
2 | simdjson | 18,362 |
3 | GLM | 8,653 |
4 | highway | 3,623 |
5 | ispc | 2,402 |
6 | ozz-animation | 2,245 |
7 | xsimd | 2,024 |
8 | Simd | 1,971 |
9 | StringZilla | 1,776 |
10 | usearch | 1,629 |
11 | kfr | 1,582 |
12 | DirectXMath | 1,482 |
13 | Vc | 1,417 |
14 | fast_float | 1,269 |
15 | sse2neon | 1,220 |
16 | ada | 1,194 |
17 | libsimdpp | 1,188 |
18 | SatDump | 1,138 |
19 | simdutf | 948 |
20 | FastNoise2 | 897 |
21 | eve | 843 |
22 | Klein | 729 |
23 | Fastor | 699 |