ispc vs sleef
| | ispc | sleef |
|---|---|---|
| Mentions | 4 | 17 |
| Stars | 2,405 | 589 |
| Growth | 1.2% | - |
| Activity | 9.5 | 8.1 |
| Latest commit | 5 days ago | 3 days ago |
| Language | C++ | C |
| License | BSD 3-clause "New" or "Revised" License | Boost Software License 1.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ispc
- Implementing a GPU's Programming Model on a CPU
This so-called GPU programming model existed for decades before the first GPUs appeared, but the compilers of that era were not as good as the CUDA compilers, so the burden on the programmer was greater.
As another poster has already mentioned, a CUDA-inspired compiler for CPUs has been available for many years: ISPC (Implicit SPMD Program Compiler), at https://github.com/ispc/ispc .
NVIDIA has the very annoying habit of using a lot of terms that differ from those previously used in computer science for decades. Worse, NVIDIA has not invented new words, but has frequently reused words that were already widely used with other meanings.
SIMT (Single-Instruction Multiple Thread) is not the worst term coined by NVIDIA, but there was no need for yet another acronym. For instance, they could have used SPMD (Single Program, Multiple Data), which dates from 1988, two decades before CUDA.
Moreover, SIMT is the same thing that was called an "array of processes" by C.A.R. Hoare in August 1978 (in "Communicating Sequential Processes"), "replicated parallel" in Occam in 1985, "PARALLEL DO" in OpenMP Fortran in October 1997, and "parallel for" in OpenMP C and C++ in October 1998.
The only (but extremely important) innovation brought by CUDA is that the compiler is smart enough that the programmer does not need to know the structure of the processor, i.e. how many cores it has and how many SIMD lanes each core has. The CUDA compiler automatically distributes the work over the available SIMD lanes and cores, and in most cases the programmer does not care whether two executions of the function that must be run for each data item happen on two different cores or on two different SIMD lanes of the same core.
- SIMD intrinsics and the possibility of a standard library solution
ISPC: https://github.com/ispc/ispc
- Prefix Sum with SIMD
Have you looked at [ISPC - Intel SPMD Program Compiler][0]?
[0]: https://github.com/ispc/ispc
- Duff’s Device in 2021
sleef
- The Case of the Missing SIMD Code
I'm the main author of Highway, so I have some opinions :D The number of operations and platforms supported are important criteria.
A hopefully unbiased commentary:
Simde allows you to take existing nonportable intrinsics and get them to run on another platform. This is useful when you have a bunch of existing code and tight deadlines. The downside is less than optimal performance - a portable abstraction can be more efficient than forcing one platform to exactly match the semantics of another. Although a ton of effort has gone into Simde, sometimes it also resorts to autovectorization which may or may not work.
Eigen and SLEEF are mostly math-focused projects that also have a portability layer. SLEEF is designed for C and thus has type suffixes which are rather verbose, see https://github.com/shibatch/sleef/blob/master/src/libm/sleef... But it offers a complete (more so than Highway's) libm.
- Does anyone have any interest in my deep-learning framework?
But the other part about SIMD: I'm unsure whether mgl-mat uses SIMD for transcendental functions, or even for something like element-wise multiplication and division*. SIMD, which numpy uses, easily provides a 4-8x speed boost. Libraries like sleef have been put to use by many.
- `constexpr` what?
- Advice on porting glibc trig functions to SIMD
- SIMD intrinsics and the possibility of a standard library solution
Highway and Agner's VectorClass also have math functions. And SLEEF should definitely be mentioned.
- Portable SIMD library
"SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT" - https://github.com/shibatch/sleef
- SIMD Library for Evaluating Elementary Functions, Vectorized Libm and DFT
- C library for multiple-precision floating-point arithmetic with correct rounding
Not mentioned in the list of users is SLEEF (https://github.com/shibatch/sleef), which provides fast approximations for various elementary functions. (It generates coefficients for the approximations with mpfr)
SLEEF itself is used by PyTorch.
- How to speed up array writes?
If you are looking at floats, there's https://sleef.org
- Benchmarking sine approximations and interpolators.
It would be interesting to see SLEEF added in the benchmarks.
What are some alternatives?
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
nsimd - Agenium Scale vectorization library for CPUs and GPUs
Beef - Beef Programming Language
yenten-arm-miner-yespowerr16 - ARM 64 CPU miner for Yespower variant algorithms
ParallelReductionsBenchmark - Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast!
sb-simd - A convenient SIMD interface for SBCL.
micro-profiler - Cross-platform low-footprint realtime C/C++ Profiler
vector-libm
elena-lang - ELENA is a general-purpose language with late binding. It is multi-paradigm, combining features of functional and object-oriented programming. A rich set of tools is provided for message dispatching: multi-methods, message qualifying, generic message handlers, run-time interfaces
crlibm - A mirror of the CRLibm project from INRIA Forge
lunix - Lua Unix Module.
xbyak_aarch64