Open-source projects categorized as Simd
Related topics: #Neon #Avx #Avx2 #Avx512 #Sse

Top 23 Simd Open-Source Projects

  • GitHub repo simdjson

    Parsing gigabytes of JSON per second

    Project mention: How many x86 instructions are there? | news.ycombinator.com | 2021-04-21

    PMOVMSKB is a great instruction, and 3c understates how cheap it is - if you have a throughput problem (rather than a latency problem) it's even more efficient relative to the ARM equivalent.

    I have a blog post about coping strategies for working around the absence of PMOVMSKB on NEON:


    We used these techniques in simdjson (which I presume still uses them; the code has changed considerably since I built this): https://github.com/simdjson/simdjson

    The best techniques for mitigating the absence of PMOVMSKB require that you use LD4, which results in interleaved inputs. This can sometimes make things easier, sometimes harder for your underlying lexing algorithm - sadly, it's not a 1:1 transformation of the original x86 code.
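    To make concrete what is being worked around: PMOVMSKB (exposed in C as `_mm_movemask_epi8`) takes the top bit of each of the 16 bytes in an SSE register and packs them into a 16-bit integer. The sketch below is a plain Python model of those semantics, not code from simdjson. `movemask_epi8` and `eq_mask` are hypothetical helper names. It shows the compare-then-movemask pattern a SIMD lexer uses to turn "which bytes in this chunk are quotes?" into a single bitmask.

```python
def movemask_epi8(block: bytes) -> int:
    """Scalar model of PMOVMSKB / _mm_movemask_epi8 on a 16-byte vector:
    bit i of the result is the most significant bit of byte i."""
    if len(block) != 16:
        raise ValueError("expected exactly 16 bytes")
    mask = 0
    for i, b in enumerate(block):
        mask |= ((b >> 7) & 1) << i
    return mask


def eq_mask(chunk: bytes, target: int) -> int:
    """Model of PCMPEQB followed by PMOVMSKB: one bit per byte equal to target."""
    # PCMPEQB produces 0xFF for equal bytes and 0x00 otherwise.
    cmp = bytes(0xFF if b == target else 0x00 for b in chunk)
    return movemask_epi8(cmp)


chunk = b'{"key":"value"}x'  # exactly 16 bytes
quotes = eq_mask(chunk, ord('"'))
print(hex(quotes))  # 0x20a2: quotes at byte offsets 1, 5, 7 and 13
```

    On NEON there is no single instruction that does this packing, which is why the workarounds described above (such as loading with LD4 and combining the comparison results) are needed.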

  • GitHub repo ncnn

    ncnn is a high-performance neural network inference framework optimized for the mobile platform

    Project mention: Deep Learning options on Radeon RX 6800 | reddit.com/r/Amd | 2021-04-16

    There's a Tencent-developed Open Source CNN library that runs on pretty much anything, as it's using Vulkan. It's called ncnn, you might want to take a look.

  • GitHub repo GLM

    OpenGL Mathematics (GLM)

    Project mention: SIMD for C++ Developers [pdf] | reddit.com/r/cpp | 2021-04-28

    It seems https://github.com/g-truc/glm also supports SIMD (at least if used / configured correctly).

  • GitHub repo john

    John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs

    Project mention: Decrypting an encrypted PDF without password? | reddit.com/r/Hacking_Tutorials | 2021-05-01

    Download and install GnuPG for Windows if you're on Windows; if you're on Linux it's probably already installed, and if not, install it with your package manager. Download John the Ripper from here. Download Perl from here (depending on your OS you might have it pre-installed, but if you're on Windows, download Strawberry Perl). Open cmd or a terminal and run:

      gpg --receive-keys 05C027FD4BDC136E
      gpg --verify your-downloaded-john-signature.sign

    If it says the signature is correct, it should be OK. If not, download John again. Ignore any error like "public key not signed". Now run:

      7z x your-compressed-john-binary.tar.gz
      7z x your-compressed-john-binary.tar
      cd where-the-binaries-extracted
      cd the-only-folder-you-see
      cd run
      cpan install Exif::Tools
      perl pdf2john.pl name-of-your-encrypted-pdf.pdf > hashes_to_crack
      john hashes_to_crack

    It should start cracking. Keep in mind that cracking the hash will probably take a long time.

  • GitHub repo EntityComponentSystemSamples

    Project mention: Best way to learn ECS and DOTS | reddit.com/r/Unity3D | 2021-04-20

    Anyways, enough of my endless complaints about Unity and back to helping you. If you can't find a faster way to understand DOTS, start where I started: with the ECS samples (DOTS used to be called Unity ECS).

  • GitHub repo oneDNN

    oneAPI Deep Neural Network Library (oneDNN)

    Project mention: Is gpu hardware tied to cpu ISA ? | reddit.com/r/hardware | 2021-01-11

    Intel are trying to support their oneAPI compute framework on Arm, IBM POWER and z/Architecture (s390x), but since they have only ever released a single discrete GPU with the Xe architecture, it's unclear whether they'll support Xe GPU compute on, e.g., Arm: https://github.com/oneapi-src/oneDNN

  • GitHub repo simde

    Implementations of SIMD instruction sets for systems which don't natively support them.

    Project mention: Adobe Photoshop Ships on Macs Apple Silicon/M1 – 50% Faster | news.ycombinator.com | 2021-03-12

    > architecture-specific features such as SSE/AVX which is not portable.

    I don’t have hands-on experience, but somewhere on HN I saw this: https://github.com/simd-everywhere/simde If starting a new cross-platform project today, I would try that library first, before doing the usual intrinsics.

  • GitHub repo Vc

    SIMD Vector Classes for C++

    Project mention: All C++20 core language features with examples | news.ycombinator.com | 2021-04-07

    > - Waiting for Cross-Platform standardized SIMD vector datatypes

    Which language has standardized SIMD vector datatypes? Most languages can't even express SIMD, while in C++ I can just use Vc (https://github.com/VcDevel/Vc), nsimd (https://github.com/agenium-scale/nsimd) or one of the tons of other alternatives, and have stuff that JustWorks™ on more architectures than most languages even support.

    - Using nonstandard extensions, libraries or home-baked solutions to run computations in parallel on many cores or on different processors than the CPU

    What are the other native languages with a standardized memory model for atomics? And what's the problem with using libraries? It's not like you're going to use C#'s or Java's built-in thread pools if you're doing any serious work, no? Do they even have something as easy to use as Taskflow (https://github.com/taskflow/taskflow)?

    - Debugging cross-platform code using couts, cerrs and printfs

    Because people never use console.log in JS or Console.WriteLine in C#, maybe?

    - Forced to use boost for even quite elementary operations on std::strings.

    Can you point to non-trivial Java projects that don't use Apache Commons? Also, the Boost string algorithms are header-only, so you end up with exactly the same binaries as if they were in some std::string_algorithms namespace:


  • GitHub repo cglm

    📽 Highly Optimized Graphics Math (glm) for C

    Project mention: Using CGLM, why does the value of this vec4 affect the mat4? | reddit.com/r/C_Programming | 2021-04-27

    It has been a while since I've written any C so it's very likely I'm missing something very simple. But I'm using CGLM and I have this simple program:

  • GitHub repo DirectXMath

    DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

    Project mention: SIMD for C++ Developers [pdf] | news.ycombinator.com | 2021-04-27

    For videogame applications, look there before writing these intrinsics: https://github.com/microsoft/DirectXMath/ That library already implements a lot of complicated things, relatively well.

    Here’s the frustum culling code: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di... It's relatively inefficient when you have many boxes to test against the same frustum, but (a) the compiler may inline and optimize it, and (b) failing that, it’s easy to copy-paste and optimize manually: compute the 6 planes once and call the BoundingBox::ContainedBy method yourself.
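    A minimal sketch of the per-box test the comment describes, written in plain Python rather than with DirectXMath: the six frustum planes are computed once, then each axis-aligned box is classified against them. It assumes planes are stored as hypothetical `(nx, ny, nz, d)` tuples, with "inside" meaning `n.p + d >= 0`, and uses the common positive/negative-vertex trick. `classify_aabb` is an illustrative name, not the actual implementation of `BoundingBox::ContainedBy`.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (nx, ny, nz, d): inside if n.p + d >= 0


def classify_aabb(planes: Sequence[Plane], box_min: Vec3, box_max: Vec3) -> str:
    """Classify an axis-aligned box as 'outside', 'intersects' or 'inside'."""
    result = "inside"
    for nx, ny, nz, d in planes:
        n = (nx, ny, nz)
        # Positive vertex: the corner furthest along the plane normal.
        p = [box_max[i] if n[i] >= 0 else box_min[i] for i in range(3)]
        # Negative vertex: the opposite corner.
        q = [box_min[i] if n[i] >= 0 else box_max[i] for i in range(3)]
        if nx * p[0] + ny * p[1] + nz * p[2] + d < 0:
            return "outside"  # even the most-inside corner is behind this plane
        if nx * q[0] + ny * q[1] + nz * q[2] + d < 0:
            result = "intersects"  # the box straddles this plane
    return result


# Six planes bounding the cube [-1, 1]^3, standing in for a real view frustum.
# They are computed once and reused for every box.
FRUSTUM = [(1, 0, 0, 1), (-1, 0, 0, 1), (0, 1, 0, 1),
           (0, -1, 0, 1), (0, 0, 1, 1), (0, 0, -1, 1)]

boxes = [((-0.5, -0.5, -0.5), (0.5, 0.5, 0.5)),
         ((0.5, -0.5, -0.5), (1.5, 0.5, 0.5)),
         ((2.0, -0.5, -0.5), (3.0, 0.5, 0.5))]
for lo, hi in boxes:
    print(classify_aabb(FRUSTUM, lo, hi))  # inside, intersects, outside
```

    This plane-by-plane test is conservative: a box near a frustum corner can be reported as "intersects" even though it lies entirely outside, which is usually acceptable for culling.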

  • GitHub repo XNNPACK

    High-efficiency floating-point neural network inference operators for mobile, server, and Web

    Project mention: Where are Nvidia's DLSS models stored and how big are they? | reddit.com/r/hardware | 2021-03-28

    It's quite simple. https://github.com/google/XNNPACK for example.

  • GitHub repo cgmath-rs

    A linear algebra and mathematics library for computer graphics.

    Project mention: Rendering large 3D tilemaps with a single draw call at 3000 FPS | reddit.com/r/gamedev | 2021-04-06

    One great thing about Rust is that the library ecosystem is surprisingly mature, especially considering how young the language is (1.0 was released in 2015). C# also has good libraries, but from my experience it's kinda fiddly to use most open source libraries with Unity, at least without modifications. Rust's ecosystem has some excellent libraries that help with game development, such as noise for procedural generation and cgmath for linear algebra.

  • GitHub repo uwu

    fastest text uwuifier in the west

    Project mention: uwuify - fastest test uwuifier in the west: simd vectorized and multithreaded command-line tool for text uwu-ing at the speed of simply copying a file | reddit.com/r/ProgrammerAnimemes | 2021-04-06

  • GitHub repo TurboPFor

    Fastest Integer Compression

    Project mention: C Deep | dev.to | 2021-02-27

    TurboPFor - Fastest integer compression. GPL-2.0-or-later

  • GitHub repo Klein

    P(R*_{3, 0, 1}) specialized SIMD Geometric Algebra Library

  • GitHub repo pysimdjson

    Python bindings for the simdjson project.

    Project mention: How I cut GTA Online loading times by 70% | reddit.com/r/programming | 2021-02-28

    I don't think JSON is really the problem - parsing 10MB of JSON is not so slow. For example, using Python's json.load takes about 800ms for a 47MB file on my system, using something like simdjson cuts that down to ~70ms.
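    Taking the quoted figures at face value (a 47MB file, about 800ms with `json.load` versus about 70ms with simdjson), the implied throughputs and speedup work out as follows. The last line assumes `json.load` throughput stays constant as file size changes, which is an assumption rather than something measured in the comment.

```latex
\frac{47\ \text{MB}}{0.8\ \text{s}} \approx 59\ \text{MB/s}, \qquad
\frac{47\ \text{MB}}{0.07\ \text{s}} \approx 671\ \text{MB/s}, \qquad
\frac{800\ \text{ms}}{70\ \text{ms}} \approx 11.4\times
\\[4pt]
\frac{10\ \text{MB}}{59\ \text{MB/s}} \approx 0.17\ \text{s}
```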

  • GitHub repo stdarch

    Rust's standard library vendor-specific APIs and run-time feature detection

    Project mention: Incredibly fast UTF-8 validation | reddit.com/r/rust | 2021-04-21

    You can check the code. Apparently the std implementation checks the OSXSAVE CPUID bit (and then the XCR0 register via XGETBV) to confirm that the OS saves AVX/AVX2 registers during context switches, and only then enables them. In a no_std context one might not generally be able to rely on that check.
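    The decision logic behind that kind of check can be sketched as a pure function over register values. This is a hypothetical Python model, not Rust's `std_detect` code: it does not execute CPUID or XGETBV itself, so the caller is assumed to supply ECX from CPUID leaf 1 and the value XGETBV returns for XCR0. The bit positions are the documented ones (ECX bit 27 = OSXSAVE, bit 28 = AVX; XCR0 bit 1 = XMM state, bit 2 = upper YMM state). AVX2 needs the same OS-level XCR0 check on top of its own CPUID flag.

```python
from typing import Optional

# Bit positions from Intel's CPUID documentation.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE and the XGETBV instruction
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE_AVX_STATE = 0b110    # XCR0 bit 1: XMM state, bit 2: upper YMM state


def avx_usable(cpuid1_ecx: int, xcr0: Optional[int]) -> bool:
    """Decide whether AVX can be used, given CPUID leaf 1 ECX and XCR0.

    xcr0 should be the value returned by XGETBV(ECX=0), or None when
    XGETBV could not be executed.
    """
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False  # the CPU itself lacks AVX
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # OS has not enabled XSAVE, so XGETBV is unavailable
    if xcr0 is None:
        return False
    # The OS must save/restore both XMM and upper YMM state on context switch.
    return (xcr0 & XCR0_SSE_AVX_STATE) == XCR0_SSE_AVX_STATE


# Example register values (hypothetical, not read from real hardware):
ecx = CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE
print(avx_usable(ecx, 0b111))            # True: x87, SSE and AVX state enabled
print(avx_usable(ecx, 0b011))            # False: OS does not save upper YMM halves
print(avx_usable(CPUID1_ECX_AVX, None))  # False: no OSXSAVE, cannot check
```

    Checking OSXSAVE before reading XCR0 matters because XGETBV faults when the OS has not enabled XSAVE.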

  • GitHub repo SeqAn

    SeqAn's official repository.

  • GitHub repo sleef

    SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT

    Project mention: Why are C++ trigonometric functions fast, but not that fast ? | reddit.com/r/cpp | 2021-04-24

    There are some wrapper libraries for intrinsics you can use alternatively that provide similar or the same functionality. For example you can look into vectorclass or sleef. I picked these two as they are sometimes recommended but I am not familiar with them as at work a custom one is used.

  • GitHub repo LoopVectorization.jl

    Macro(s) for vectorizing loops.

    Project mention: Julia 1.6 Highlights | news.ycombinator.com | 2021-03-25

    Very often, benchmarks include Julia's compilation time, which can be slow. Sometimes that is justified, but often it's really apples and oranges when benchmarking against C/C++/Rust/Fortran. Julia 1.6 shows compilation time in the `@time f()` macro, but Julia programmers typically use @btime from the BenchmarkTools package to get more reliable timings (e.g. the minimum runtime over many function calls).

    I think it's more interesting to see what people do with the language than to focus on microbenchmarks. For instance, there's this great package https://github.com/JuliaSIMD/LoopVectorization.jl which exports a simple macro, `@avx`, that you can put on loops to vectorize them better than the compiler (LLVM) does. It's quite remarkable that you can implement this in the language as a package, instead of having LLVM improve or the Julia compiler team figure it out.

    See the docs which kinda read like blog posts: https://juliasimd.github.io/LoopVectorization.jl/stable/
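    The benchmarking idea mentioned above (run the function many times after a warm-up and summarize the samples, rather than timing one cold call that may include compilation) can be sketched outside Julia. This is a hypothetical Python helper, `runtime_stats_ns`, built only on `time.perf_counter_ns` and `statistics`; it is not how BenchmarkTools works internally. It returns both the minimum and the median, the two summaries most often quoted.

```python
import statistics
import time
from typing import Callable, Tuple


def runtime_stats_ns(f: Callable[[], object], n: int = 101,
                     warmup: int = 3) -> Tuple[int, float]:
    """Return (minimum, median) wall-clock runtime of f() in nanoseconds.

    Warm-up calls are not timed, so one-off first-call costs (lazy
    initialisation, cold caches, or compilation in a JIT like Julia's)
    do not distort the result.
    """
    for _ in range(warmup):
        f()
    samples = []
    for _ in range(n):
        start = time.perf_counter_ns()
        f()
        samples.append(time.perf_counter_ns() - start)
    return min(samples), statistics.median(samples)


fastest, typical = runtime_stats_ns(lambda: sum(range(10_000)))
print(f"min {fastest} ns, median {typical} ns")
```

    The minimum approximates the cost with the least interference from the OS and other processes; the median is more representative of typical runs.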

  • GitHub repo sse2neon

    A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

    Project mention: Success porting VCV into aarch64 linux! (Usable on Android Devices) | reddit.com/r/vcvrack | 2021-03-13

    You should go to /include/simd and download sse2neon.h into that folder. Replace appearing in any source files in that directory with "sse2neon.h". You will still encounter errors; remove the lines causing problems, typically those containing the phrase ZERO_MODE. ARM processors do not require it.

  • GitHub repo highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

    Project mention: ARM vs. RISC-V Vector Extensions | news.ycombinator.com | 2021-05-06

    > If your goal is to understand how hardware SIMD works, you're probably better off sticking to C code with intrinsics

    Agreed, and we're also using intrinsics in time-critical places. I am confident we will be able to hide both SVE and RVV behind the same C++ interface (https://github.com/google/highway) - works for RVV, just started SVE.

  • GitHub repo Fastor

    A lightweight high performance tensor algebra framework for modern C++

    Project mention: Scientific computing in Cpp | reddit.com/r/cpp | 2021-02-18

    - TensorFlow, machine learning: https://www.tensorflow.org/
    - Fastor, a tensor library: https://github.com/romeric/Fastor
    - GNU Scientific Library (GSL): https://www.gnu.org/software/gsl/
    - Boost
    - FEniCS, a finite element library: https://fenicsproject.org/
    - Intel MKL, a BLAS+LAPACK+other goodies library: https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html
    - SuiteSparse, a sparse linear algebra library: http://faculty.cse.tamu.edu/davis/suitesparse.html
    - Sundials, nonlinear solvers: https://computing.llnl.gov/projects/sundials

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-05-06.


What are some of the best open-source Simd projects? This list will help you:

#   Project                       Stars
1   simdjson                      13,277
2   ncnn                          11,486
3   GLM                           5,096
4   john                          4,866
5   EntityComponentSystemSamples  3,982
6   oneDNN                        2,348
7   simde                         1,056
8   Vc                            1,008
9   cglm                          936
10  DirectXMath                   878
11  XNNPACK                       846
12  cgmath-rs                     799
13  uwu                           788
14  TurboPFor                     530
15  Klein                         476
16  pysimdjson                    446
17  stdarch                       412
18  SeqAn                         395
19  sleef                         366
20  LoopVectorization.jl          346
21  sse2neon                      342
22  highway                       336
23  Fastor                        299