blis VS DirectXMath

Compare blis vs DirectXMath and see what their differences are.

                 blis                                       DirectXMath
Mentions         17                                         13
Stars            2,079                                      1,481
Growth (MoM)     4.3%                                       1.5%
Activity         7.1                                        6.8
Latest commit    16 days ago                                20 days ago
Language         C                                          C++
License          GNU General Public License v3.0 or later   MIT License
Mentions - the total number of mentions we've tracked, plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

blis

Posts with mentions or reviews of blis. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-24.
  • Faer-rs: Linear algebra foundation for the Rust programming language
    7 projects | news.ycombinator.com | 24 Apr 2024
    BLIS is an interesting new direction in that regard: https://github.com/flame/blis

    >The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental innovation is that virtually all computation within level-2 (matrix-vector) and level-3 (matrix-matrix) BLAS operations can be expressed and optimized in terms of very simple kernels.
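
    A minimal scalar sketch of that microkernel idea: level-3 operations are blocked down to a tiny C tile updated by a sequence of rank-1 steps over packed panels of A and B. Real BLIS microkernels are architecture-specific SIMD/assembly, and the MR/NR values here are illustrative, not BLIS's actual blocking parameters.

        // C[MR x NR] += A_panel * B_panel, with A packed in slivers of MR
        // values per step and B in slivers of NR values per step.
        #include <cstddef>

        constexpr std::size_t MR = 4, NR = 4;

        void gemm_ukernel(std::size_t k,
                          const double* a,      // packed A panel: k * MR values
                          const double* b,      // packed B panel: k * NR values
                          double* c, std::size_t ldc)  // C tile, row-major
        {
            double acc[MR][NR] = {};
            for (std::size_t p = 0; p < k; ++p)      // one rank-1 update per step
                for (std::size_t i = 0; i < MR; ++i)
                    for (std::size_t j = 0; j < NR; ++j)
                        acc[i][j] += a[p * MR + i] * b[p * NR + j];
            for (std::size_t i = 0; i < MR; ++i)     // accumulate into C
                for (std::size_t j = 0; j < NR; ++j)
                    c[i * ldc + j] += acc[i][j];
        }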

  • Optimize sgemm on RISC-V platform
    6 projects | news.ycombinator.com | 28 Feb 2024
    There is a recent update to the blis alternative to BLAS that includes a number of RISC-V performance optimizations.

    https://github.com/flame/blis/pull/737

  • BLIS: Portable basis for high-performance BLAS-like linear algebra libs
    2 projects | news.ycombinator.com | 24 Jan 2024
    https://github.com/flame/blis/blob/master/docs/Performance.m...

    It seems that the selling point is that BLIS does multi-core quite well. I am especially impressed that it does as well as Intel's highly optimized MKL on Intel's own CPUs.

    I do not see the selling point of BLIS-specific APIs, though. The whole point of having an open BLAS API standard is that numerical libraries should be drop-in replaceable, so when a new library (such as BLIS here) comes along, one could just re-link the library and reap the performance gain immediately.

    What is interesting is that numerical algebra work, by nature, is mostly embarrassingly parallel, so it should not be too difficult to write multi-core implementations. And yet, BLIS here performs so much better than some other industry-leading implementations on multi-core configurations. So the question is not why BLIS does so well; the question is why some other implementations do so poorly.
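
    To make the drop-in point concrete, here is a hedged sketch written against the standard CBLAS interface; which implementation executes it (BLIS via its BLAS/CBLAS compatibility layer, OpenBLAS, MKL, ...) is purely a link-time choice, with no source changes.

        // C = alpha*A*B + beta*C through the portable CBLAS interface.
        // Assumes a cblas.h from whichever BLAS implementation is installed.
        #include <cblas.h>
        #include <vector>

        int main() {
            const int m = 256, n = 256, k = 256;
            std::vector<double> A(m * k, 1.0), B(k * n, 1.0), C(m * n, 0.0);
            // Row-major C = 1.0 * A * B + 0.0 * C; swap backends by relinking,
            // e.g. -lblis vs -lopenblas.
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, k, 1.0, A.data(), k, B.data(), n, 0.0, C.data(), n);
        }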

  • Benchmarking 20 programming languages on N-queens and matrix multiplication
    15 projects | news.ycombinator.com | 2 Jan 2024
    First we can use Laser, which was my initial BLAS experiment in 2019. At the time in particular, OpenBLAS didn't properly use the AVX512 VPUs (see this BLIS thread: https://github.com/flame/blis/issues/352). It has made progress since then; still, on my current laptop, performance is in the same range.

    Reproduction:

  • The Art of High Performance Computing
    4 projects | news.ycombinator.com | 30 Dec 2023
    https://github.com/flame/blis/

    Field et al, recent winners of the James H. Wilkinson Prize for Numerical Software.

    Field and Goto both worked with Robert van de Geijn. Lots of TACC interaction in that broader team.

  • [D] Which BLAS library to choose for apple silicon?
    2 projects | /r/MachineLearning | 24 May 2023
    BLIS is fine too~ https://github.com/flame/blis
  • Column Vectors vs. Row Vectors
    1 project | news.ycombinator.com | 27 Oct 2022
    Here's BLIS's object API:

    https://github.com/flame/blis/blob/master/docs/BLISObjectAPI...

    Searching "object" in BLIS's README (https://github.com/flame/blis) to see what they think of it:

    "Objects are relatively lightweight structs and passed by address, which helps tame function calling overhead."

    "This is API abstracts away properties of vectors and matrices within obj_t structs that can be queried with accessor functions. Many developers and experts prefer this API over the typed API."

    In my opinion, this API is a strict improvement over BLAS. I do not think there is any reason to prefer the old BLAS-style API over an improvement like this.
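
    For a feel of the difference, a hedged sketch of gemm through the object API, based on BLISObjectAPI.md (error handling omitted): dimensions and strides live inside the obj_t, and scalars are global obj_t constants.

        #include <blis.h>

        int main() {
            obj_t a, b, c;
            dim_t m = 4, n = 4, k = 4;
            // rs = cs = 0 lets BLIS pick its default storage for each matrix.
            bli_obj_create(BLIS_DOUBLE, m, k, 0, 0, &a);
            bli_obj_create(BLIS_DOUBLE, k, n, 0, 0, &b);
            bli_obj_create(BLIS_DOUBLE, m, n, 0, 0, &c);
            bli_randm(&a);                       // fill with random values
            bli_randm(&b);
            bli_setm(&BLIS_ZERO, &c);
            // c := 1*a*b + 0*c; no dimension/stride arguments needed.
            bli_gemm(&BLIS_ONE, &a, &b, &BLIS_ZERO, &c);
            bli_obj_free(&a); bli_obj_free(&b); bli_obj_free(&c);
        }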

    Regarding your own experience, it's great that using BLAS proved to be a valuable learning experience for you. But the argument that the BLAS API is somehow uniquely helpful for learning how to program numerical algorithms efficiently, or that it will help you avoid performance problems, does not hold. It is possible to replace the BLAS API with a more modern and intuitive API that keeps the same benefits. To be clear, the benefits here are direct memory management and control of striding and matrix layout, which create opportunities for optimization. There is nothing unique about BLAS in this regard: it's possible to learn these lessons using any of the other listed options if you're paying attention and being systematic.

  • BLIS: Portable software framework for high-performance linear algebra
    1 project | news.ycombinator.com | 17 Aug 2022
  • Small Neural networks in Julia 5x faster than PyTorch
    8 projects | news.ycombinator.com | 14 Apr 2022
    The article asks "Which Micro-optimizations matter for BLAS3?", implying small dimensions, but doesn't actually tell me. The problem is well-studied, depending on what you consider "small". The most important thing is to avoid the packing step below an appropriate threshold (sketched after the links below). Implementations include libxsmm, blasfeo, and the "sup" version in blis (with papers on libxsmm and blasfeo). Eigen might also be relevant.

    https://libxsmm.readthedocs.io/

    https://blasfeo.syscop.de/

    https://github.com/flame/blis
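
    A self-contained sketch of the packing-threshold idea mentioned above. The cutoff and both kernels are placeholders, not what libxsmm, blasfeo, or blis's "sup" path actually do; the point is only that below some size the O(mk + kn) packing pass costs more than it saves.

        #include <cstddef>
        #include <vector>

        // Naive kernel reading row-major A (m x k) and B (k x n) in place.
        static void gemm_inplace(std::size_t m, std::size_t n, std::size_t k,
                                 const double* A, const double* B, double* C) {
            for (std::size_t i = 0; i < m; ++i)
                for (std::size_t p = 0; p < k; ++p)
                    for (std::size_t j = 0; j < n; ++j)
                        C[i * n + j] += A[i * k + p] * B[p * n + j];
        }

        // "Large" path: copy B first as a stand-in for real blocked packing.
        static void gemm_packed(std::size_t m, std::size_t n, std::size_t k,
                                const double* A, const double* B, double* C) {
            std::vector<double> Bp(B, B + k * n);    // packing stand-in
            gemm_inplace(m, n, k, A, Bp.data(), C);
        }

        void gemm(std::size_t m, std::size_t n, std::size_t k,
                  const double* A, const double* B, double* C) {
            const std::size_t kSmall = 1u << 16;     // hypothetical flop cutoff
            if (m * n * k < kSmall)
                gemm_inplace(m, n, k, A, B, C);      // skip packing when small
            else
                gemm_packed(m, n, k, A, B, C);
        }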

  • Eigen: A C++ template library for linear algebra
    6 projects | news.ycombinator.com | 4 Apr 2022

DirectXMath

Posts with mentions or reviews of DirectXMath. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-15.
  • Vector math library benchmarks (C++)
    3 projects | /r/GraphicsProgramming | 15 Apr 2023
    For those unfamiliar, like I was, DXM is DirectXMath.
  • Learning DirectX 12 in 2023
    13 projects | dev.to | 30 Jan 2023
    Alongside MiniEngine, you’ll want to look into the DirectX Toolkit. This is a set of utilities by Microsoft that simplify graphics and game development. It contains libraries like DirectXMesh for parsing and optimizing meshes for DX12, and DirectXMath, which handles 3D math operations much as the glm library does for OpenGL. It also has utilities for gamepad input and sprite fonts. You can see a list of the headers here to get an idea of the features. You’ll definitely want to include this in your project if you don’t want to think about a lot of these solved problems (and don’t have to worry about cross-platform support).
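
    For a taste of the glm-like role DirectXMath plays there, a small hedged sketch (arbitrary values; the library is header-only): build a view-projection matrix and transform a point with it.

        #include <DirectXMath.h>
        using namespace DirectX;

        int main() {
            XMVECTOR eye    = XMVectorSet(0.0f, 2.0f, -5.0f, 1.0f);
            XMVECTOR target = XMVectorZero();
            XMVECTOR up     = XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f);

            XMMATRIX view = XMMatrixLookAtLH(eye, target, up);
            XMMATRIX proj = XMMatrixPerspectiveFovLH(XM_PIDIV4, 16.0f / 9.0f,
                                                     0.1f, 100.0f);
            // Row-vector convention: p' = p * view * proj, w-divide included.
            XMVECTOR p = XMVector3TransformCoord(
                XMVectorSet(1.0f, 0.0f, 0.0f, 1.0f), view * proj);

            XMFLOAT3 out;
            XMStoreFloat3(&out, p);   // spill from SIMD register to plain struct
        }
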
  • Optimizing compilers reload vector constants needlessly
    7 projects | news.ycombinator.com | 6 Dec 2022
    Bad news: for SIMD there are no cross-platform intrinsics. Intel intrinsics map directly to SSE/AVX instructions and ARM intrinsics map directly to NEON instructions.

    For cross-platform, your best bet is probably https://github.com/VcDevel/std-simd

    There's https://eigen.tuxfamily.org/index.php?title=Main_Page but it's tremendously complicated for anything other than large-scale linear algebra.

    And there's https://github.com/microsoft/DirectXMath but it has obvious biases :P
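
    For reference, a minimal sketch of the portable style std-simd enables (std::experimental::simd, as shipped in libstdc++'s <experimental/simd>): one source compiles to SSE, AVX, or NEON depending on the target. Tail handling is omitted, so n is assumed to be a multiple of the vector width.

        #include <experimental/simd>
        #include <cstddef>
        namespace stdx = std::experimental;

        using vfloat = stdx::native_simd<float>;   // widest native float vector

        // y[i] += a * x[i], written once for any vector width.
        void saxpy(std::size_t n, float a, const float* x, float* y) {
            for (std::size_t i = 0; i < n; i += vfloat::size()) {
                vfloat xv(&x[i], stdx::element_aligned);   // load a lane group
                vfloat yv(&y[i], stdx::element_aligned);
                yv += a * xv;                              // scalar broadcasts
                yv.copy_to(&y[i], stdx::element_aligned);  // store back
            }
        }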

  • MATHRIL - Custom math library for game programming
    3 projects | /r/cpp | 6 Jul 2022
    I am not in gamedev, but I work with 3D graphics. We use DirectX 11, so DirectXMath was a natural choice: it is header-only, it supports SIMD instructions (SSE, AVX, NEON, etc.), and it can even be used on Linux (it has no dependency on Windows). It is of course just one choice: https://github.com/Microsoft/DirectXMath.
  • When i had to look up what a Quaternion is
    2 projects | /r/ProgrammerHumor | 5 Jul 2022
  • Eigen: A C++ template library for linear algebra
    6 projects | news.ycombinator.com | 4 Apr 2022
    I never really used GLM, but Eigen was substantially slower than DirectXMath https://github.com/microsoft/DirectXMath for these things. Despite the name, 99% of that library is OS-agnostic; only a few small pieces (like the projection matrix formulas) are specific to Direct3D. When enabled with the corresponding macros, inline functions from that library normally compile into pretty efficient manually vectorized SSE, AVX or NEON code.

    The only major issue is that DirectXMath doesn't support FP64 precision.

  • maths - templated c++ linear algebra library with vector swizzling, intersection tests and useful functions for games and graphics dev... includes live webgl/wasm demo
    3 projects | /r/cpp | 12 Jan 2022
    If you’re the author, consider comparisons with the industry standards, glm and DirectXMath, which both ensure easy interoperability with the two graphics APIs.
  • Algorithms for division: Using Newton's method
    1 project | news.ycombinator.com | 8 Dec 2021
    Good article, but note that if the hardware supports the division instruction, it will be much faster than the described workarounds.

    Personally, I recently did what's described there in two cases: FP32 division on ARMv7, and FP64 division on GPUs that don't support that instruction.

    For ARM CPUs, not only do they have FRECPE, they also have FRECPS for the iteration step. An example here: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di...

    For GPUs, Microsoft classified FP64 division as “extended double shader instruction” and the support is optional. However, GPUs are guaranteed to support FP32 division. The result of FP32 division provides an awesome starting point for Newton-Raphson refinement in FP64 precision.
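
    A sketch of the FRECPE/FRECPS pattern described above, using standard NEON intrinsics: vrecpeq_f32 gives a rough estimate of 1/d, and each vrecpsq_f32 computes (2 - d*x), the factor in the Newton-Raphson update x' = x * (2 - d*x). Two refinement steps are shown; the count needed depends on the accuracy target.

        #include <arm_neon.h>

        float32x4_t divide_approx(float32x4_t a, float32x4_t d) {
            float32x4_t x = vrecpeq_f32(d);          // initial 1/d estimate
            x = vmulq_f32(x, vrecpsq_f32(d, x));     // refinement step 1
            x = vmulq_f32(x, vrecpsq_f32(d, x));     // refinement step 2
            return vmulq_f32(a, x);                  // a/d ~= a * (1/d)
        }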

  • Use of BLAS vs direct SIMD for linear algebra library operations?
    3 projects | /r/cpp | 28 Aug 2021
    For graphics, DX math is a very good library.
  • Speeding Up `Atan2f` by 50x
    7 projects | news.ycombinator.com | 17 Aug 2021
    I wonder how it compares with Microsoft’s implementation, here: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di...

    Based on the code, your version is probably much faster. It would still be interesting to compare precision; MS uses a 17-degree polynomial there.
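
    For reference, the general shape of evaluating a degree-17 odd polynomial in Horner form over x^2. The coefficients below are Taylor-series placeholders for readability, not the minimax constants DirectXMath actually uses (see XMVectorATan in the linked header for those).

        // atan(x) ~= x * (c0 + c1*x^2 + ... + c8*x^16) for |x| <= 1; typical
        // implementations first reduce larger inputs via atan(x) = pi/2 - atan(1/x).
        float atan_poly(float x) {
            const float c[9] = { 1.0f, -1.0f / 3, 1.0f / 5, -1.0f / 7, 1.0f / 9,
                                 -1.0f / 11, 1.0f / 13, -1.0f / 15, 1.0f / 17 };
            float x2 = x * x;
            float p = c[8];
            for (int i = 7; i >= 0; --i)   // Horner's rule in x^2
                p = p * x2 + c[i];
            return x * p;
        }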

What are some alternatives?

When comparing blis and DirectXMath you can also consider the following projects:

tiny-cuda-nn - Lightning fast C++/CUDA neural network framework

GLM - OpenGL Mathematics (GLM)

vectorflow

highway - Performance-portable, length-agnostic SIMD with runtime dispatch

sundials - Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.

libjxl - JPEG XL image format reference implementation

xtensor - C++ tensors with broadcasting and lazy computing

Fastor - A lightweight high performance tensor algebra framework for modern C++

how-to-optimize-gemm

glibc - GNU Libc

diffrax - Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable. https://docs.kidger.site/diffrax/

Vc - SIMD Vector Classes for C++