blis VS RecursiveFactorization

Compare blis and RecursiveFactorization and see how they differ.

                blis                                        RecursiveFactorization
Mentions        17                                          3
Stars           2,107                                       -
Growth          3.5%                                        -
Activity        7.0                                         -
Latest commit   2 days ago                                  -
Language        C                                           -
License         GNU General Public License v3.0 or later    -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

blis

Posts with mentions or reviews of blis. We have used some of these posts to build our list of alternatives and similar projects. The most recent mention was on 2024-04-24.
  • Faer-rs: Linear algebra foundation for the Rust programming language
    8 projects | news.ycombinator.com | 24 Apr 2024
    BLIS is an interesting new direction in that regard: https://github.com/flame/blis

    >The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental innovation is that virtually all computation within level-2 (matrix-vector) and level-3 (matrix-matrix) BLAS operations can be expressed and optimized in terms of very simple kernels.
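    To make that concrete, here is a minimal sketch of a GEMM call through BLIS's typed API, one of the interfaces built on those kernels (function names and signatures as given in the BLIS docs; the header path and link flags are assumptions that depend on your install):

        /* C := alpha*A*B + beta*C via BLIS's typed API.
           Build (assumption): cc gemm.c -lblis */
        #include <blis.h>

        int main(void)
        {
            dim_t  m = 4, n = 4, k = 4;
            double alpha = 1.0, beta = 0.0;
            double a[16], b[16], c[16];

            for (int i = 0; i < 16; i++) { a[i] = i; b[i] = 1.0; c[i] = 0.0; }

            /* Unlike classic BLAS, row and column strides are passed
               explicitly; rs = 1, cs = m describes a column-major matrix. */
            bli_dgemm(BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE,
                      m, n, k,
                      &alpha,
                      a, 1, m,
                      b, 1, k,
                      &beta,
                      c, 1, m);
            return 0;
        }

    The explicit strides are one visible consequence of the kernel-centric design: the same entry point handles row-major, column-major, and general-stride matrices.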

  • Optimize sgemm on RISC-V platform
    6 projects | news.ycombinator.com | 28 Feb 2024
    There is a recent update to the blis alternative to BLAS that includes a number of RISC-V performance optimizations.

    https://github.com/flame/blis/pull/737

  • BLIS: Portable basis for high-performance BLAS-like linear algebra libs
    2 projects | news.ycombinator.com | 24 Jan 2024
    https://github.com/flame/blis/blob/master/docs/Performance.m...

    It seems that the selling point is that BLIS does multi-core quite well. I am especially impressed that it does as well as Intel's highly optimized MKL on Intel's own CPUs.

    I do not see the selling point of BLIS-specific APIs, though. The whole point of having an open BLAS API standard is that numerical libraries should be drop-in replaceable, so when a new library (such as BLIS here) comes along, one could just re-link the library and reap the performance gain immediately.

    What is interesting is that numerical algebra work, by nature, is mostly embarrassingly parallel, so it should not be too difficult to write multi-core implementations. And yet, BLIS here performs so much better than some other industry-leading implementations on multi-core configurations. So the question is not why BLIS does so well; the question is why some other implementations do so poorly.
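    The drop-in point is easy to illustrate. BLIS can be built with a BLAS compatibility layer, in which case code written against the classic Fortran-style BLAS symbols needs no source changes, only a relink. A sketch (the link flags are assumptions; they depend on how each BLAS was built and installed):

        /* Plain BLAS code: nothing BLIS-specific in the source.
           Swapping the BLAS is just a relink, e.g. (assumption):
             cc gemm.c -lopenblas   ->   cc gemm.c -lblis */
        extern void dgemm_(const char* transa, const char* transb,
                           const int* m, const int* n, const int* k,
                           const double* alpha, const double* a, const int* lda,
                           const double* b, const int* ldb,
                           const double* beta, double* c, const int* ldc);

        int main(void)
        {
            int n = 512;
            double alpha = 1.0, beta = 0.0;
            static double a[512*512], b[512*512], c[512*512];

            /* C := A*B, column-major, leading dimension n. */
            dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);
            return 0;
        }

    The multi-core behavior discussed above is likewise a runtime knob rather than a code change; with BLIS, for example, the thread count can be set through the BLIS_NUM_THREADS environment variable.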

  • Benchmarking 20 programming languages on N-queens and matrix multiplication
    15 projects | news.ycombinator.com | 2 Jan 2024
    First we can use Laser, which was my initial BLAS experiment in 2019. At the time, OpenBLAS didn't properly use the AVX512 VPUs (see this BLIS thread: https://github.com/flame/blis/issues/352 ). It has made progress since then; still, on my current laptop, performance is in the same range.

    Reproduction:

  • The Art of High Performance Computing
    4 projects | news.ycombinator.com | 30 Dec 2023
    https://github.com/flame/blis/

    Field et al., recent winners of the James H. Wilkinson Prize for Numerical Software.

    Field and Goto both worked with Robert van de Geijn. Lots of TACC interaction in that broader team.

  • [D] Which BLAS library to choose for apple silicon?
    2 projects | /r/MachineLearning | 24 May 2023
    BLIS is fine too~ https://github.com/flame/blis
  • Column Vectors vs. Row Vectors
    1 project | news.ycombinator.com | 27 Oct 2022
    Here's BLIS's object API:

    https://github.com/flame/blis/blob/master/docs/BLISObjectAPI...

    Searching "object" in BLIS's README (https://github.com/flame/blis) to see what they think of it:

    "Objects are relatively lightweight structs and passed by address, which helps tame function calling overhead."

    "This is API abstracts away properties of vectors and matrices within obj_t structs that can be queried with accessor functions. Many developers and experts prefer this API over the typed API."

    In my opinion, this API is a strict improvement over BLAS. I do not think there is any reason to prefer the old BLAS-style API over an improvement like this.

    Regarding your own experience, it's great that using BLAS proved to be a valuable learning experience for you. But the argument that the BLAS API is somehow uniquely helpful for learning how to program numerical algorithms efficiently, or that it will help you avoid performance problems, doesn't hold. A more modern and intuitive API can replace the BLAS API while keeping the same benefits. To be clear, the benefits here are direct memory management and control of striding and matrix layout, which create opportunities for optimization. There is nothing unique about BLAS in this regard: you can learn the same lessons with any of the other listed options if you're paying attention and being systematic.
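    For readers who want to see the object API from the post above in context, here is a minimal sketch of a complete GEMM through it (calls as described in the linked BLISObjectAPI document; error handling omitted):

        #include <blis.h>

        int main(void)
        {
            obj_t a, b, c;
            dim_t m = 4, n = 4, k = 4;

            /* Datatype, dimensions, and strides live inside the obj_t,
               so the gemm call itself carries no size/stride arguments.
               Strides of 0, 0 ask BLIS to choose a default layout. */
            bli_obj_create(BLIS_DOUBLE, m, k, 0, 0, &a);
            bli_obj_create(BLIS_DOUBLE, k, n, 0, 0, &b);
            bli_obj_create(BLIS_DOUBLE, m, n, 0, 0, &c);

            bli_randm(&a);                 /* fill A with random values */
            bli_randm(&b);
            bli_setm(&BLIS_ZERO, &c);      /* C := 0 */

            /* C := 1*A*B + 0*C, using the predefined scalar objects. */
            bli_gemm(&BLIS_ONE, &a, &b, &BLIS_ZERO, &c);

            bli_printm("C =", &c, "%5.2f", "");

            bli_obj_free(&a);
            bli_obj_free(&b);
            bli_obj_free(&c);
            return 0;
        }

    Compared with the sixteen-argument typed call, the queryable obj_t is what lets one code path handle transposition, conjugation, and layout without a combinatorial explosion of arguments.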

  • BLIS: Portable software framework for high-performance linear algebra
    1 project | news.ycombinator.com | 17 Aug 2022
  • Small Neural networks in Julia 5x faster than PyTorch
    8 projects | news.ycombinator.com | 14 Apr 2022
    The article asks "Which Micro-optimizations matter for BLAS3?", implying small dimensions, but doesn't actually tell me. The problem is well-studied, depending on what you consider "small". The most important thing is to avoid the packing step below an appropriate threshold. Implementations include libxsmm, blasfeo, and the "sup" version in blis (with papers on libxsmm and blasfeo). Eigen might also be relevant.

    https://libxsmm.readthedocs.io/

    https://blasfeo.syscop.de/

    https://github.com/flame/blis
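    As a rough illustration of what "avoid the packing step" in the post above means: below some size threshold, copying A and B into contiguous packed buffers costs more than the multiplication itself, so small-matrix paths compute directly from the caller's arrays. The sketch below shows only the structural idea; the threshold and the plain triple loop are stand-ins for the hand-tuned cutoffs and microkernels in libxsmm, blasfeo, and BLIS's "sup" path:

        /* Hypothetical cutoff; real libraries tune this per architecture. */
        #define SMALL_GEMM_THRESHOLD 64

        /* Direct column-major C += A*B with no packing: reasonable when
           the operands already fit comfortably in cache. */
        static void gemm_small(int m, int n, int k,
                               const double* a, int lda,
                               const double* b, int ldb,
                               double* c, int ldc)
        {
            for (int j = 0; j < n; j++)
                for (int p = 0; p < k; p++) {
                    double bpj = b[p + j * ldb];
                    for (int i = 0; i < m; i++)
                        c[i + j * ldc] += a[i + p * lda] * bpj;
                }
        }

        void gemm_dispatch(int m, int n, int k,
                           const double* a, int lda,
                           const double* b, int ldb,
                           double* c, int ldc)
        {
            if (m <= SMALL_GEMM_THRESHOLD &&
                n <= SMALL_GEMM_THRESHOLD &&
                k <= SMALL_GEMM_THRESHOLD) {
                gemm_small(m, n, k, a, lda, b, ldb, c, ldc); /* skip packing */
            } else {
                /* Large case: pack panels of A and B, then run the blocked
                   microkernel loop nest (omitted in this sketch). */
            }
        }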

  • Eigen: A C++ template library for linear algebra
    6 projects | news.ycombinator.com | 4 Apr 2022

RecursiveFactorization

Posts with mentions or reviews of RecursiveFactorization. We have used some of these posts to build our list of alternatives and similar projects. The most recent mention was on 2023-05-01.
  • Can Fortran survive another 15 years?
    7 projects | news.ycombinator.com | 1 May 2023
    What about the other benchmarks on the same site? https://docs.sciml.ai/SciMLBenchmarksOutput/stable/Bio/BCR/ BCR takes about a hundred seconds and is fairly representative of systems-biology models, coming from 1122 ODEs with 24388 terms that describe a stiff chemical reaction network modeling the BCR signaling network from Barua et al. Or the discrete diffusion models https://docs.sciml.ai/SciMLBenchmarksOutput/stable/Jumps/Dif... which are the justification behind the claims in https://www.biorxiv.org/content/10.1101/2022.07.30.502135v1 that the O(1)-scaling methods scale better than the O(log n)-scaling ones for large enough models?

    > If you use special routines (BLAS/LAPACK, ...), use them everywhere as the respective community does.

    It tests with and without BLAS/LAPACK (which isn't always helpful, as you'd see from the benchmarks if you read them). One of the key differences is that there are pure Julia tools like https://github.com/JuliaLinearAlgebra/RecursiveFactorization... which outperform the respective OpenBLAS/MKL equivalents in many scenarios, and that's one noted factor in the performance boost (it is not trivial to wrap into the interfaces of the other solvers, so it's not done there). Other benchmarks show that the comparison is not apples to apples and is, if anything, conservative: for example, https://github.com/SciML/SciPyDiffEq.jl#measuring-overhead shows that driving SciPyDiffEq through Julia's JIT optimizations gives lower overhead than direct SciPy+Numba, so we use the lower-overhead numbers in https://docs.sciml.ai/SciMLBenchmarksOutput/stable/MultiLang....

    > you must compile/write whole programs in each of the respective languages to enable full compiler/interpreter optimizations

    You do realize that a .so can have lower call overhead from a JIT-compiled language than from a statically compiled language like C, because some of the bindings can be optimized away at runtime, right? https://github.com/dyu/ffi-overhead is a measurement of that, and you see LuaJIT and Julia come out faster than C and Fortran there. That shouldn't be surprising once you see how it works.

    I mean yes, someone can always ask for more benchmarks, but now we have a site that's auto-updating tons of ODE benchmarks, with systems ranging in size from 2 to the thousands, covering as many methods as we can wrap in as many scenarios as we can. And we don't even "win" all of our benchmarks, because, unlike for you, these benchmarks aren't for winning but for tracking development (somehow Hacker News folks ignore the utility part and go straight to the language wars...).

    If you have a concrete change you think can improve the benchmarks, then please share it at https://github.com/SciML/SciMLBenchmarks.jl. We'll be happy to make and maintain another.

  • Yann Lecun: ML would have advanced if other lang had been adopted versus Python
    9 projects | news.ycombinator.com | 22 Feb 2023
  • Small Neural networks in Julia 5x faster than PyTorch
    8 projects | news.ycombinator.com | 14 Apr 2022
    Ask them to download Julia and try it, and file an issue if it is not fast enough. We try to have the latest available.

    See for example: https://github.com/JuliaLinearAlgebra/RecursiveFactorization...

What are some alternatives?

When comparing blis and RecursiveFactorization you can also consider the following projects:

tiny-cuda-nn - Lightning fast C++/CUDA neural network framework

vectorflow

diffrax - Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable. https://docs.kidger.site/diffrax/

sundials - Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.

DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

LeNetTorch - PyTorch implementation of LeNet for fitting MNIST for benchmarking.

xtensor - C++ tensors with broadcasting and lazy computing

KiteSimulators.jl - Simulators for kite power systems

how-to-optimize-gemm

RecursiveFactorization.jl