| | blis | SciPy |
|---|---|---|
| Mentions | 17 | 50 |
| Stars | 2,091 | 12,459 |
| Growth | 3.5% | 1.0% |
| Activity | 7.0 | 9.9 |
| Last commit | 7 days ago | 4 days ago |
| Language | C | Python |
| License | GNU General Public License v3.0 or later | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
blis
-
Faer-rs: Linear algebra foundation for the Rust programming language
BLIS is an interesting new direction in that regard: https://github.com/flame/blis
>The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental innovation is that virtually all computation within level-2 (matrix-vector) and level-3 (matrix-matrix) BLAS operations can be expressed and optimized in terms of very simple kernels.
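To make the quoted idea concrete, here is a minimal pure-Python sketch of a matrix-matrix product written as loops around one small kernel. It is not BLIS code: the names `micro_kernel` and `gemm` and the tile sizes `MR` and `NR` are made up for illustration, and a real implementation would write the kernel in assembly or intrinsics and pack operands into contiguous buffers.

```python
# Illustrative sketch only, not BLIS's implementation.
MR, NR = 2, 2  # hypothetical micro-tile sizes


def micro_kernel(a_panel, b_panel, c, i0, j0, mr, nr):
    """Update an mr x nr tile of C with a sequence of rank-1 updates."""
    for a_col, b_row in zip(a_panel, b_panel):  # one step per index in k
        for i in range(mr):
            for j in range(nr):
                c[i0 + i][j0 + j] += a_col[i] * b_row[j]


def gemm(a, b):
    """Return C = A * B for lists of lists; all arithmetic runs in micro_kernel."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, MR):
        mr = min(MR, m - i0)  # edge tiles may be smaller than MR
        for j0 in range(0, n, NR):
            nr = min(NR, n - j0)
            # a_panel[p] holds column p of the current row block of A;
            # b_panel[p] holds row p of the current column block of B.
            a_panel = [[a[i0 + i][p] for i in range(mr)] for p in range(k)]
            b_panel = [[b[p][j0 + j] for j in range(nr)] for p in range(k)]
            micro_kernel(a_panel, b_panel, c, i0, j0, mr, nr)
    return c
```

Only `micro_kernel` touches the arithmetic, so in this sketch it is the single routine that would need per-architecture tuning; the surrounding loops stay portable.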
-
Optimize sgemm on RISC-V platform
A recent update to BLIS, an alternative BLAS implementation, includes a number of RISC-V performance optimizations.
https://github.com/flame/blis/pull/737
-
BLIS: Portable basis for high-performance BLAS-like linear algebra libs
https://github.com/flame/blis/blob/master/docs/Performance.m...
It seems that the selling point is that BLIS does multi-core quite well. I am especially impressed that it does as well as Intel's highly optimized MKL on Intel CPUs.
I do not see the selling point of BLIS-specific APIs, though. The whole point of having an open BLAS API standard is that numerical libraries should be drop-in replaceable, so when a new library (such as BLIS here) comes along, one could just re-link the library and reap the performance gain immediately.
What is interesting is that numerical algebra work, by nature, is mostly embarrassingly parallel, so it should not be too difficult to write multi-core implementations. And yet, BLIS here performs so much better than some other industry-leading implementations on multi-core configurations. So the question is not why BLIS does so well; the question is why some other implementations do so poorly.
-
Benchmarking 20 programming languages on N-queens and matrix multiplication
First we can use Laser, which was my initial BLAS experiment in 2019. At the time in particular, OpenBLAS didn't properly use the AVX512 VPUs (see the thread in BLIS: https://github.com/flame/blis/issues/352 ). It has made progress since then; still, on my current laptop perf is in the same range.
Reproduction:
-
The Art of High Performance Computing
https://github.com/flame/blis/
Field et al., recent winners of the James H. Wilkinson Prize for Numerical Software.
Field and Goto both worked with Robert van de Geijn. Lots of TACC interaction in that broader team.
-
[D] Which BLAS library to choose for apple silicon?
BLIS is fine too~ https://github.com/flame/blis
-
Column Vectors vs. Row Vectors
Here's BLIS's object API:
https://github.com/flame/blis/blob/master/docs/BLISObjectAPI...
Searching "object" in BLIS's README (https://github.com/flame/blis) to see what they think of it:
"Objects are relatively lightweight structs and passed by address, which helps tame function calling overhead."
"This API abstracts away properties of vectors and matrices within obj_t structs that can be queried with accessor functions. Many developers and experts prefer this API over the typed API."
In my opinion, this API is a strict improvement over BLAS. I do not think there is any reason to prefer the old BLAS-style API over an improvement like this.
Regarding your own experience, it's great that using BLAS proved to be a valuable learning experience for you. But your argument that the BLAS API is somehow uniquely helpful in terms of learning how to program numerical algorithms efficiently, or that it will help you avoid performance problems, is not true. It is possible to replace the BLAS API with a more modern and intuitive API with the same benefits. To be clear, the benefits here are direct memory management and control of striding and matrix layout, which create opportunities for optimization. There is nothing unique about BLAS in this regard---it's possible to learn these lessons using any of the other listed options if you're paying attention and being systematic.
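As a rough illustration of what "control of striding and matrix layout" means, here is a small pure-Python sketch of a matrix view over a flat buffer addressed by a row stride and a column stride. The `StridedMatrix` class and its method names are hypothetical and are not the BLIS object API; the point is only that row-major, column-major, and transposed views can all share one buffer without copying.

```python
class StridedMatrix:
    """A 2-D view over a flat buffer, addressed by row and column strides."""

    def __init__(self, buf, rows, cols, row_stride, col_stride, offset=0):
        self.buf = buf
        self.rows, self.cols = rows, cols
        self.rs, self.cs = row_stride, col_stride
        self.offset = offset

    def __getitem__(self, ij):
        i, j = ij
        # Element (i, j) lives at offset + i*rs + j*cs in the flat buffer.
        return self.buf[self.offset + i * self.rs + j * self.cs]

    def transpose(self):
        # Swapping dimensions and strides transposes the view without a copy.
        return StridedMatrix(self.buf, self.cols, self.rows,
                             self.cs, self.rs, self.offset)

    def to_rows(self):
        return [[self[i, j] for j in range(self.cols)]
                for i in range(self.rows)]


buf = [1, 2, 3, 4, 5, 6]
row_major = StridedMatrix(buf, 2, 3, row_stride=3, col_stride=1)
col_major = StridedMatrix(buf, 2, 3, row_stride=1, col_stride=2)
```

The same six numbers read as `[[1, 2, 3], [4, 5, 6]]` through `row_major` and as `[[1, 3, 5], [2, 4, 6]]` through `col_major`, which is the kind of layout decision that affects how a kernel walks memory.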
- BLIS: Portable software framework for high-performance linear algebra
-
Small Neural networks in Julia 5x faster than PyTorch
The article asks "Which Micro-optimizations matter for BLAS3?", implying small dimensions, but doesn't actually tell me. The problem is well-studied, depending on what you consider "small". The most important thing is to avoid the packing step below an appropriate threshold. Implementations include libxsmm, blasfeo, and the "sup" version in blis (with papers on libxsmm and blasfeo). Eigen might also be relevant.
https://libxsmm.readthedocs.io/
https://blasfeo.syscop.de/
https://github.com/flame/blis
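The "avoid the packing step below an appropriate threshold" point can be sketched as a simple dispatch. This is a toy in pure Python, not how libxsmm, blasfeo, or BLIS implement it: `PACK_THRESHOLD`, `pack_columns`, and `matmul` are hypothetical names, the cutoff value is arbitrary, and real libraries tune it per architecture.

```python
PACK_THRESHOLD = 64 ** 3  # hypothetical cutoff on m*n*k, chosen arbitrarily


def pack_columns(b):
    """Copy B so each column is contiguous (stand-in for a real packing routine)."""
    return [list(col) for col in zip(*b)]


def matmul(a, b, threshold=PACK_THRESHOLD):
    """Return (C, path), where path records which code path was taken."""
    m, k, n = len(a), len(b), len(b[0])
    if m * n * k < threshold:
        # Small problem: read B in place and skip the packing copy entirely.
        path = "unpacked"
        c = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
             for i in range(m)]
    else:
        # Large problem: pay for one copy of B, then stream contiguous columns.
        path = "packed"
        packed = pack_columns(b)
        c = [[sum(x * y for x, y in zip(row, col)) for col in packed]
             for row in a]
    return c, path
```

For small operands the packing copy costs about as much as the multiplication itself, which is why skipping it matters; the toy only shows the control flow, not the speedup.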
- Eigen: A C++ template library for linear algebra
SciPy
-
What Is a Schur Decomposition?
I guess it is a rite of passage to rewrite it. I'm doing it for SciPy too together with Propack in [1]. Somebody already mentioned your repo. Thank you for your efforts.
[1]: https://github.com/scipy/scipy/issues/18566
-
Fortran codes are causing problems
Fortran codes have caused many problems for the Python package SciPy, and some of them are now being rewritten in C: e.g., https://github.com/scipy/scipy/pull/19121. Not only does R have many Fortran codes, there are also many R packages using Fortran codes: https://github.com/r-devel/r-svn, https://github.com/cran?q=&type=&language=fortran&sort=. Modern Fortran is a fine language but most legacy Fortran codes use the F77 style. When I update the R package quantreg, which uses many Fortran codes, I get a lot of warning messages. Not sure how the Fortran codes in the R ecosystem will be dealt with in the future, but they recently caused an issue in R due to the lack of compiler support for Fortran: https://blog.r-project.org/2023/08/23/will-r-work-on-64-bit-arm-windows/index.html. Some renowned packages like glmnet already have their Fortran codes rewritten in C/C++: https://cran.r-project.org/web/packages/glmnet/news/news.html
-
[D] Which BLAS library to choose for apple silicon?
There are several lessons here: a) vanilla conda-forge numpy and scipy versions come with openblas, and it works pretty well, b) do not use netlib unless your matrices are small and you need to do a lot of SVDs, or idek why, c) Apple's veclib/accelerate is super fast, but it is also numerically unstable. So much so that the scipy devs dropped any support of it back in 2018. Like dang. That said, they are apparently bringing it back in, since the 13.3 release of macOS Ventura saw some major improvements in accelerate performance.
-
SciPy: Interested in adopting PRIMA, but little appetite for more Fortran code
First, if you read through that scipy issue (https://github.com/scipy/scipy/issues/18118 ) the author was willing and able to relicense PRIMA under a 3-clause BSD license which is perfectly acceptable for scipy.
For the numerical recipes reference, there is a mention that scipy uses a slightly improved version of Powell's algorithm that is originally due to Forman Acton and presumably published in his popular book on numerical analysis, and that also happens to be described & included in numerical recipes. That is, unless the code scipy uses is copied from numerical recipes, which I presume it isn't, NR having the same algorithm doesn't mean that every other independent implementation of that algorithm falls under NR copyright.
- numerically evaluating wavelets?
- Fortran in SciPy: Get rid of linalg.interpolative Fortran code
-
Optimization Without Using Derivatives
Reading the discussions under a previous thread titled "More Descent, Less Gradient"( https://news.ycombinator.com/item?id=23004026 ), I guess people might be interested in PRIMA ( www.libprima.net ), which provides the reference implementation for Powell's renowned gradient/derivative-free (zeroth-order) optimization methods, namely COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA.
PRIMA solves general nonlinear optimization problems without using derivatives. It implements Powell's solvers in modern Fortran, complying with the Fortran 2008 standard. The implementation is faithful, in the sense of being mathematically equivalent to Powell's Fortran 77 implementation, but with a better numerical performance. In contrast to the 7939 lines of Fortran 77 code with 244 GOTOs, the new implementation is structured and modularized.
There is a discussion about including the PRIMA solvers in SciPy ( https://github.com/scipy/scipy/issues/18118 ), replacing the buggy and unmaintained Fortran 77 version of COBYLA, and making the other four solvers available to all SciPy users.
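For readers unfamiliar with what "without using derivatives" looks like in practice, here is a minimal pure-Python sketch of a zeroth-order method. It is a plain compass (coordinate) search, which is far simpler than Powell's model-based solvers and is not taken from PRIMA or SciPy; the function name `compass_search` and its parameters are assumptions made for this example.

```python
def compass_search(f, x0, step=1.0, tol=1e-8, max_iter=10_000):
    """Minimize f using only function values (no gradients).

    Probes +/- step along each coordinate, moves to any better point,
    and halves the step whenever no probe improves the objective.
    """
    x = [float(v) for v in x0]
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break  # accept the move, go on to the next coordinate
        if not improved:
            step *= 0.5
    return x, fx


def bowl(p):
    """A smooth test objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


x_best, f_best = compass_search(bowl, [0.0, 0.0])
```

The solver only ever calls `f`, never a gradient, which is the property shared with COBYLA, UOBYQA, NEWUOA, BOBYQA, and LINCOA; those methods build local models of the objective and typically need far fewer evaluations than this sketch.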
- What can I contribute to SciPy (or other) with my pure math skill? I’m pen and paper mathematician
-
Emerging Technologies: Rust in HPC
if that makes your eyes bleed, what do you think about this? https://github.com/scipy/scipy/blob/main/scipy/special/specfun/specfun.f (heh)
- Python
What are some alternatives?
tiny-cuda-nn - Lightning fast C++/CUDA neural network framework
SymPy - A computer algebra system written in pure Python
vectorflow
statsmodels - Statsmodels: statistical modeling and econometrics in Python
sundials - Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
NumPy - The fundamental package for scientific computing with Python.
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
xtensor - C++ tensors with broadcasting and lazy computing
astropy - Astronomy and astrophysics core library
how-to-optimize-gemm
or-tools - Google's Operations Research tools: