blis
neanderthal
Metric | blis | neanderthal
---|---|---
Mentions | 16 | 5
Stars | 2,007 | 1,042
Growth | 3.9% | 0.3%
Activity | 7.1 | 7.0
Latest commit | 8 days ago | 11 days ago
Language | C | Clojure
License | GNU General Public License v3.0 or later | Eclipse Public License 1.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
blis
- Optimize sgemm on RISC-V platform
There is a recent update to BLIS, an alternative BLAS implementation, that includes a number of RISC-V performance optimizations.
- BLIS: Portable basis for high-performance BLAS-like linear algebra libs
https://github.com/flame/blis/blob/master/docs/Performance.m...
It seems that the selling point is that BLIS does multi-core quite well. I am especially impressed that it does as well as Intel's highly optimized MKL on Intel's own CPUs.
I do not see the selling point of BLIS-specific APIs, though. The whole point of having an open BLAS API standard is that numerical libraries should be drop-in replaceable, so when a new library (such as BLIS here) comes along, one could just re-link the library and reap the performance gain immediately.
What is interesting is that numerical algebra work, by nature, is mostly embarrassingly parallel, so it should not be too difficult to write multi-core implementations. And yet, BLIS here performs so much better than some other industry-leading implementations on multi-core configurations. So the question is not why BLIS does so well; the question is why some other implementations do so poorly.
- Benchmarking 20 programming languages on N-queens and matrix multiplication
First, we can use Laser, which was my initial BLAS experiment in 2019. At the time, in particular, OpenBLAS didn't properly use the AVX512 VPUs (see the BLIS thread at https://github.com/flame/blis/issues/352 ). It has made progress since then; still, on my current laptop, perf is in the same range.
Reproduction:
- The Art of High Performance Computing
https://github.com/flame/blis/
Field et al., recent winners of the James H. Wilkinson Prize for Numerical Software.
Field and Goto both worked with Robert van de Geijn. Lots of TACC interaction in that broader team.
- [D] Which BLAS library to choose for apple silicon?
BLIS is fine too~ https://github.com/flame/blis
- Small Neural networks in Julia 5x faster than PyTorch
The article asks "Which Micro-optimizations matter for BLAS3?", implying small dimensions, but doesn't actually answer the question. The problem is well studied, though it depends on what you consider "small". The most important thing is to avoid the packing step below an appropriate size threshold. Implementations include libxsmm, blasfeo, and the "sup" version in BLIS (there are papers on libxsmm and blasfeo). Eigen might also be relevant.
- Eigen: A C++ template library for linear algebra
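The point about skipping the packing step for small problems can be sketched in plain C. This is only an illustration under stated assumptions, not how libxsmm, blasfeo, or BLIS implement it: the names `gemm_unpacked`, `gemm_packed`, and `gemm_dispatch` and the cutoff `SMALL_GEMM_THRESHOLD` are hypothetical, matrices are row-major doubles, and the "packed" path simply copies B into a contiguous buffer before multiplying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cutoff on m*n*k; real libraries tune this per architecture. */
#define SMALL_GEMM_THRESHOLD ((size_t)4096)

/* Unpacked path: C += A * B, reading the caller's row-major buffers directly. */
static void gemm_unpacked(size_t m, size_t n, size_t k,
                          const double *a, size_t lda,
                          const double *b, size_t ldb,
                          double *c, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t p = 0; p < k; ++p) {
            const double aip = a[i * lda + p];
            for (size_t j = 0; j < n; ++j)
                c[i * ldc + j] += aip * b[p * ldb + j];
        }
    }
}

/* Packed path: copy B into a contiguous k x n buffer, then multiply from it.
 * Returns -1 if the packing buffer cannot be allocated. */
static int gemm_packed(size_t m, size_t n, size_t k,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double *c, size_t ldc)
{
    double *bp = malloc(k * n * sizeof *bp);
    if (bp == NULL)
        return -1;
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < n; ++j)
            bp[p * n + j] = b[p * ldb + j];
    gemm_unpacked(m, n, k, a, lda, bp, n, c, ldc);
    free(bp);
    return 0;
}

/* Dispatcher: below the cutoff the copy costs more than it saves, so skip it. */
int gemm_dispatch(size_t m, size_t n, size_t k,
                  const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double *c, size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    if (m * n * k < SMALL_GEMM_THRESHOLD) {
        gemm_unpacked(m, n, k, a, lda, b, ldb, c, ldc);
        return 0;
    }
    return gemm_packed(m, n, k, a, lda, b, ldb, c, ldc);
}
```

Calling `gemm_dispatch(2, 2, 2, a, 2, b, 2, c, 2)` takes the unpacked path, while a 16x16x16 product reaches the cutoff and goes through the packing copy. In a tuned library the packed layout would also be rearranged to match the kernel's access pattern, which this sketch does not attempt.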
- Matrix Multiplication Inches Closer To Mythic Goal
However, on recent CPUs 4x4 is small for the innermost block size of the non-trivial hierarchy you need. You can see examples under https://github.com/flame/blis/tree/master/config with an a priori procedure for determining them in https://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analyti... (but compare with what's actually used for SKX, in particular). OpenBLAS will normally be similar, though it may come out somewhat faster, but it's easier to see in BLIS.
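To make the "innermost block size" concrete, here is a rough C sketch of a register-blocked micro-kernel, the innermost level of such a hierarchy. It is an assumption-laden illustration rather than BLIS's actual kernel: the function name `gemm_microkernel` and the panel layout are hypothetical, and `MR = 6`, `NR = 8` are illustrative values only, not read from any file under the config directory linked above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative register-block sizes (assumed, not taken from a BLIS config). */
enum { MR = 6, NR = 8 };

/* Computes C[0:MR, 0:NR] += A_panel * B_panel.
 * a_panel: k slices of MR contiguous values (one column of the A block each).
 * b_panel: k slices of NR contiguous values (one row of the B block each).
 * c:       row-major output block with leading dimension ldc. */
void gemm_microkernel(size_t k,
                      const double *a_panel,
                      const double *b_panel,
                      double *c, size_t ldc)
{
    assert(ldc >= NR);

    /* The MR x NR accumulator is what a tuned kernel keeps in vector registers. */
    double acc[MR][NR] = {{0.0}};

    /* One rank-1 update per step of the shared dimension. */
    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < MR; ++i) {
            const double ai = a_panel[p * MR + i];
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] += ai * b_panel[p * NR + j];
        }
    }

    /* Write the accumulated block back to C once, after the k loop. */
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j)
            c[i * ldc + j] += acc[i][j];
}
```

The outer cache-level loops would call this kernel on successive packed panels. A larger `MR` x `NR` block reuses each loaded value more often, which is one reason a 4x4 block is small on CPUs with wide vector units.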
neanderthal
- AI’s compute fragmentation: what matrix multiplication teaches us
- Having trouble setting up Neanderthal.
There is the official Hello World https://github.com/uncomplicate/neanderthal/tree/master/examples/hello-world
- Are there any "serious" open-source creators in Serbia, or in the Balkans in general?
- Anybody using Common Lisp or clojure for data science
Did you have any occasion to evaluate neanderthal during your research? People seem to prefer it over core.matrix because it focuses on primitive speed and sticks to BLAS idioms (as well as offering a decent API for working with GPU backends via CUDA and OpenCL). I am curious to see if you did and found anything lacking there. I have a project on the backburner to try and target neanderthal for local search stuff, expressing problems in a high-level API that can then be baked into some numerically-friendly representation for efficient execution. It's often easier (trivial) to express solution representations, neighborhood functions, and objectives/constraints in a general-purpose language, but none of the things we like there (sparse data structures, dynamically allocated stuff) are amenable to the contiguous-memory, primitive numeric model that the hardware wants.
- I want to quit my data analyst job and learn and become a Clojure developer
Do Clojure as a side gig or in your free time. Let the day job pay the bills. If you can, maybe incorporate Clojure into your day job to solve small problems (https://github.com/clj-python/libpython-clj and https://github.com/scicloj/clojisr provide bridges to/from Python and R). There is a lot of effort going into the data science side as well; the scicloj effort has resulted in a lot of growth over the last 2 years: tech.ml.dataset, tech.ml (now scicloj.ml). Dragan has a bunch of excellent stuff in neanderthal and deep diamond. There are also bindings to other JVM libraries from multiple languages.
What are some alternatives?
tiny-cuda-nn - Lightning fast C++/CUDA neural network framework
dtype-next - A Clojure library designed to aid in the implementation of high performance algorithms and systems.
libpython-clj - Python bindings for Clojure
deep-diamond - A fast Clojure Tensor & Deep Learning library
numcl-benchmarks - benchmarks against numpy, julia
magicl - Matrix Algebra proGrams In Common Lisp.
vectorflow
sundials - Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
qvm - The high-performance and featureful Quil simulator.
how-to-optimize-gemm
xtensor - C++ tensors with broadcasting and lazy computing
rebel-readline - Terminal readline library for Clojure dialects