array
- Benchmarking 20 programming languages on N-queens and matrix multiplication
I should have mentioned somewhere that I disabled threading for OpenBLAS, so it is comparing one thread to one thread. Parallelism would be easy to add, but I tend to want the thread parallelism outside code like this anyway.
As for the inner loop not being well optimized: the disassembly looks like the same basic thing as OpenBLAS. There's disassembly in the comments of that file to show what code it generates; I'd love to know what you think is lacking! The only differences between the one I linked and this are prefetching and outer loop ordering: https://github.com/dsharlet/array/blob/master/examples/linea...
This gets to 90% of BLAS: https://github.com/dsharlet/array/blob/38f8ce332fc4e26af0832...
But this is quite general. I'm claiming you can beat BLAS if you have some unique knowledge of the problem that you can exploit. For example, some kinds of sparsity can be implemented within the above example code and still far outperform the more general sparsity supported by MKL and similar libraries.
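To illustrate the kind of problem-specific knowledge meant here, the following is a minimal, hypothetical sketch (plain C++, not the library's API): a block-sparse matrix-vector multiply where the caller knows which blocks are nonzero, so zero blocks are skipped entirely and the remaining work is a tight dense loop nest the compiler can vectorize.

```cpp
#include <vector>

// Hypothetical illustration: exploit known block sparsity directly.
// A general sparse library cannot assume this structure, so it pays
// for per-element index lookups that this code avoids.

constexpr int B = 4;       // block size
constexpr int NB = 3;      // blocks per dimension
constexpr int N = B * NB;  // full matrix dimension

// Dense N x N storage (row-major) plus a per-block nonzero mask.
struct BlockSparse {
    std::vector<double> data = std::vector<double>(N * N, 0.0);
    bool nonzero[NB][NB] = {};
};

void multiply(const BlockSparse& a, const double* x, double* y) {
    for (int i = 0; i < N; ++i) y[i] = 0.0;
    for (int bi = 0; bi < NB; ++bi) {
        for (int bj = 0; bj < NB; ++bj) {
            if (!a.nonzero[bi][bj]) continue;  // skip blocks known to be zero
            // Dense B x B block: a simple loop nest the compiler optimizes well.
            for (int i = 0; i < B; ++i)
                for (int j = 0; j < B; ++j)
                    y[bi * B + i] +=
                        a.data[(bi * B + i) * N + bj * B + j] * x[bj * B + j];
        }
    }
}
```

The point is not the specific storage scheme: any structure the caller knows at compile time (banded, block-diagonal, fixed patterns) can be baked into loops like these, which is the kind of exploitable knowledge a general-purpose BLAS cannot use.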
- A basic introduction to NumPy's einsum
Compilers can be pretty good if you help them out a bit. Here's my implementation of Einstein reductions (including summations) in C++, which generates pretty close to ideal code until you start getting into processor-architecture-specific optimizations: https://github.com/dsharlet/array#einstein-reductions
If you are looking for something like this in C++, here's my attempt at implementing it: https://github.com/dsharlet/array#einstein-reductions
It doesn't do any automatic optimization of the loops like some of the projects linked in this thread, but it provides all the tools needed for humans to express the code in a way that a good compiler can turn into really good code.
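For readers unfamiliar with the notation, here is a generic sketch (plain C++, not the library's einsum interface) of what the einsum "ij,jk->ik" means when written out as explicit loops: the repeated index j is summed over, and the loop structure states the computation directly enough for an optimizing compiler to vectorize.

```cpp
// Generic illustration of the einsum "ij,jk->ik" (matrix multiply)
// as explicit loops. Dimensions are compile-time constants here
// purely to keep the example self-contained.

constexpr int I = 2, J = 3, K = 2;

void einsum_ij_jk_ik(const double (&x)[I][J], const double (&y)[J][K],
                     double (&z)[I][K]) {
    for (int i = 0; i < I; ++i)
        for (int k = 0; k < K; ++k) {
            double acc = 0.0;
            for (int j = 0; j < J; ++j)
                acc += x[i][j] * y[j][k];  // sum over the repeated index j
            z[i][k] = acc;
        }
}
```

Other reductions (traces, dot products, contractions over several indices) follow the same pattern: one loop per index, with an accumulator for each index that appears on the left but not on the right of the arrow.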
aplette
- Boehm-Demers-Weiser Garbage Collector
- Try APL
There is Aplette which supposedly integrates nicely with other Unix tools. It's a port/update of the earlier openAPL source code, which I think was done by Ken Thompson? Here:
What are some alternatives?
ngn-apl - An APL interpreter written in JavaScript. Runs in a browser or NodeJS.
optimizing-the-memory-layout-of-std-tuple - Optimizing the memory layout of std::tuple
json - A tiny JSON parser and emitter for Perl 6 on Rakudo
ride - Remote IDE for Dyalog APL
NumPy - The fundamental package for scientific computing with Python.
array - Simple array language written in Kotlin
APL.jl
alphafold2 - To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released
cadabra2 - A field-theory motivated approach to computer algebra.
nottinygc - Higher-performance allocator for TinyGo WASI apps
julia - The Julia Programming Language
sgcl - A real-time Garbage Collector for C++