Metric | plb2 | weave
---|---|---
Mentions | 7 | 7
Stars | 238 | 527
Growth | - | -
Activity | 9.4 | 3.0
Last commit | 20 days ago | 5 months ago
Language | C | Nim
License | Creative Commons Zero v1.0 Universal | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
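The exact formula behind the activity number is not given here, so the following is only a minimal sketch of how such a score could work. It assumes an exponential decay with a 30-day half-life for commit weights and a percentile rank scaled to 0..10; the function names, the half-life, and the scaling are all hypothetical.

```python
from bisect import bisect_left


def recency_weighted_activity(commit_ages_days, half_life_days=30.0):
    """Raw activity: each commit's weight halves every `half_life_days`.

    The half-life is an assumed parameter, not a value taken from the page.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw, all_raw_scores):
    """Map a raw value to a 0..10 score by percentile rank among tracked projects."""
    ranked = sorted(all_raw_scores)
    # Number of tracked projects with a strictly lower raw value.
    below = bisect_left(ranked, raw)
    return round(10.0 * below / len(ranked), 1)
```

Under these assumptions, a project whose raw value beats 9 of 10 tracked projects gets a score of 9.0, which matches the "top 10%" reading of the legend, and a commit made today counts twice as much as one made 30 days ago.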
plb2
- Byte-Sized Swift: Building Tiny Games for the Playdate
https://github.com/attractivechaos/plb2 - limited but broad comparison across a large number of languages. Swift and Nim both compare favourably to C.
- The One Billion Row Challenge in Go: from 1m45s to 4s in nine solutions
https://github.com/attractivechaos/plb2/blob/master/README.m...
Synthetic benchmarks aside, I think as far as average (Spring Boots of the world) code goes, Go beats Java almost every time, often in fewer lines than the usual pom.xml
- Python 3.13 Gets a JIT
I wouldn't be so enthusiastic. Look at other languages that have a JIT now: Ruby and PHP. After years of effort, they are still an order of magnitude slower than V8 and even PyPy [1]. It seems to me that you need to design a JIT implementation from the ground up to get good performance: V8, Dart, LuaJIT and PyPy are like this. If you start with a pure interpreter, it may be difficult to speed it up later.
[1] https://github.com/attractivechaos/plb2
- Benchmarking 20 programming languages on N-queens and matrix multiplication
A curious thing about Swift: after https://github.com/attractivechaos/plb2/pull/23, the matrix multiplication example is comparable to C and Rust. However, I don’t see a way to idiomatically optimise the sudoku example, whose main overhead is allocating several arrays each time solve() is called. Apparently, in Swift there is no such thing as static array allocation. That’s very unfortunate.
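The allocation overhead described in the comment above can be illustrated with a small sketch. This is not the plb2 sudoku solver and not Swift; it is a hypothetical Python example that contrasts allocating a scratch array on every call with reusing a caller-owned workspace. `Workspace`, `count_distinct_alloc`, and `count_distinct_reuse` are names invented for illustration.

```python
class Workspace:
    """Scratch buffer allocated once and reused across calls (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.used = [False] * size

    def reset(self):
        # Clear the buffer in place instead of allocating a fresh list.
        for i in range(self.size):
            self.used[i] = False


def count_distinct_alloc(values, size):
    """Counts distinct values in [0, size), allocating a new scratch list per call."""
    used = [False] * size
    count = 0
    for v in values:
        if not used[v]:
            used[v] = True
            count += 1
    return count


def count_distinct_reuse(values, ws):
    """Same result, but reuses the caller-owned scratch list."""
    ws.reset()
    used = ws.used
    count = 0
    for v in values:
        if not used[v]:
            used[v] = True
            count += 1
    return count
```

A caller would create one `Workspace(9)` up front and pass it to every `count_distinct_reuse` call. In C the same effect comes from static or stack arrays, which is the option the comment says it could not find an idiomatic Swift equivalent for.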
weave
- The GIL can now be disabled in Python's main branch
- Maybe Everything Is a Coroutine
GPU drivers provide an event system:
- CUDA: https://github.com/mratsim/weave/issues/133
- Benchmarking 20 programming languages on N-queens and matrix multiplication
Note: the theoretical peak limit is hardcoded and uses the values for my previous machine, an i9-9980XE.
It may be that your BLAS library is not named libopenblas.so; you can change that here: https://github.com/mratsim/laser/blob/master/benchmarks/thir...
Implementation is in this folder: https://github.com/mratsim/laser/tree/master/laser/primitive...
in particular, tiling, cache and register optimization: https://github.com/mratsim/laser/blob/master/laser/primitive...
AVX512 code generator: https://github.com/mratsim/laser/blob/master/laser/primitive...
And a generic Scalar/SSE/AVX/AVX2/AVX512 microkernel generator (these are Nim macros that generate code at compile time): https://github.com/mratsim/laser/blob/master/laser/primitive...
I'll come back later with details on how to use my custom HPC threadpool Weave instead of OpenMP (https://github.com/mratsim/weave/tree/master/benchmarks/matm...)
- Nim vs Rust Benchmarks
In my benchmarks, Nim is faster than Rust:
- Multithreading runtime (i.e. Rayon vs Weave: https://github.com/mratsim/weave)
- Cryptography: https://hackmd.io/@gnark/eccbench#Pairing
- Scientific computing / matrix multiplication: https://github.com/bluss/matrixmultiply/issues/34#issuecomme...
There is no inherent reason why a Nim program would be slower than Rust.
- Aren't green threads just better than async/await?
If you're interested in diving into this, I have reviewed solutions to cactus stacks / split stacks here: https://github.com/mratsim/weave/blob/master/weave/memory/multithreaded_memory_management.md
- Nim 2.0 – Thoughts
[4] https://github.com/mratsim/weave
What are some alternatives?
c-examples - Example C code
eioio - Effects-based direct-style IO for multicore OCaml
laser - The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
httpbeast - A highly performant, multi-threaded HTTP 1.1 server written in Nim.
tarantool - Get your data in RAM. Get compute close to data. Enjoy the performance.
matrixmultiply - General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.
blis - BLAS-like Library Instantiation Software Framework
Edith - Electronic Design in Swithft
related_post_gen - Data Processing benchmark featuring Rust, Go, Swift, Zig, Julia etc.
ocaml-multicore - Multicore OCaml
1brc - 1BRC in .NET among fastest on Linux
cosmopolitan - build-once run-anywhere c library