| | c-examples | weave |
|---|---|---|
| Mentions | 4 | 7 |
| Stars | 4 | 524 |
| Growth | - | - |
| Activity | 9.1 | 3.0 |
| Last commit | 22 days ago | 5 months ago |
| Language | C | Nim |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
c-examples
-
Benchmarking 20 programming languages on N-queens and matrix multiplication
So I actually tested your code: https://gist.github.com/bjourne/c2d0db48b2e50aaadf884e4450c6...
On my machine, single-threaded OpenBLAS multiplies two single-precision 4096x4096 matrices in 0.95 seconds. Your code takes over 30 seconds. For comparison, my own matrix multiplication code (https://github.com/bjourne/c-examples/blob/master/libraries/...), run single-threaded, takes 0.89 seconds. That actually beats OpenBLAS, though OpenBLAS retakes the lead on larger matrices once multi-threading is added.
- Julia and Mojo (Modular) Mandelbrot Benchmark
- Reference Count, Don't Garbage Collect
weave
- The GIL can now be disabled in Python's main branch
-
Maybe Everything Is a Coroutine
GPU drivers provide an event system:
- Cuda: https://github.com/mratsim/weave/issues/133
-
Benchmarking 20 programming languages on N-queens and matrix multiplication
Note: the theoretical peak limit is hardcoded and uses my previous machine's i9-9980XE.
It may be that your BLAS library is not named libopenblas.so; you can change that here: https://github.com/mratsim/laser/blob/master/benchmarks/thir...
Implementation is in this folder: https://github.com/mratsim/laser/tree/master/laser/primitive...
in particular, tiling, cache and register optimization: https://github.com/mratsim/laser/blob/master/laser/primitive...
AVX512 code generator: https://github.com/mratsim/laser/blob/master/laser/primitive...
And generic Scalar/SSE/AVX/AVX2/AVX512 microkernel generator (this is Nim macros to generate code at compile-time): https://github.com/mratsim/laser/blob/master/laser/primitive...
I'll come back later with details on how to use my custom HPC threadpool Weave instead of OpenMP (https://github.com/mratsim/weave/tree/master/benchmarks/matm...)
-
Nim vs Rust Benchmarks
In my benchmarks, Nim is faster than Rust:
- Multithreading runtime (i.e., Rayon vs. Weave: https://github.com/mratsim/weave)
- Cryptography: https://hackmd.io/@gnark/eccbench#Pairing
- Scientific computing / matrix multiplication: https://github.com/bluss/matrixmultiply/issues/34#issuecomme...
There is no inherent reason why a Nim program would be slower than Rust.
-
Aren't green threads just better than async/await?
If you're interested in diving into this, I have reviewed solutions to cactus stacks / split stacks here: https://github.com/mratsim/weave/blob/master/weave/memory/multithreaded_memory_management.md
-
Nim 2.0 – Thoughts
[4] https://github.com/mratsim/weave
What are some alternatives?
ixy-languages - A high-speed network driver written in C, Rust, C++, Go, C#, Java, OCaml, Haskell, Swift, Javascript, and Python
eioio - Effects-based direct-style IO for multicore OCaml
mark-sweep - A simple mark-sweep garbage collector in C
httpbeast - A highly performant, multi-threaded HTTP 1.1 server written in Nim.
.NET Runtime - .NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
matrixmultiply - General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.
racket - The Racket repository
Edith - Electronic Design in Swift
Mesh - A memory allocator that automatically reduces the memory footprint of C/C++ applications.
ocaml-multicore - Multicore OCaml
plb2 - A programming language benchmark
cosmopolitan - build-once run-anywhere C library