mmperf
flops
mmperf
-
PyTorch on Apple M1 Faster Than TensorFlow-Metal
Here are the matmul sizes for the MiniLM model used for inference: https://github.com/mmperf/mmperf/blob/main/benchmark_sizes/b...
These are the matmul sizes for the BERT training workload: https://github.com/mmperf/mmperf/blob/main/benchmark_sizes/b...
Yes, we use the latest MoltenVK (1.3.204.0) installed on the system.
I will let @noxa and other IREE devs chime in on the SPIR-V path, but we do support prefix sums, etc., in the GPU path.
// part of the nod.ai team.
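Matmul benchmarks at specific shapes, like the (M, N, K) sizes linked above, are conventionally reported as GFLOP/s, counting 2·M·N·K floating-point operations for C = A·B (one multiply and one add per inner-product term). The pure-Python sketch below shows only that accounting; the helper names and the example shapes are hypothetical, are not mmperf's API, and are not taken from the truncated size files. Absolute numbers from an interpreted triple loop are far below what compiled kernels reach.

```python
import random
import time


def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOP count for C[m x n] = A[m x k] @ B[k x n]: one multiply and one add per term."""
    return 2 * m * n * k


def naive_matmul(a, b):
    """Reference i-p-j triple-loop matmul on lists of lists (illustrative, not optimized)."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        row_a, row_c = a[i], c[i]
        for p in range(k):
            a_ip, row_b = row_a[p], b[p]
            for j in range(n):
                row_c[j] += a_ip * row_b[j]
    return c


def gflops_for_size(m: int, n: int, k: int, repeats: int = 3) -> float:
    """Keep the fastest of `repeats` timed runs and convert it to GFLOP/s."""
    a = [[random.random() for _ in range(k)] for _ in range(m)]
    b = [[random.random() for _ in range(n)] for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        naive_matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return matmul_flops(m, n, k) / best / 1e9


if __name__ == "__main__":
    # Hypothetical small shapes; substitute real (M, N, K) triples to compare workloads.
    for m, n, k in [(32, 32, 32), (64, 128, 32)]:
        print(f"{m}x{n}x{k}: {gflops_for_size(m, n, k):.3f} GFLOP/s")
```

Taking the best of several runs is a common way to reduce noise from warm-up and scheduling, at the cost of hiding run-to-run variance.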
-
M1 Pro First Impressions: Core Management and CPU Performance
Could you suggest a particular benchmark? Or maybe this one works: https://github.com/mmperf/mmperf. I'll run it in an hour.
flops
-
Moore's Law, AI, and the pace of progress
There are no CPU Gflops/W benchmarks for the M1, but my friends who have M1 laptops used this code to test it: https://github.com/brianolson/flops
The M1 is only good at highly specialized tasks for which Apple has custom-designed hardware, firmware, and software, all of which are locked, both legally and technically, into a tomb where they will remain until the end of humanity.
If you spend even one second thinking about them, you are wasting time for eternity!
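The flops program linked above measures how much floating-point throughput a machine can achieve; its actual kernels are not reproduced here. As a rough, hypothetical illustration of the accounting such a microbenchmark relies on, the sketch below times a chain of multiply-adds, counts 2 FLOPs per iteration, and converts the result to GFLOP/s. In CPython each operation goes through the interpreter, so the number reflects interpreter overhead rather than the FPU; hardware-representative figures need compiled code such as flops.c.

```python
import time


def multiply_add_gflops(iterations: int = 1_000_000) -> float:
    """Time a dependent multiply-add chain and report GFLOP/s.

    Each iteration does one multiply and one add, counted as 2 FLOPs.
    Under CPython this mostly measures interpreter overhead, not the FPU.
    """
    x = 1.0
    # Fixed point of x = x*a + b is b / (1 - a) = 1.0, so x stays bounded.
    a, b = 0.999999, 1e-6
    start = time.perf_counter()
    for _ in range(iterations):
        x = x * a + b
    elapsed = time.perf_counter() - start
    return (2 * iterations) / elapsed / 1e9


if __name__ == "__main__":
    print(f"{multiply_add_gflops():.4f} GFLOP/s (interpreted, not representative)")
```

Keeping the operands at a stable fixed point avoids overflow and denormal values, either of which could distort the timing.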
-
M1 Pro First Impressions: Core Management and CPU Performance
So a friend who has an M1 ran this test: https://github.com/brianolson/flops/blob/master/flops.c
The 5 nm M1 reaches ~2.5 Gflops/W, which is not a huge increase over the 28 nm Pi 4 at 2 Gflops/W.
No more Moore's law. Game over!
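Gflops/W is throughput divided by average power, which works out to GFLOP per joule. A minimal sketch, assuming the FLOP count and runtime come from a benchmark run and the average power from a separate measurement; the function names are hypothetical. It also puts the two figures quoted above side by side.

```python
def gflops_per_watt(flop_count: float, seconds: float, avg_watts: float) -> float:
    """Energy efficiency: (FLOPs / s) / W, which equals FLOPs per joule, scaled to GFLOP."""
    joules = avg_watts * seconds
    return flop_count / joules / 1e9


def relative_gain(new: float, old: float) -> float:
    """Ratio of two efficiency figures; 1.25 means 25% more GFLOP per joule."""
    return new / old


# Figures quoted above: ~2.5 Gflops/W (M1) vs ~2 Gflops/W (Pi 4).
print(f"{relative_gain(2.5, 2.0):.2f}x")  # prints "1.25x"
```

Because both figures are per watt, the ratio compares energy efficiency only, not absolute speed.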
What are some alternatives?
Flops - How many FLOPS can you achieve?
aws-graviton-getting-started - Helping developers to use AWS Graviton2 and Graviton3 processors which power the 6th and 7th generation of Amazon EC2 instances (C6g[d], M6g[d], R6g[d], T4g, X2gd, C6gn, I4g, Im4gn, Is4gen, G5g, C7g[d][n], M7g[d], R7g[d]).
shark-samples
rust-crc32fast - Fast, SIMD-accelerated CRC32 (IEEE) checksum computation in Rust
cutlass - CUDA Templates for Linear Algebra Subroutines
performance_results - performance results/benchmarks for a variety of machines
iree - A retargetable MLIR-based machine learning compiler and runtime toolkit.
xcode-hardware-performance - Results from running Xcode on a non-trivial open source project using various Macs
PurefunctionPipelineDataflow - My Blog: The Math-based Grand Unified Programming Theory: The Pure Function Pipeline Data Flow with principle-based Warehouse/Workshop Model
Amethyst - Automatic tiling window manager for macOS à la xmonad.