| | shark-samples | mmperf |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 15 | 121 |
| Growth | - | 3.3% |
| Activity | 0.0 | 4.3 |
| Last commit | about 2 years ago | 7 months ago |
| Language | Python | C++ |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
shark-samples
- PyTorch on Apple M1 Faster Than TensorFlow-Metal
I updated the blog with the reference. Basically, it crashes when compiling the model with https://github.com/NodLabs/shark-samples/blob/main/examples/.... The coremltools converter is very version-specific (like all vendor conversion kits) and is still on a version of TF I couldn't get on conda. Also, it doesn't allow training and supports only FP16 for inference with the ANE. All our tests were with FP32.
//part of nod.ai/shark team.
mmperf
- PyTorch on Apple M1 Faster Than TensorFlow-Metal
Here are the matmul sizes for the MiniLM model used for inference: https://github.com/mmperf/mmperf/blob/main/benchmark_sizes/b...
These are the matmul sizes for the BERT training workload: https://github.com/mmperf/mmperf/blob/main/benchmark_sizes/b...
Yes, we use the latest MoltenVK (1.3.204.0) installed on the system.
I will let @noxa and other IREE devs chime in on the SPIR-V path, but we do support prefix sums etc. in the GPU path.
//part of nod.ai team.
- M1 Pro First Impressions: Core Management and CPU Performance
Could you give me a benchmark in particular? Or maybe this one works: https://github.com/mmperf/mmperf. I'll run it in an hour.
What are some alternatives?
iree - A retargetable MLIR-based machine learning compiler and runtime toolkit.
Flops - How many FLOPS can you achieve?
cutlass - CUDA Templates for Linear Algebra Subroutines
flops - Tiny cpu benchmark
performance_results - performance results/benchmarks for a variety of machines
rust-crc32fast - Fast, SIMD-accelerated CRC32 (IEEE) checksum computation in Rust
Rectangle - Move and resize windows on macOS with keyboard shortcuts and snap areas
xcode-hardware-performance - Results from running Xcode on a non-trivial open source project using various Macs
aws-graviton-getting-started - Helping developers to use AWS Graviton2 and Graviton3 processors which power the 6th and 7th generation of Amazon EC2 instances (C6g[d], M6g[d], R6g[d], T4g, X2gd, C6gn, I4g, Im4gn, Is4gen, G5g, C7g[d][n], M7g[d], R7g[d]).