samply
| | perfmon | samply |
|---|---|---|
| Mentions | 1 | 8 |
| Stars | 180 | 1,818 |
| Growth | 7.8% | - |
| Activity | 9.2 | 9.4 |
| Last commit | 6 days ago | 5 days ago |
| Language | Python | Rust |
| License | BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
perfmon
-
Frame pointers vs. DWARF – my verdict
Agner speaks about memory renaming back on Zen 2:
https://www.agner.org/forum/viewtopic.php?t=41
Intel Alder Lake has performance events for tracking it:
https://github.com/intel/perfmon/blob/974c69919b2a9dfd8278cf...
But even before this you had store-to-load forwarding on x86. I'm not saying you have, but before inventing a performance problem it is worth spending time trying to diagnose it with thorough profiling (e.g. [1]). The Fedora frame pointer patch came with a thorough performance analysis, and performance will be revisited again. Unfortunately there are a lot of armchair performance experts who haven't spent time looking into the details.
[1] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
samply
- Samply: Command-line sampling profiler for macOS and Linux
- samply: Command line CPU profiler which uses the Firefox profiler as its UI
-
Help with Rust Program performance
Regarding profilers, I really like samply. It doesn't require modifying source code, runs on Linux and macOS, and automatically loads profiling data into the Firefox Profiler UI.
-
AI learns to play flappy bird (code in comments)
I grabbed a quick profile using samply and noticed two things: Even in fast mode, the simulation only updates when the screen is redrawn, so its update frequency is limited by the refresh rate. And the simulation seems to be mostly bottlenecked by Vec reallocation, so reusing Vecs might help.
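The Vec suggestion above can be illustrated with a minimal sketch. The function names and the squares workload are hypothetical and are not taken from the flappy bird project; the only point is the difference between allocating a new Vec on every call and clearing a buffer the caller keeps around.

```rust
/// Hypothetical per-step function that builds a fresh Vec on every call.
/// Each call allocates, and pushes may reallocate as the Vec grows.
fn squares_fresh(n: usize) -> Vec<u64> {
    let mut out = Vec::new();
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

/// Same work, but writing into a caller-owned buffer.
/// `clear` removes the elements and keeps the allocated capacity,
/// so later calls of the same or smaller size do not reallocate.
fn squares_reuse(n: usize, out: &mut Vec<u64>) {
    out.clear();
    for i in 0..n as u64 {
        out.push(i * i);
    }
}
```

A caller would create the buffer once outside the update loop and pass `&mut buf` to `squares_reuse` on every iteration. Whether this helps in a given program is something to confirm with another profile.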
-
Firefox Profiler
I ran across this when I found samply [0], a CLI sampling profiler. On samply's GitHub there's a link to a sample profile that opens in the Firefox Profiler and I was in awe at just how fast it is! Try dragging your mouse over the timeline for a second: https://share.firefox.dev/3j3PJoK
[0] https://github.com/mstange/samply
-
Frame pointers vs. DWARF – my verdict
IMHO, perf's decision to write whole stacks directly to disk and unwind them in a post-processing step is a really bad design. It wastes disk space and, as the author pointed out, it also has a lot of I/O overhead.
As an alternative approach, https://github.com/mstange/samply processes data streamed from perf and unwinds it in real time. The unwinding overhead is surprisingly low: it only takes around 1% of a (single) CPU per CPU profiled. Solving the disk waste alone has been a tremendous improvement to the profiling experience. As a bonus, unwinding and symbolization work reliably, whereas post-processing frequently failed to terminate when I used the perf CLI directly.
-
Data-driven performance optimization with Rust and Miri
samply supports showing inline frames in call stacks. I find this makes a huge difference when profiling Rust.
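To illustrate why inline frames matter for Rust, here is a small sketch with hypothetical function names. After optimization, `weight` and the iterator adapters are typically inlined into `hot_loop`, so they have no machine-level stack frames of their own; a profiler can attribute time to them only by reading the inline records in the debug info, which assumes the binary was built with debug info enabled.

```rust
/// Small helper that the compiler is told to inline into its callers.
/// In an optimized build it has no stack frame of its own.
#[inline(always)]
fn weight(x: u64) -> u64 {
    x.wrapping_mul(31).rotate_left(7)
}

/// The only real frame a sampler sees for this code.
/// `map`, `fold`, the closure, and `weight` usually all
/// collapse into this one function after optimization.
#[inline(never)]
fn hot_loop(n: u64) -> u64 {
    (0..n).map(weight).fold(0, |acc, w| acc ^ w)
}
```

Without inline frames, a profile of this code would show all samples under `hot_loop`; with them, the call stack can be expanded into the inlined callees.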
- Samply: A work in progress of a command-line profiler for macOS and Linux
What are some alternatives?
parca-agent - eBPF based always-on profiler auto-discovering targets in Kubernetes and systemd, zero code changes or restarts needed!
pprof-rs - A Rust CPU profiler implemented with the help of backtrace-rs
rust-flappy-bird-ai - AI learns to play flappy bird using neuro-evolution, implemented in Rust using macroquad
flamegraph - Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3
profiler - Firefox Profiler — Web app for Firefox performance analysis
rayon - Rayon: A data parallelism library for Rust
linux - Linux kernel source tree