-
You probably already know, but with OCaml 5 the only way to get flamegraphs working is to either:
* use framepointers [1]
* use LBR (but LBR has a limited depth, and may not work on all CPUs, I'm assuming due to bugs in perf)
* implement some deep changes in how perf works to handle the 2 stacks in OCaml (I don't even know if this would be possible), or write/adapt some eBPF code to do it
OCaml 5 has a separate stack for OCaml code and C code, and although GDB can link them based on DWARF info, perf DWARF call-graphs cannot (https://github.com/ocaml/ocaml/issues/12563#issuecomment-193...)
If you need more evidence to keep it enabled in future releases, you can use OCaml 5 as an example (unfortunately there aren't many OCaml applications, so that may not carry too much weight on its own).
[1]: I hadn't actually realised that Fedora 39 has already enabled FP by default, nice! (I still do most of my day-to-day profiling on a roughly CentOS 7 system with 'perf record --call-graph dwarf'. I was aware that there was a discussion about enabling FP by default, but hadn't noticed it has actually been done already.)
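To illustrate why frame pointers cope with the two stacks, here is a toy model of a frame-pointer walk. It is only a sketch under stated assumptions: the memory contents and addresses are made up, and the layout (saved frame pointer at `fp`, return address at `fp + 8`) is the usual x86-64 convention, not something taken from OCaml or perf sources. The point is that the walk only follows saved pointers, so it does not care whether the next frame lives on the same stack region or a different one.

```python
def walk_frame_pointers(memory, fp, max_depth=64):
    """Follow the chain of saved frame pointers and collect return addresses.

    Assumed layout (x86-64 style): memory[fp] holds the caller's saved frame
    pointer and memory[fp + 8] holds the return address into the caller.
    """
    return_addresses = []
    # A null frame pointer marks the outermost frame; max_depth guards
    # against a corrupted or cyclic chain.
    while fp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[fp + 8])
        fp = memory[fp]
    return return_addresses


# Hypothetical memory image: two frames on one stack region (0x2000...),
# then a jump to a frame on a different region (0x9000...), as would happen
# when a runtime keeps separate stacks for managed code and C code.
memory = {
    0x2000: 0x2040, 0x2008: 0x400123,
    0x2040: 0x9000, 0x2048: 0x400456,
    0x9000: 0x0,    0x9008: 0x400789,
}

trace = walk_frame_pointers(memory, 0x2000)
print([hex(a) for a in trace])
```

A DWARF-based unwinder instead has to work out where each caller's frame is from unwind tables and a copy of the stack, which is where the second stack becomes a problem.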
-
Virgil doesn't use frame pointers. If you don't have dynamic stack allocation, the frame of a given function has a fixed size that can be found with a simple (binary-search) table lookup. Virgil's technique uses an additional page-indexed range that further restricts the lookup to a few comparisons on average (O(log(# retpoints per page))). It combines the unwind info with stackmaps for GC. It takes very little space.
The main driver is in (https://github.com/titzer/virgil/blob/master/rt/native/Nativ...); the rest of the code in the directory implements the decoding of metadata.
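A rough sketch of the idea, not Virgil's actual encoding: the table layout, the 4 KiB page size, and the names `build_index` and `frame_size` are all assumptions made for illustration. Each return point maps to a fixed frame size, and a per-page index narrows the binary search to the return points that fall in one code page.

```python
import bisect

PAGE_SHIFT = 12  # assume 4 KiB code pages for this sketch


def build_index(entries):
    """entries: (return_address, frame_size) pairs sorted by address.

    Returns (addrs, sizes, page_ranges), where page_ranges maps a code page
    number to the half-open slice [lo, hi) of entries located in that page.
    """
    addrs = [addr for addr, _ in entries]
    sizes = [size for _, size in entries]
    page_ranges = {}
    for i, addr in enumerate(addrs):
        page = addr >> PAGE_SHIFT
        lo, _ = page_ranges.get(page, (i, i))
        page_ranges[page] = (lo, i + 1)
    return addrs, sizes, page_ranges


def frame_size(index, ret_addr):
    """Look up the fixed frame size for a return address, or None."""
    addrs, sizes, page_ranges = index
    bounds = page_ranges.get(ret_addr >> PAGE_SHIFT)
    if bounds is None:
        return None
    lo, hi = bounds
    # Binary search only among the return points of this page.
    i = bisect.bisect_left(addrs, ret_addr, lo, hi)
    if i < hi and addrs[i] == ret_addr:
        return sizes[i]
    return None


# Hypothetical return points and frame sizes (in bytes).
index = build_index([(0x1010, 32), (0x1050, 48), (0x2008, 16), (0x2100, 64)])
print(frame_size(index, 0x2008))
```

With the frame size known, the unwinder can step to the caller by adding it (plus the return-address slot) to the stack pointer, with no frame pointer register needed.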
I think frame pointers only make sense if frames are dynamically-sized (i.e. have stack allocation of data). Otherwise it seems weird to me that a dynamic mechanism is used when a static mechanism would suffice; I suspect it is mostly because no one agreed on an ABI for the metadata encoding, or on an unwind routine.
I believe the 1-2% measurement number. That's in the same ballpark as pervasive array bounds checks. It's weird that the odd debugging and profiling task gets special pleading for a 1% cost but adding a layer of security gets the finger. Very bizarre priorities.
-
JIT'ed code is sadly poorly supported, but LLVM has had great hooks for noting each method that is produced and its address. So you can build a simple mixed-mode unwinder, pretty easily, but mostly in process.
I think Intel's DNN things dump their info out to some common file that perf can read instead, but because the *kernels* themselves reuse rbp throughout oneDNN, it's totally useless.
Finally, can any JVM folks explain this claim about DWARF info from the article:
> Doesn't exist for JIT'd runtimes like the Java JVM
that just sounds surprising to me. Is it off by default or literally not available? (Google searches have mostly pointed to people wanting to include the JNI/C side of a JVM stack, like https://github.com/async-profiler/async-profiler/issues/215).
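For what it's worth, the "note each produced method and its address" approach mentioned above can be sketched like this. Everything here is hypothetical (the class, its method names, and the addresses); it is not LLVM's or oneDNN's API. The one external convention used is perf's map file for JIT'd code, `/tmp/perf-<pid>.map`, whose lines are `START SIZE name` with start and size in hex.

```python
import bisect


class JitSymbolTable:
    """Records JIT'd code ranges so that addresses can be mapped to names."""

    def __init__(self):
        self._starts = []   # sorted start addresses
        self._symbols = []  # parallel list of (start, size, name)

    def register(self, start, size, name):
        # Would be called from whatever "code emitted" callback the JIT offers.
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._symbols.insert(i, (start, size, name))

    def symbolize(self, addr):
        # Find the last range starting at or before addr, then bounds-check.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, size, name = self._symbols[i]
        return name if addr < start + size else None

    def perf_map_lines(self):
        # One "START SIZE name" line per symbol, hex without a 0x prefix.
        return [f"{start:x} {size:x} {name}" for start, size, name in self._symbols]


table = JitSymbolTable()
table.register(0x7F0000002000, 0x80, "jit_bar")
table.register(0x7F0000001000, 0x40, "jit_foo")
print(table.symbolize(0x7F0000001010))
print("\n".join(table.perf_map_lines()))
```

This only gets you symbol names for addresses; it does nothing for unwinding through frames whose code reuses rbp for something else.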
-
I think I might have confused two unrelated posts. The one that references Polar Signals is this one:
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/...
So not a perf issue there, but they don't think the workflow is suitable for whole-system profiling. Perf issues were in the context of `perf` using DWARF:
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/...
-
I remember talking to Brendan about the PreserveFramePointer patch during my first months at Netflix in 2015. As of JDK 21, unfortunately it is no longer a general purpose solution for the JVM, because it prevents a fast path being taken for stack thawing for virtual threads: https://github.com/openjdk/jdk/blob/d32ce65781c1d7815a69ceac...