The Return of the Frame Pointers

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • xz

    Discontinued XZ Utils [GET https://api.github.com/repos/tukaani-project/xz: 403 - Repository access blocked]

  • ocaml

    The core OCaml system: compilers, runtime system, base libraries

  • You probably already know, but with OCaml 5 the only way to get flamegraphs working is to either:

    * use framepointers [1]

    * use LBR (but LBR has a limited depth, and may not work on on all CPUs, I'm assuming due to bugs in perf)

    * implement some deep changes in how perf works to handle the 2 stacks in OCaml (I don't even know if this would be possible), or write/adapt some eBPF code to do it

    OCaml 5 has a separate stack for OCaml code and C code, and although GDB can link them based on DWARF info, perf DWARF call-graphs cannot (https://github.com/ocaml/ocaml/issues/12563#issuecomment-193...)

    If you need more evidence to keep it enabled in future releases, you can use OCaml 5 as an example (unfortunately there aren't many OCaml applications, so that may not carry too much weight on its own).

    [1]: I haven't actually realised that Fedora39 has already enabled FP by default, nice! (I still do most of my day-to-day profiling on an ~CentOS 7 system with 'perf --call-graph dwarf', I was aware that there was a discussion to enable FP by default, but haven't noticed it has actually been done already)

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • virgil

    A fast and lightweight native programming language

  • Virgil doesn't use frame pointers. If you don't have dynamic stack allocation, the frame of a given function has a fixed size can be found with a simple (binary-search) table lookup. Virgil's technique uses an additional page-indexed range that further restricts the lookup to be a few comparisons on average (O(log(# retpoints per page)). It combines the unwind info with stackmaps for GC. It takes very little space.

    The main driver is in (https://github.com/titzer/virgil/blob/master/rt/native/Nativ... the rest of the code in the directory implements the decoding of metadata.

    I think frame pointers only make sense if frames are dynamically-sized (i.e. have stack allocation of data). Otherwise it seems weird to me that a dynamic mechanism is used when a static mechanism would suffice; mostly because no one agreed on an ABI for the metadata encoding, or an unwind routine.

    I believe the 1-2% measurement number. That's in the same ballpark as pervasive checks for array bounds checks. It's weird that the odd debugging and profiling task gets special pleading for a 1% cost but adding a layer of security gets the finger. Very bizarre priorities.

  • async-profiler

    Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

  • JIT'ed code is sadly poorly supported, but LLVM has had great hooks for noting each method that is produced and its address. So you can build a simple mixed-mode unwinder, pretty easily, but mostly in process.

    I think Intel's DNN things dump their info out to some common file that perf can read instead, but because the *kernels* themselves reuse rbp throughout oneDNN, it's totally useless.

    Finally, can any JVM folks explain this claim about DWARF info from the article:

    > Doesn't exist for JIT'd runtimes like the Java JVM

    that just sounds surprising to me. Is it off by default or literally not available? (Google searches have mostly pointed to people wanting to include the JNI/C side of a JVM stack, like https://github.com/async-profiler/async-profiler/issues/215).

  • I think I might have confused two unrelated posts. The one that references Polar Signals is this one:

    https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/...

    So not a perf issue there, but they don't think the workflow is suitable for whole-system profiling. Perf issues were in the context of `perf` using DWARF:

    https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/...

  • JDK

    JDK main-line development https://openjdk.org/projects/jdk

  • I remember talking to Brendan about the PreserveFramePointer patch during my first months at Netflix in 2015. As of JDK 21, unfortunately it is no longer a general purpose solution for the JVM, because it prevents a fast path being taken for stack thawing for virtual threads: https://github.com/openjdk/jdk/blob/d32ce65781c1d7815a69ceac...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts