Frame pointers vs. DWARF – my verdict

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • samply

    Command-line sampling profiler for macOS and Linux

  • IMHO, perf's decision to write whole stacks directly to disk and unwind them in a post-processing step is a really bad design. It wastes disk space and, as the author pointed out, adds a lot of I/O overhead.

    As an alternative, https://github.com/mstange/samply processes data streamed from perf and unwinds it in real time. The unwinding overhead is surprisingly low: it takes only around 1% of a single CPU per CPU profiled. Solving the disk waste alone has tremendously improved the profiling experience. As a bonus, unwinding and symbolization work reliably, whereas post-processing frequently failed to terminate when I used the perf CLI directly.

  • linux

    Linux kernel source tree

  • perfmon

  • Agner speaks about memory renaming back on Zen 2:

    https://www.agner.org/forum/viewtopic.php?t=41

    Intel Alder Lake has performance events for tracking it:

    https://github.com/intel/perfmon/blob/974c69919b2a9dfd8278cf...

    But even before this you had store-to-load forwarding on x86. I'm not saying you have, but before inventing a performance problem, it is worth spending time diagnosing it with thorough profiling (e.g. [1]). The Fedora frame pointer patch came with a thorough performance analysis, and performance will be revisited again. Unfortunately, there are a lot of armchair performance experts who haven't spent time looking into the details.

    [1] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis

  • parca-agent

    eBPF based always-on profiler auto-discovering targets in Kubernetes and systemd, zero code changes or restarts needed!

  • The pervasive lack of frame pointers is the reason we developed a custom format derived from DWARF unwind information, based on one insight: DWARF unwind information is incredibly flexible, supporting many architectures and allowing any arbitrary register to be restored, but we only need three: the frame pointer, the stack pointer, and, on non-x86 architectures, the return address.

    The DWARF encoding also doesn't use that many bytes, but unfortunately reading and parsing that information is quite expensive.

    For that reason I've developed a new unwinder that runs in BPF and uses custom unwind information derived from DWARF (https://www.polarsignals.com/blog/posts/2022/11/29/profiling..., previously discussed in https://news.ycombinator.com/item?id=33788794). This new compact representation can be binary searched easily, and each unwind row is 16 bytes. I am currently working on reducing it to ~10 bytes.

    All the code is fully OSS (Apache 2.0 for userspace and GPL for BPF), and part of the Parca project (https://github.com/parca-dev/parca-agent).

    We've also given some talks at FOSDEM that go deeper into how we made it scale for many big processes.
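
To make the idea in the comment above concrete, here is a minimal sketch of a compact unwind table with fixed-size 16-byte rows, a binary search by program counter, and one unwind step on x86-64. The row layout, field names, and helper functions are assumptions made for illustration. They are not Parca's actual format, and a real table would also need markers for where each function ends.

```python
import struct
from typing import Callable, NamedTuple, Optional

# Hypothetical 16-byte row layout (an assumption, NOT Parca's real format):
#   u64 pc          first program counter this row applies to
#   u8  cfa_reg     register the CFA is based on (0 = rsp, 1 = rbp)
#   u8  rbp_saved   1 if the caller's frame pointer was saved on the stack
#   i16 cfa_offset  CFA = base register + cfa_offset
#   i16 rbp_offset  caller's rbp is stored at CFA + rbp_offset
#   2 padding bytes
ROW = struct.Struct("<QBBhh2x")
assert ROW.size == 16

CFA_FROM_RSP, CFA_FROM_RBP = 0, 1


class UnwindRow(NamedTuple):
    pc: int
    cfa_reg: int
    rbp_saved: int
    cfa_offset: int
    rbp_offset: int


def pack_table(rows: list[UnwindRow]) -> bytes:
    """Serialize rows, sorted by pc, into one contiguous byte string."""
    return b"".join(ROW.pack(*row) for row in sorted(rows))


def find_row(table: bytes, pc: int) -> Optional[UnwindRow]:
    """Binary search for the last row whose pc is <= the given pc."""
    lo, hi = 0, len(table) // ROW.size
    while lo < hi:
        mid = (lo + hi) // 2
        (row_pc,) = struct.unpack_from("<Q", table, mid * ROW.size)
        if row_pc <= pc:
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return None  # pc is below the first row
    return UnwindRow(*ROW.unpack_from(table, (lo - 1) * ROW.size))


def unwind_step(row: UnwindRow, rsp: int, rbp: int,
                read_u64: Callable[[int], int]) -> tuple[int, int, int]:
    """Return the caller's (pc, rsp, rbp) on x86-64 using one row."""
    base = rsp if row.cfa_reg == CFA_FROM_RSP else rbp
    cfa = base + row.cfa_offset
    return_address = read_u64(cfa - 8)  # pushed by the call instruction
    caller_rbp = read_u64(cfa + row.rbp_offset) if row.rbp_saved else rbp
    return return_address, cfa, caller_rbp


# Rows for a function at 0x1000 with the usual prologue:
#   0x1000: push rbp ; 0x1001: mov rbp, rsp ; 0x1004: body
table = pack_table([
    UnwindRow(0x1000, CFA_FROM_RSP, 0, 8, 0),
    UnwindRow(0x1001, CFA_FROM_RSP, 1, 16, -16),
    UnwindRow(0x1004, CFA_FROM_RBP, 1, 16, -16),
])

# Fake stack memory: saved rbp at 0x7000, return address at 0x7008.
memory = {0x7000: 0x7100, 0x7008: 0x4242}
row = find_row(table, 0x1010)
caller = unwind_step(row, rsp=0x7000, rbp=0x7000, read_u64=memory.__getitem__)
```

Because every row has the same size, the lookup never has to parse variable-length DWARF opcodes at sample time. That fixed-size, pre-sorted property is what makes this kind of table practical to search inside a BPF program, under the assumptions stated above.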

