 | FlameGraph | Folly |
---|---|---|
Mentions | 53 | 90 |
Stars | 16,438 | 27,118 |
Growth | - | 0.5% |
Activity | 4.5 | 9.8 |
Last commit | 16 days ago | about 16 hours ago |
Language | Perl | C++ |
License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FlameGraph
-
JVM Profiling in Action
We'll use async-profiler and flame graphs for profiling. To simplify the process, we'll run the code using JBang.
-
Memray – A Memory Profiler for Python
And flame graphs excel at this kind of thing
https://www.brendangregg.com/flamegraphs.html
-
All my favorite tracing tools: eBPF, QEMU, Perfetto, new ones I built and more
which can output in a format understood by Brendan Gregg's flame graphs (https://www.brendangregg.com/flamegraphs.html)
But that's not quite the kind of tracing you're talking about. We also built a printf-style interface to our recording files, which seems closer:
-
Recap of Werner Vogels' Keynote at re:Invent 2023
Strategies included discontinuing or resizing underutilized services, transitioning to more cost-effective solutions, reducing provisioned resources to what the application actually needs, and analyzing compute resource utilization in detail with tools like flame graphs. This scrutiny helped identify and fix significant cost drivers, such as garbage collection and application configuration.
-
Pinpoint performance regressions with CI-Integrated differential profiling
Flame Graphs by Brendan Gregg
-
Flameshow: A Terminal Flamegraph Viewer
Historically brendangregg's since AIUI he basically invented flamegraphs
https://www.brendangregg.com/flamegraphs.html
So if you can make your tool eat whatever https://github.com/brendangregg/FlameGraph is fed with you're going to support a lot of existing tooling across OSes and languages.
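The input that flamegraph.pl consumes is the "folded" stack format: one line per unique stack, with frames joined by semicolons from root to leaf, followed by a space and a sample count. A minimal sketch of producing that format is below; the sample data and the helper name `fold_stacks` are made up for illustration.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples into folded lines: 'root;...;leaf count'."""
    # Each sample is a list of frame names ordered root-first.
    counts = Counter(";".join(stack) for stack in samples)
    # Sort for stable output; flamegraph.pl does not need sorted input.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, as a profiler might record them.
samples = [
    ["main", "parse", "read_token"],
    ["main", "parse", "read_token"],
    ["main", "render"],
]

folded = fold_stacks(samples)
print("\n".join(folded))
```

Written to a file such as `out.folded`, these lines can be rendered with `flamegraph.pl out.folded > out.svg`.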
-
Introducing Flame graphs: It’s getting hot in here
“Flame graphs are a visualization of hierarchical data, created to visualize stack traces of profiled software so that the most frequent code-paths to be identified quickly and accurately.”
-
Using SVG to create simple sparkline charts
SVGs are amazing for interactive visualisation too. Like Flamegraphs: https://www.brendangregg.com/flamegraphs.html
-
Good example of using flame graphs to speed up java code (50x improvement)
This may be a good example of the application of a flame graph but it is not a good demonstration of flame graphs; the graph is nearly incidental. The source has an actual explanation.
-
Intro to PostGraphile V5 (Part 1): Replacing the Foundations
A profiling flame graph from Graphile Crystal (a precursor to Grafast) using GraphQL.js' executor (each tick is 1ms, total: 29ms). As we removed more and more responsibilities from GraphQL.js, we ended up only using it for output. Replacing this final responsibility with a custom implementation in Graphile Crystal itself, we reduced execution time for this query down to 15.5ms (effectively removing the majority of the yellow portion of the flame graph).
Folly
-
Ask HN: How bad is the xz hack?
https://github.com/facebook/folly/commit/b1391e1c57be71c1e2a...
-
Backdoor in upstream xz/liblzma leading to SSH server compromise
https://github.com/facebook/folly/pull/2153
-
A lock-free ring-buffer with contiguous reservations (2019)
To set a HP on Linux, Folly just does a relaxed load of the src pointer, release store of the HP, compiler-only barrier, and acquire load. (This prevents the compiler from reordering the 2nd load before the store, right? But to my understanding does not prevent a hypothetical CPU reordering of the 2nd load before the store, which seems potentially problematic!)
Then on the GC/reclaim side of things, after protected object pointers are stored, it does a more expensive barrier[0] before acquire-loading the HPs.
I'll admit, I am not confident I understand why this works. I mean, even on x86, loads can be reordered before earlier program-order stores. So it seems like the 2nd check on the protection side could be ineffective. (The non-Linux portable version just uses an atomic_thread_fence SeqCst on both sides, which seems more obviously correct.) And if they don't need the 2nd load on Linux, I'm unclear on why they do it.
[0]: https://github.com/facebook/folly/blob/main/folly/synchroniz...
(This uses either mprotect to force a TLB flush in process-relevant CPUs, or the newer Linux membarrier syscall if available.)
-
Appending to an std::string character-by-character: how does the capacity grow?
folly provides functions to resize std::string & std::vector without initialization [0].
[0] https://github.com/facebook/folly/blob/3c8829785e3ce86cb821c...
-
Can anyone explain feedback of a HFT firm regarding implementation of SPSC lock-free ring-buffer queue?
My implementation was quite similar to Boost's spsc_queue and Facebook's folly/ProducerConsumerQueue.h.
-
A Compressed Indexable Bitset
> How is that relevant?
Roaring bitmaps and similar data structures get their speed from decoding together consecutive groups of elements, so if you do sequential decoding or decode a large fraction of the list you get excellent performance.
EF instead excels at random skipping, so if you visit a small fraction of the list you generally get better performance. This is why it works so well for inverted indexes, as generally the queries are very selective (otherwise why do you need an index?) and if you have good intersection algorithms you can skip a large fraction of documents.
I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block: https://github.com/facebook/folly/blob/main/folly/experiment...
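The PDEP-based select mentioned above can be sketched in software. PDEP deposits the low bits of a source value into the positions of the set bits of a mask, so depositing `1 << k` into the word leaves exactly the k-th set bit, and its index is the answer. The code below emulates PDEP bit by bit for illustration only; it is not Folly's implementation, and the function names are assumptions.

```python
def pdep(src, mask):
    """Software emulation of PDEP: scatter low bits of src into mask's set bits."""
    result = 0
    k = 0
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit of mask
        if (src >> k) & 1:
            result |= lowest           # deposit bit k of src at that position
        mask &= mask - 1               # clear the lowest set bit
        k += 1
    return result


def select64(word, k):
    """Return the index of the k-th (0-based) set bit in a 64-bit word."""
    deposited = pdep(1 << k, word & 0xFFFFFFFFFFFFFFFF)
    if deposited == 0:
        raise ValueError("word has fewer than k + 1 set bits")
    return deposited.bit_length() - 1  # index of the single remaining bit


word = 0b101100                        # set bits at positions 2, 3 and 5
positions = [select64(word, k) for k in range(3)]
print(positions)
```

On hardware with BMI2 the loop collapses into a single PDEP instruction followed by a count-trailing-zeros, which is why the final select on a 64-bit block is cheap.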
-
Defer for Shell
C++ with folly's SCOPE_EXIT {} construct:
https://github.com/facebook/folly/blob/main/folly/ScopeGuard...
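SCOPE_EXIT registers a block that runs when the enclosing scope exits, whether normally or through an exception, and several guards in one scope run in reverse order of declaration. As a rough analogy only (this is Python's standard `contextlib.ExitStack`, not Folly's mechanism), the same deferred-cleanup behaviour looks like this:

```python
from contextlib import ExitStack


def run_with_cleanup(log):
    """Register two deferred actions, then run the body."""
    with ExitStack() as stack:
        stack.callback(log.append, "cleanup A")   # registered first, runs last
        stack.callback(log.append, "cleanup B")   # registered last, runs first
        log.append("body")
    return log


def run_with_failure(log):
    """Deferred actions still run when the body raises."""
    try:
        with ExitStack() as stack:
            stack.callback(log.append, "cleanup")
            raise RuntimeError("boom")
    except RuntimeError:
        log.append("handled")
    return log


events = run_with_cleanup([])
failure_events = run_with_failure([])
print(events)
print(failure_events)
```

The callbacks run in last-in, first-out order, matching how C++ destroys scope guards in reverse order of construction.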
-
Is there any facebook/folly community for discussion and Q&A?
Seems like GitHub issues take a long time to get any response: https://github.com/facebook/folly
-
How a Single Line of Code Made a 24-Core Server Slower Than a Laptop
Can't speak for abseil and tbb, but in folly there are a few solutions for the common problem of sharing state between a writer that updates it very infrequently and concurrent readers that read it very frequently (typical use case is configs).
The most performant solutions are RCU (https://github.com/facebook/folly/blob/main/folly/synchroniz...) and hazard pointers (https://github.com/facebook/folly/blob/main/folly/synchroniz...), but they're not quite as easy to use as a shared_ptr [1].
Then there is simil-shared_ptr implemented with thread-local counters (https://github.com/facebook/folly/blob/main/folly/experiment...).
If you absolutely need a std::shared_ptr (which can be the case if you're working with pre-existing interfaces) there is CoreCachedSharedPtr (https://github.com/facebook/folly/blob/main/folly/concurrenc...), which uses an aliasing trick to transparently maintain per-core reference counts and scales linearly. However, it helps only when acquiring the shared_ptr; any subsequent copies would still cause contention if passed around between threads.
[1] Google has a proposal to make a smart pointer based on RCU/hazptr, but I'm not a fan of it because generally RCU/hazptr guards need to be released in the same thread that acquired them, and hiding them in a freely movable object looks like a recipe for disaster to me, especially if paired with coroutines https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p05...
-
Ask HN: What are some of the most elegant codebases in your favorite language?
Not sure if it's still the case but about 6 years ago Facebook's folly C++ library was something I'd point to for my junior engineers to get a sense of "good" C++ https://github.com/facebook/folly
What are some alternatives?
hotspot - The Linux perf GUI for performance analysis.
abseil-cpp - Abseil Common Libraries (C++)
benchmark - A microbenchmark support library
Boost - Super-project for modularized Boost
tracing-bunyan-formatter - A Layer implementation for tokio-rs/tracing providing Bunyan formatting for events and spans.
Seastar - High performance server-side application framework
HeatMap - Heat map generation tools
parallel-hashmap - A family of header-only, very fast and memory-friendly hashmap and btree containers.
node-clinic - Clinic.js diagnoses your Node.js performance issues
EASTL - Obsolete repo, please go to: https://github.com/electronicarts/EASTL
pmu-tools - Intel PMU profiling tools
OpenFrameworks - openFrameworks is a community-developed cross platform toolkit for creative coding in C++.