amzn-drivers
FlameGraph
Metric | amzn-drivers | FlameGraph
---|---|---
Mentions | 4 | 53
Stars | 441 | 16,438
Growth | 0.7% | -
Activity | 9.1 | 4.5
Last commit | 17 days ago | 19 days ago
Language | C | Perl
License | - | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
amzn-drivers
-
Looking for programmer volunteers who want to contribute/learn about low level C++, Linux, Networking, high frequency trading.
Amazon (AWS) cloud EC2 instance-specific role (kernel and user space networking, Linux OS related). Amazon has its own network card with its own Linux driver (open source); for user space they use DPDK (open source). https://github.com/amzn/amzn-drivers I've measured the time between calling tcp send in software and the packet leaving the NIC (network card): it is around 50 microseconds of latency, and AWS also stated in a paper that it is around that number. Goals: figure out how to build from source code and load the kernel; reduce latency.
-
FreeBSD optimizations used by Netflix to serve video at 800Gb/s [pdf]
It means, for example, writing a FreeBSD kernel driver for the Elastic Network Adapter (ENA). Both the Linux kernel driver and the FreeBSD kernel driver are available at https://github.com/amzn/amzn-drivers
-
Dragonflydb – A modern replacement for Redis and Memcached
Of course, there are.
I was mostly running on AWS. In terms of hardware, for small-packet load tests most systems are constrained on throughput, i.e. the number of packets per second. Some systems saturate on interrupts, reaching 100% CPU on all cores, and some cannot even saturate the CPU: you will see that the CPU is at 60% but you cannot go beyond some limit. The best systems network-wise are the c6gn family types. They are also better than other cloud providers. By the way, you mentioned hypervisors... About 8 months ago I opened a bug with the AWS Graviton team https://github.com/amzn/amzn-drivers/issues/195 about a performance issue they had on their instances at high throughput. Recently they issued the fix. I suspect it was in their hypervisor.
In terms of my software, I found many performance bugs at those speeds. For example, using a default allocator is a big no. I use mimalloc for uncontended allocations. In general, you cannot use mutexes and spinlocks at those speeds. Those will just cripple the system. Sometimes it can be very annoying, since you cannot rely on a 3rd party library without carefully analyzing its design. For example, I could not use the openmetrics C++ library because it was not performant enough. Even implementing a simple counter, say to gather statistics for the INFO command, becomes an interesting engineering problem:
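The excerpt breaks off before describing the author's approach. As a minimal sketch of one common way to count without a mutex or spinlock (not necessarily what Dragonfly does), each thread can add to its own cache-line-aligned atomic slot, and a reader sums the slots on demand. The class name `ShardedCounter`, the shard count of 64, and the 64-byte cache-line size are all assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical sharded counter: writers touch only their own slot, so
// increments from different threads rarely share a cache line.
class ShardedCounter {
 public:
  static constexpr std::size_t kShards = 64;  // assumed size, not from the source

  void Add(std::uint64_t delta = 1) {
    // Relaxed ordering is enough: only the eventual total matters.
    shards_[ShardIndex()].value.fetch_add(delta, std::memory_order_relaxed);
  }

  // Sums all slots. The result is a snapshot that may lag concurrent writers.
  std::uint64_t Sum() const {
    std::uint64_t total = 0;
    for (const auto& s : shards_) {
      total += s.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  struct alignas(64) Shard {  // assumed 64-byte cache line, avoids false sharing
    std::atomic<std::uint64_t> value{0};
  };

  static std::size_t ShardIndex() {
    // Hash the thread id once per thread and cache the slot index.
    thread_local const std::size_t idx =
        std::hash<std::thread::id>{}(std::this_thread::get_id()) % kShards;
    return idx;
  }

  std::array<Shard, kShards> shards_;
};
```

Two threads can still hash to the same slot; the atomic add keeps the total correct in that case, at the cost of some contention. A statistics command would call `Sum()` when queried, while hot paths call only `Add()`.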
-
Ask HN: Anybody enabled IOMMU on AWS metal servers?
https://doc.dpdk.org/guides/nics/ena.html
and:
https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk/enav2-vfio-patch
Enabling IOMMU on i3 or c5 metal instances is as easy as adding "iommu=1 intel_iommu=on" to /etc/default/grub followed by update-grub, reboot.
I can't get this to work. Every time I update grub and reboot, I cannot reconnect via ssh. The EC2 console also fails to report a good status.
My config:
Ubuntu 20.04 stock AWS AMI x86 64-bit
FlameGraph
-
JVM Profiling in Action
We'll use async-profiler and flame graphs for profiling. To simplify the process, we'll run the code using JBang.
-
Memray – A Memory Profiler for Python
And flame graphs excel at this kind of thing
https://www.brendangregg.com/flamegraphs.html
-
All my favorite tracing tools: eBPF, QEMU, Perfetto, new ones I built and more
which can output in a format understood by Brendan Gregg's flame graph tools (https://www.brendangregg.com/flamegraphs.html)
But that's not quite the kind of tracing you're talking about. We also built a printf-style interface to our recording files, which seems closer:
-
Recap of Werner Vogels' Keynote at re:Invent 2023
Strategies included discontinuing or resizing underutilized services, transitioning to more cost-effective solutions, reducing the current resources to the amount of resources that we need for our application, and conducting detailed analyses of computing resource utilization through tools like flamegraphs. This detailed scrutiny helped identify and rectify significant cost-driving areas, such as garbage collection and application configurations.
-
Pinpoint performance regressions with CI-Integrated differential profiling
Flame Graphs by Brendan Gregg
-
Flameshow: A Terminal Flamegraph Viewer
Historically brendangregg's since AIUI he basically invented flamegraphs
https://www.brendangregg.com/flamegraphs.html
So if you can make your tool eat whatever https://github.com/brendangregg/FlameGraph is fed with you're going to support a lot of existing tooling across OSes and languages.
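For context, the input that `flamegraph.pl` consumes is the "folded" stack format: one line per unique stack, with frames joined by semicolons from root to leaf, followed by a space and a sample count. A minimal sketch of producing that format from in-memory samples might look like the following; the function name `FoldStacks` and the sample data in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Collapses raw stack samples (root frame first) into folded lines of the
// form "frame1;frame2;frame3 count", one line per unique stack.
std::string FoldStacks(const std::vector<std::vector<std::string>>& samples) {
  std::map<std::string, unsigned long> counts;  // ordered map gives stable output
  for (const auto& stack : samples) {
    if (stack.empty()) continue;  // skip samples with no frames
    std::string key;
    for (std::size_t i = 0; i < stack.size(); ++i) {
      if (i > 0) key += ';';
      key += stack[i];
    }
    ++counts[key];
  }
  std::ostringstream out;
  for (const auto& kv : counts) {
    out << kv.first << ' ' << kv.second << '\n';
  }
  return out.str();
}
```

Given two samples of `main -> parse` and one of `main -> render -> draw`, this yields the lines `main;parse 2` and `main;render;draw 1`. Written to a file, that text can be passed to `flamegraph.pl` to render an SVG, and a viewer that reads the same format can reuse the output of the repository's existing `stackcollapse-*` scripts.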
-
Introducing Flame graphs: It’s getting hot in here
“Flame graphs are a visualization of hierarchical data, created to visualize stack traces of profiled software so that the most frequent code-paths can be identified quickly and accurately.”
-
Using SVG to create simple sparkline charts
SVGs are amazing for interactive visualisation too. Like Flamegraphs: https://www.brendangregg.com/flamegraphs.html
-
Good example of using flame graphs to speed up java code (50x improvement)
This may be a good example of the application of a flame graph but it is not a good demonstration of flame graphs; the graph is nearly incidental. The source has an actual explanation.
-
Intro to PostGraphile V5 (Part 1): Replacing the Foundations
A profiling flame graph from Graphile Crystal (a precursor to Grafast) using GraphQL.js' executor (each tick is 1ms, total: 29ms). As we removed more and more responsibilities from GraphQL.js, we ended up only using it for output. Replacing this final responsibility with a custom implementation in Graphile Crystal itself, we reduced execution time for this query down to 15.5ms (effectively removing the majority of the yellow portion of the flame graph).
What are some alternatives?
dragonfly - A modern replacement for Redis and Memcached
hotspot - The Linux perf GUI for performance analysis.
neon - Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, branching, and bottomless storage.
benchmark - A microbenchmark support library
cachegrand - A modern data ingestion, processing and serving platform built for today's hardware
tracing-bunyan-formatter - A Layer implementation for tokio-rs/tracing providing Bunyan formatting for events and spans.
helio - A modern framework for backend development based on io_uring Linux interface
HeatMap - Heat map generation tools
midi-redis - A toy memory store with great performance
node-clinic - Clinic.js diagnoses your Node.js performance issues
webdis - A Redis HTTP interface with JSON output
pmu-tools - Intel PMU profiling tools