pprof
jaeger
Our great sponsors
pprof | jaeger | |
---|---|---|
12 | 94 | |
7,362 | 19,243 | |
2.5% | 1.4% | |
7.6 | 9.7 | |
about 24 hours ago | 7 days ago | |
Go | Go | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pprof
-
Why So Slow? Using Profilers to Pinpoint the Reasons of Performance Degradation
Because we couldn't identify the issue using the results we got from Callgrind, we reached for another profiler, gperftools. It's a sampling profiler and therefor it has a smaller impact on the application's performance in exchange for less accurate call statistics. After filtering out the unimportant parts and visualizing the rest with pprof, it was evident that something strange was happening with the send function. It took only 71 milliseconds with the previous implementation and more than 900 milliseconds with the new implementation of our Bolt server. It was very suspicious, but based on Callgrind, its cost was almost the same as before. We were confused as the two results seemed to conflict with each other.
-
Improving the performance of your code starting with Go
github.com - google/pprof
-
A Generic Approach to Troubleshooting
The application performances in a specific code path (e.g. gdb, pprof, …).
-
Does rust have a visual analysis tool for memory and performance like pprof of golang?
pprof is https://github.com/google/pprof, it's a very useful tool in golang , and really really really convenient
-
Tokio Console
Go also has pretty good out of the box profiling (pprof[0]) and third-party runtime debugging (delv[1]) that can be used both remotely and local.
These tools also have decent editor integration and can be use hand in hand:
https://blog.jetbrains.com/go/2019/04/03/profiling-go-applic...
https://blog.jetbrains.com/go/2020/03/03/how-to-find-gorouti...
Go has pprof (https://github.com/google/pprof), which I've heard good things about --- and, the pprof data model was one of the influences I looked at when designing the Tokio console's wire format. But, I'm not sure if pprof has any similar UIs to the one we've implemented for the Tokio console; and I haven't actually used it all that much.
-
Cats and Clouds – There Are No Pillars in Observability with Yoshi Yamaguchi
And what we do in Google Cloud is that we still use the pprof. But it's a kind of forked version of the pprof because the visualization part is totally different. So we give that tool as the Cloud Profiler. So that is the product name. And then, the difference between the pprof and a Cloud Profiler is that Cloud Profiler provides the agent library for each famous programming language such as Java, Python, Node.js, and Go. And then what you need to do is to just write 5 to 10 lines of code in a new application. That launches the profile agent in your application as a subsidiary thread of the main thread. And then, that thread periodically collects the profile data of the application and then sends that data back to Google Cloud and the Cloud Profiler.
-
Is there a way I can visualize all the function calls made while running the project(C++) in a graphical way?
gprftools (https://github.com/gperftools/gperftools) can be easily plugged in using LD_PRELOAD and signal, and has nice go implemented visualization tool https://github.com/google/pprof.
Would pprof's visualizations (especially the tree graph) work for you?
jaeger
-
Show HN: An open source performance monitoring tool
As engineers at past startups, we often had to debug slow queries, poor load times, inconsistent errors, etc... While tools like Jaegar [2] helped us inspect server-side performance, we had no way to tie user events to the traces we were inspecting. In other words, although we had an idea of what API route was slow, there wasn’t much visibility into the actual bottleneck.
This is where our performance product comes in: we’re rethinking a tracing/performance tool that focuses on bridging the gap between the client and server.
What’s unique about our approach is that we lean heavily into creating traces from the frontend. For example, if you’re using our Next.js SDK, we automatically connect browser HTTP requests with server-side code execution, all from the perspective of a user. We find this much more powerful because you can understand what part of your frontend codebase causes a given trace to occur. There’s an example here [3].
From an instrumentation perspective, we’ve built our SDKs on-top of OTel, so you can create custom spans to expand highlight-created traces in server routes that will transparently roll up into the flame graph you see in our UI. You can also send us raw OTel traces and manually set up the client-server connection if you want. [4] Here’s an example of what a trace looks like with a database integration using our Golang GORM SDK, triggered by a frontend GraphQL query [5] [6].
In terms of how it's built, we continue to rely heavily on ClickHouse as our time-series storage engine. Given that traces require that we also query based on an ID for specific groups of spans (more akin to an OLTP db), we’ve leveraged the power of CH materialized views to make these operations efficient (described here [7]).
To try it out, you can spin up the project with our self hosted docs [8] or use our cloud offering at app.highlight.io. The entire stack runs in docker via a compose file, including an OpenTelemetry collector for data ingestion. You’ll need to point your SDK to export data to it by setting the relevant OTLP endpoint configuration (ie. environment variable OTEL_EXPORTER_OTLP_LOGS_ENDPOINT [9]).
Overall, we’d really appreciate feedback on what we’re building here. We’re also all ears if anyone has opinions on what they’d like to see in a product like this!
[1] https://github.com/highlight/highlight/blob/main/LICENSE
[2] https://www.jaegertracing.io
[3] https://app.highlight.io/1383/sessions/COu90Th4Qc3PVYTXbx9Xe...
[4] https://www.highlight.io/docs/getting-started/native-opentel...
[5] https://static.highlight.io/assets/docs/gorm.png
[6] https://github.com/highlight/highlight/blob/1fc9487a676409f1...
[7] https://highlight.io/blog/clickhouse-materialized-views
[8] https://www.highlight.io/docs/getting-started/self-host/self...
[9] https://opentelemetry.io/docs/concepts/sdk-configuration/otl...
-
Kubernetes Ingress Visibility
For the request following, something like jeager https://www.jaegertracing.io/, because you are talking more about tracing than necessarily logging. For just monitoring, https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack would be the starting point, then it depends. Nginx gives metrics out of the box, then you can pull in the dashboard like https://grafana.com/grafana/dashboards/14314-kubernetes-nginx-ingress-controller-nextgen-devops-nirvana/ , or full metal with something like service mesh monitoring which would provably fulfil most of the requirements
-
Migrating to OpenTelemetry
Have you checked out Jaeger [1]? It is lightweight enough for a personal project, but featureful enough to really help "turn on the lightbulb" with other engineers to show them the difference between logging/monitoring and tracing.
-
The Road to GraphQL At Enterprise Scale
From the perspective of the realization of GraphQL infrastructure, the interesting direction is "Finding". How to find the problem? How to find the bottleneck of the system? Distributed Tracing System (DTS) will help answer this question. Distributed tracing is a method of observing requests as they propagate through distributed environments. In our scenario, we have dozens of subgraphs, gateway, and transport layer through which the request goes. We have several tools that can be used to detect the whole lifecycle of the request through the system, e.g. Jaeger, Zipkin or solutions that provided DTS as a part of the solution NewRelic.
-
OpenTelemetry Exporters - Types and Configuration Steps
Jaeger is an open-source, distributed tracing system that monitors and troubleshoots the flow of requests through complex, microservices-based applications, providing a comprehensive view of system interactions.
-
Fault Tolerance in Distributed Systems: Strategies and Case Studies
However, ensuring fault tolerance in distributed systems is not at all easy. These systems are complex, with multiple nodes or components working together. A failure in one node can cascade across the system if not addressed timely. Moreover, the inherently distributed nature of these systems can make it challenging to pinpoint the exact location and cause of fault - that is why modern systems rely heavily on distributed tracing solutions pioneered by Google Dapper and widely available now in Jaeger and OpenTracing. But still, understanding and implementing fault tolerance becomes not just about addressing the failure but predicting and mitigating potential risks before they escalate.
-
Observability in Action Part 3: Enhancing Your Codebase with OpenTelemetry
In this article, we'll use HoneyComb.io as our tracing backend. While there are other tools in the market, some of which can be run on your local machine (e.g., Jaeger), I chose HoneyComb because of their complementary tools that offer improved monitoring of the service and insights into its behavior.
-
Distributed Tracing and OpenTelemetry Guide
In this example, I will create 3 Node.js services (shipping, notification, and courier) using Amplication, add traces to all services, and show how to analyze trace data using Jaeger.
-
Event-Driven Architecture 101
For example, without investment, visibility into system behavior as a whole can be much more difficult in an event-driven system. Investing in something like Open Telemetry and a service catalog is a good idea. Getting started with these things are relatively simple, but if you want to store your traces somewhere that are searchable, you are going to have to either pay for a SaaS tool that ingests them or you are going to have to run and maintain an open source tool capable of this such as Jaeger. For service cataloging, Backstage is becoming a very popular option. Depending on the capabilities and the capacity of your engineering team, this might be a good option and many companies do have platform teams that provide tooling such as this. With the average salary of a platform Engineer being ~$144k, companies should think carefully on whether the benefits of an EDA are going to outweigh the cost. We will dig deeper into this in part 2 and 3 of the series.
-
The Complete Microservices Guide
Distributed Tracing: Middleware for distributed tracing like Jaeger and Zipkin helps monitor and trace requests as they flow through multiple microservices, aiding in debugging, performance optimization, and understanding the system's behavior.
What are some alternatives?
Sentry - Developer-first error tracking and performance monitoring
skywalking - APM, Application Performance Monitoring System
prometheus - The Prometheus monitoring system and time series database.
signoz - SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
Pinpoint - APM, (Application Performance Management) tool for large-scale distributed systems.
fluent-bit - Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
hypertrace - An open source distributed tracing & observability platform
VictoriaMetrics - VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Grafana - The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.
Gin - Gin is a HTTP web framework written in Go (Golang). It features a Martini-like API with much better performance -- up to 40 times faster. If you need smashing performance, get yourself some Gin.
zipkin - Zipkin is a distributed tracing system
apm-server - APM Server