A Generic Approach to Troubleshooting

node_exporter

78 10,281 8.9 Go

Exporter for machine metrics

If multiple hosts are impacted, pick one of them as the analysis host until you have a clear view of which metrics matter in the current case. If metrics are exported to a dashboard, for instance through node_exporter, you will be able to spot the issue quite easily from the different indicators. Otherwise, you will need to log in to the host and use the usual tools to get more insight into its health. In either case, you will be looking for issues with CPU, RAM, load, disk, and network.
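As a rough illustration of reading those indicators without a dashboard, here is a minimal Python sketch that parses a node_exporter-style text exposition and derives two health figures. The sample payload, the function names `parse_metrics` and `summarize`, and the simplification that lines carry no trailing timestamp are assumptions made for this example, not part of the original article.

```python
# Hypothetical sample of what a node_exporter /metrics endpoint might return.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 3.5
node_memory_MemTotal_bytes 1.6e+10
node_memory_MemAvailable_bytes 4e+09
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 2.5e+10
"""


def parse_metrics(text):
    """Map each sample (name plus labels, as written) to its float value.

    Assumption: no trailing timestamps, so the value is the last token.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


def summarize(samples):
    """Derive a couple of host health indicators from parsed samples."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return {
        "load1": samples["node_load1"],
        "mem_used_ratio": 1 - available / total,
    }


if __name__ == "__main__":
    print(summarize(parse_metrics(SAMPLE)))
```

In practice the text would come from the exporter's HTTP endpoint rather than a string literal, and a real parser should also handle timestamps and escaped label values.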

prometheus

381 52,642 9.9 Go

The Prometheus monitoring system and time series database.

In this article, we assume that you are working on a project that already follows monitoring best practices, so that you have access to sensible metrics, defined for instance through the USE Method. Those metrics can feed performance dashboards and alerts that fire on faulty or suspicious behavior. For a more visual reference, the following diagram shows a basic monitoring architecture using Prometheus, Grafana, and Alertmanager.
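To make the idea of USE-based alerts more concrete, the sketch below checks one resource against utilization, saturation, and error thresholds, which are the three dimensions of the USE Method. The `ResourceSample` type, the `use_findings` function, and the default thresholds are hypothetical choices for illustration; real alerting would normally be expressed as rules in the monitoring system itself.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One observation of a resource (hypothetical structure)."""
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # queued or waiting work, e.g. load per core
    errors: int         # error events seen in the sampling window


def use_findings(sample, max_utilization=0.8, max_saturation=1.0):
    """Return human-readable findings; thresholds are assumed defaults."""
    findings = []
    if sample.utilization > max_utilization:
        findings.append(f"{sample.name}: high utilization ({sample.utilization:.0%})")
    if sample.saturation > max_saturation:
        findings.append(f"{sample.name}: saturated ({sample.saturation:.2f})")
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} error(s)")
    return findings


if __name__ == "__main__":
    cpu = ResourceSample("cpu", utilization=0.93, saturation=1.8, errors=0)
    for finding in use_findings(cpu):
        print(finding)
```

An empty list means none of the three checks fired for that resource under the chosen thresholds.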

pprof

12 7,450 7.6 Go

pprof is a tool for visualization and analysis of profiling data

The application's performance in a specific code path (e.g. gdb, pprof, …).
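pprof targets Go programs; as a rough analog of profiling a specific code path, the sketch below uses Python's standard `cProfile` and `pstats` modules instead. The `slow_sum` workload and the `profile_call` helper are hypothetical names introduced for this example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive workload standing in for a suspicious code path."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 200_000)
    print(value)
    print(report)
```

The report lists the functions that consumed the most cumulative time, which is the same kind of question pprof answers for a Go binary.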

Grafana

379 60,279 10.0 TypeScript

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
