If multiple hosts are impacted, pick one of them as the analysis host until you have a clear view of which metrics matter in the current case. If metrics are exported to a dashboard, for instance through node-exporter, you will be able to spot the issue quite easily from the different indicators. Otherwise, you will need to log in to the host and use the usual tools to get more insight into its health. In any case, you will be looking for issues with the CPU, RAM, load, disk, and network.
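When no dashboard is available, a first look at the host can be scripted. The following is a minimal sketch, assuming a Linux host with /proc/meminfo and using only the Python standard library. The function names (parse_meminfo, memory_used_percent, host_snapshot) are hypothetical helpers introduced for illustration, and network counters are left out to keep it short.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict (sized entries are in kiB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Key:   <number> [unit]" lines.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Share of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def host_snapshot(path="/"):
    """Collect a rough view of CPU count, load, disk, and memory (assumes Linux)."""
    load_1m, load_5m, load_15m = os.getloadavg()
    disk = shutil.disk_usage(path)
    snapshot = {
        "cpu_count": os.cpu_count(),
        "load_1m": load_1m,
        "load_5m": load_5m,
        "load_15m": load_15m,
        "disk_used_percent": 100.0 * disk.used / disk.total,
    }
    try:
        with open("/proc/meminfo") as f:
            snapshot["memory_used_percent"] = memory_used_percent(
                parse_meminfo(f.read())
            )
    except (OSError, KeyError):
        # /proc/meminfo is missing or lacks the expected fields.
        snapshot["memory_used_percent"] = None
    return snapshot


if __name__ == "__main__":
    for name, value in host_snapshot().items():
        print(f"{name}: {value}")
```

A load average that stays well above cpu_count, or a disk or memory percentage close to 100, is a hint about which of the usual tools to reach for next.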
In this article, we assume that you are working on a project that already follows monitoring best practices, so that you have access to sensible metrics, defined for instance through the USE Method. Those metrics can be used to build performance dashboards and to declare alerts triggered by faulty or suspicious behavior. For a more visual reference, the following diagram shows a basic monitoring architecture using Prometheus, Grafana, and Alertmanager.
The application's performance in a specific code path (e.g. gdb, pprof, …).