A Generic Approach to Troubleshooting

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • node_exporter

    Exporter for machine metrics

  • If multiple hosts are impacted, pick one of them as the analysis host until you have a clear view of which metrics matter in the current case. If metrics are exported to a dashboard, for instance through node_exporter, you will be able to spot the issue quite easily from the different indicators. Otherwise, you will need to log in to the host and use the usual tools to get more insight into its health. In either case, you will be looking for issues with the CPU, RAM, load, disk, and network.
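When no dashboard is available, a first look at the analysis host can be scripted. The following is only a minimal sketch using the Python standard library, not something taken from the original post: the function names `parse_meminfo` and `host_snapshot` are hypothetical, the memory part assumes a Linux host exposing `/proc/meminfo`, and network counters are left out for brevity.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def host_snapshot(root="/", meminfo_path="/proc/meminfo"):
    """Collect a few coarse health indicators for one host (Unix only)."""
    snapshot = {}

    # Load: 1-minute load average, normalised by the number of CPUs.
    cpus = os.cpu_count() or 1
    load1, _load5, _load15 = os.getloadavg()
    snapshot["load1_per_cpu"] = load1 / cpus

    # Disk: fraction of the filesystem at `root` that is in use.
    disk = shutil.disk_usage(root)
    snapshot["disk_used_ratio"] = disk.used / disk.total

    # RAM: Linux-specific, skipped silently if /proc is not available.
    try:
        with open(meminfo_path) as handle:
            mem = parse_meminfo(handle.read())
        if mem.get("MemTotal") and "MemAvailable" in mem:
            snapshot["mem_used_ratio"] = 1 - mem["MemAvailable"] / mem["MemTotal"]
    except OSError:
        pass

    return snapshot


if __name__ == "__main__":
    for name, value in host_snapshot().items():
        print(f"{name}: {value:.2f}")
```

Values close to or above 1.0 for `load1_per_cpu`, or ratios near 1.0 for disk and memory, are hints about where to dig next rather than a diagnosis.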

  • prometheus

    The Prometheus monitoring system and time series database.

  • In this article, we assume that you are working on a project that already follows monitoring best practices, so that you have access to sensible metrics, defined for instance through the USE Method. Those metrics can feed performance dashboards and alerts that fire on faulty or suspicious behavior. For a more visual reference, the following diagram shows a basic monitoring architecture using Prometheus, Grafana, and Alertmanager.
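The USE Method looks at utilization, saturation, and errors for each resource. As a rough illustration of how such metrics could drive an alert, here is a small sketch. The `UseSample` type, the `use_findings` function, and the threshold values are all assumptions made for this example; they do not come from the original post or from Prometheus, and real thresholds depend on the workload.

```python
from dataclasses import dataclass


@dataclass
class UseSample:
    resource: str       # e.g. "cpu", "disk", "network"
    utilization: float  # busy fraction, 0.0 to 1.0
    saturation: float   # queued work, e.g. run-queue length per CPU
    errors: int         # error events seen in the sampling window


def use_findings(sample, max_utilization=0.8, max_saturation=1.0):
    """Return human-readable findings for one resource (hypothetical thresholds)."""
    findings = []
    if sample.errors > 0:
        findings.append(f"{sample.resource}: {sample.errors} error(s)")
    if sample.utilization > max_utilization:
        findings.append(f"{sample.resource}: utilization {sample.utilization:.0%}")
    if sample.saturation > max_saturation:
        findings.append(f"{sample.resource}: saturation {sample.saturation:.2f}")
    return findings


if __name__ == "__main__":
    for sample in (UseSample("cpu", 0.95, 0.2, 0), UseSample("disk", 0.40, 0.1, 3)):
        for finding in use_findings(sample):
            print(finding)
```

Errors are checked first because they are usually the quickest to interpret; an empty list means nothing crossed the chosen thresholds.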

  • pprof

    pprof is a tool for visualization and analysis of profiling data

  • The application's performance in a specific code path (e.g. gdb, pprof, …).
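The excerpt names gdb and pprof for inspecting a specific code path. To keep the examples on this page in one language, the sketch below swaps in Python's standard `cProfile` and `pstats` modules instead of pprof; it shows the same idea of profiling a single call. The functions `slow_path` and `profile_call` are hypothetical placeholders for the code path under suspicion.

```python
import cProfile
import io
import pstats


def slow_path(n):
    """Stand-in for the code path being investigated."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Render the five entries with the highest cumulative time into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_path, 1_000_000)
    print(report)
```

Profiling one call at a time keeps the report small enough to read during an incident.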

  • Grafana

    The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

