Monitoring Microservices with Prometheus and Grafana

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • thanos

    Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.

  • I really don't get why "scrape the Prometheus endpoint" is the go-to now; a push model seems to be way less of a PITA to manage at scale.

    > If you get serious about Prometheus, eventually you will want longer data retention, check out https://thanos.io/

    Any idea how it compares with https://victoriametrics.com/ ?

    We're slowly looking for a replacement for InfluxDB (as 1.8 is essentially on life support), and the low disk footprint is a pretty big advantage here.

  • skywalking

    APM, Application Performance Monitoring System

  • Personally I've also used Apache Skywalking for a decent out of the box experience: https://skywalking.apache.org/

    I've also heard good things about Sentry, though if you need to self-host it, then there's a bit of complexity to deal with: https://sentry.io/welcome/

  • VictoriaMetrics

    VictoriaMetrics: fast, cost-effective monitoring solution and time series database

  • compliance

    A set of tests to check compliance with various Prometheus interfaces

  • Scrape is typically just how people get started, and it works well for small and medium setups; it gets you a long way before you need to consider anything else.

    Prometheus remote_write is what people graduate to; it gets you the rest of the way, and you are correct that it's less of a PITA at scale.

    If you're looking for retention, you have plenty of choices: Cortex (CNCF), Mimir (most Cortex work moved here), Thanos, VictoriaMetrics, TimeScale, Chronosphere, and many others.

    From a distance, they all seek to do a similar thing: they store metrics (likely from Prometheus), offer retention, and differ in how you query them (if you want SQL you got it, if you want non-standard functions you got it, if your reads are more important than your writes you got it, if you need a billion active series you got it, etc.).

    If what you want is "Prometheus but bigger", then the Prometheus project provides a compliance suite that you can run to help you evaluate your options: https://github.com/prometheus/compliance

    I work for Grafana Labs, and we have maintainers working for us who have touched Prometheus, Thanos, Cortex and Mimir. Mimir is currently our largest investment (https://github.com/grafana/mimir), and it is 100% compliant with Prometheus. That is about to be temporarily untrue, though, as Native Histograms is landing in Prometheus soon (https://github.com/prometheus/prometheus/milestone/10), and we'll need to add perfectly compliant support to Mimir to get back to being compliant.

  • mimir

    Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.

  • prometheus

    The Prometheus monitoring system and time series database.

  • self-hosted

    Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept

  • > E.g. does not allow you to define custom metrics, for example to monitor resource utilization

    I think that might not quite be the case in the latest versions: https://docs.sentry.io/product/performance/metrics/#custom-p...

    > In addition to the automatic performance metrics described above, Sentry supports setting custom performance metrics on transactions. Custom performance metrics allow you to define metrics (beyond the ones mentioned above) that are important to your application and send them to Sentry.

    > For example, you might want to set a custom metric to track:

    > - Total memory usage during a transaction

    > - The amount of time being queried

    > - Number of times a user performed an action during a transaction

    > You define and configure custom metrics in the SDK.

    Though for my use cases, Sentry's technical complexity is more of a stumbling block, were I to self-host it: https://github.com/getsentry/self-hosted/blob/master/docker-...

  • coroot

    Coroot is an open-source APM & Observability tool, a DataDog and NewRelic alternative 📊, 🖥️, 👉. Powered by eBPF for rapid insights into system performance. Monitor, analyze, and optimize your infrastructure effortlessly for peak reliability at any scale.
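
For readers unfamiliar with the scrape (pull) model debated in the comments above, here is a minimal sketch of what a scrape target looks like, using only the Python standard library. It assumes a service that keeps a few counters in memory and exposes them on /metrics in the Prometheus text exposition format. The metric name demo_requests_total, the COUNTERS dictionary, the helper names and the port are all hypothetical and chosen only for illustration; a real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; the metric name is illustrative only.
COUNTERS = {"demo_requests_total": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the current counter values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve /metrics so a Prometheus server can scrape it."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint, and a Prometheus server configured with this host and port as a scrape target would then pull the values on its own schedule. In the push model the direction is reversed: the service (or a Prometheus instance using remote_write) sends samples to the storage backend, which is the trade-off the commenters are weighing.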

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
