-
thanos
Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
I really don't get why "scrape the prometheus endpoint" is the go-to now; a push model seems to be way less of a PITA to manage at scale.
> If you get serious about Prometheus, eventually you will want longer data retention, checkout https://thanos.io/
Any idea how it compares with https://victoriametrics.com/ ?
We're slowly looking for a replacement for InfluxDB (as 1.8 is essentially on life support), and the low disk footprint is a pretty big advantage here.
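For context on the scrape model being questioned here: a "prometheus endpoint" is just a plain-text HTTP page in the exposition format that the server pulls on an interval. A minimal sketch using only the Python standard library (metric name and port are made up for illustration; a real service would use an official Prometheus client library):

```python
# Minimal sketch of a Prometheus-style /metrics endpoint, stdlib only.
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(requests_total: int) -> str:
    # Prometheus text exposition format: HELP/TYPE comment lines,
    # then one sample line per metric.
    return (
        "# HELP app_requests_total Total requests handled.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {requests_total}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    request_count = 0  # toy shared counter; real apps track this properly

    def do_GET(self):
        if self.path == "/metrics":
            MetricsHandler.request_count += 1
            body = render_metrics(MetricsHandler.request_count).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


# To serve it (blocks forever), you'd run:
# HTTPServer(("127.0.0.1", 9100), MetricsHandler).serve_forever()
```

The appeal of pull is that the server owns the schedule and instantly notices a target that stops answering; the cost, as the comment says, is managing target discovery at scale.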
-
InfluxDB
Purpose built for real-time analytics at any scale. InfluxDB Platform is powered by columnar analytics, optimized for cost-efficient storage, and built with open data standards.
-
Personally I've also used Apache Skywalking for a decent out of the box experience: https://skywalking.apache.org/
I've also heard good things about Sentry, though if you need to self-host it, then there's a bit of complexity to deal with: https://sentry.io/welcome/
-
Scrape is typically just how people get started, and it works well for small and medium setups; it gets you a long way before you need to reconsider.
Prometheus remote_write is what people graduate to, and it gets you the rest of the way; you're correct that it's less of a PITA at scale.
If you're looking for retention, you have plenty of choices: Cortex (CNCF), Mimir (most Cortex work moved here), Thanos, VictoriaMetrics, TimescaleDB, Chronosphere, and many others.
From a distance they all seek to do a similar thing: store metrics (likely from Prometheus), allow long retention, and offer some variety in how to query them (if you want SQL, you got it; if you want non-standard functions, you got it; if your reads are more important than your writes, you got it; if you need a billion active series, you got it; etc.).
If what you want is "Prometheus but bigger", the Prometheus project provides a compliance suite that you can run to help you evaluate your options: https://github.com/prometheus/compliance
I work for Grafana Labs, and we have maintainers working for us who have touched Prometheus, Thanos, Cortex, and Mimir. Mimir is currently the largest investment we have (https://github.com/grafana/mimir), and it is 100% compliant with Prometheus (though that is about to be temporarily untrue: Native Histograms are landing in Prometheus soon, https://github.com/prometheus/prometheus/milestone/10, and we'll need to add fully compliant support to Mimir to get back to being compliant).
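For anyone wondering what "graduating to remote_write" looks like in practice: it's a short block in prometheus.yml that forwards scraped samples to a long-term store. A minimal sketch; the URL is a placeholder, since Mimir, Thanos, VictoriaMetrics, etc. each document their own remote-write endpoint path:

```yaml
# prometheus.yml -- keep scraping locally, forward samples upstream.
remote_write:
  - url: "https://metrics.example.com/api/v1/push"   # placeholder endpoint
    queue_config:
      max_samples_per_send: 2000   # batch size per outgoing request
```

Prometheus keeps its local TSDB for fast recent queries while the remote store handles retention, so you get the push model's scaling properties without changing how apps are instrumented.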
-
mimir
Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
-
self-hosted
Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
> E.g does not allow you to define custom metrics to e.g. monitor resource utilization
I think that might not quite be the case in the latest versions: https://docs.sentry.io/product/performance/metrics/#custom-p...
> In addition to the automatic performance metrics described above, Sentry supports setting custom performance metrics on transactions. Custom performance metrics allow you to define metrics (beyond the ones mentioned above) that are important to your application and send them to Sentry.
> For example, you might want to set a custom metric to track:
> - Total memory usage during a transaction
> - The amount of time being queried
> - Number of times a user performed an action during a transaction
> You define and configure custom metrics in the SDK.
Though for my use cases, Sentry's technical complexity is more of a stumbling block, were I to self-host it: https://github.com/getsentry/self-hosted/blob/master/docker-...
-
coroot
Coroot is an open-source APM & Observability tool, a DataDog and NewRelic alternative. Powered by eBPF for rapid insights into system performance. Monitor, analyze, and optimize your infrastructure effortlessly for peak reliability at any scale.