-
I see a lot of people here talk about Prometheus as though it's the only metrics-gathering solution. It really does seem to have become Hacker News's poster child for metrics gathering.
I've used both Prometheus and the Telegraf / StatsD solutions, and for a very long time I've disliked everything about Prometheus, from the standard "bugs"[0] to the entire design philosophy of its pull model versus the push model of Telegraf and similar tools.
What is the collective's general stance on Prometheus vs Telegraf, and why does the collective tend to end up preferring one over the other?
[0] For example, Prometheus clients tend not to consider a counter that hasn't been incremented to exist, so if you have an error counter, the sudden existence of the error counter is how you find an error. The 'increase' is 0, though, because it went from not existing to a value of 1. Citation: https://github.com/prometheus/prometheus/issues/1673
No, it's not technically a "bug"; it's how it's designed. But it speaks to how it's used, and the work-arounds are unsatisfactory, in my opinion.
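To make the footnote concrete, here is a rough sketch of the arithmetic. `simple_increase` is a made-up, simplified stand-in for PromQL's `increase()` (last sample minus first sample in the window); real Prometheus also extrapolates to the window edges and handles counter resets, so treat this only as an illustration of why a counter that first appears at 1 shows no increase.

```python
from typing import List, Optional


def simple_increase(samples: List[Optional[float]]) -> float:
    """Simplified model of increase(): last minus first sample in the window.

    None marks a scrape where the series did not exist yet. Extrapolation and
    counter-reset handling, which real Prometheus performs, are left out.
    """
    present = [s for s in samples if s is not None]
    if len(present) < 2:
        # Modeled as 0 here; Prometheus itself would return no result at all.
        return 0.0
    return present[-1] - present[0]


# Counter only comes into existence on the first error: absent, absent, 1, 1.
lazy_counter = [None, None, 1.0, 1.0]
# Counter exported with value 0 from process start: 0, 0, 1, 1.
eager_counter = [0.0, 0.0, 1.0, 1.0]

print(simple_increase(lazy_counter))   # 0.0
print(simple_increase(eager_counter))  # 1.0
```

The second series corresponds to the usual work-around of exporting the counter at 0 before any error has happened, which only helps when you know the label values up front.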
-
While I somewhat understand Prometheus's idea that pull is easier to scale than push, I've had bad luck with it.
First of all, Prometheus doesn't even consider monitoring long-running jobs in any way other than pull (which didn't make sense to me). There is the Pushgateway [0], but the client libraries seem to consider it only for short-lived jobs where you can send the metrics at the end [1]. It seems I couldn't trivially "push" from long-lived jobs.
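For what it's worth, a periodic push from a long-lived process can be sketched with nothing but the standard library, since the Pushgateway accepts a `PUT` of text-format metrics at `/metrics/job/<job>`. The helper names (`build_push_request`, `push_forever`), the interval, and the `read_metrics` callback are all invented for this example, and this is a usage pattern the Pushgateway was not designed around, so take it as a sketch rather than a recommended setup.

```python
import threading
import urllib.parse
import urllib.request
from typing import Callable, Dict


def build_push_request(gateway: str, job: str,
                       metrics: Dict[str, float]) -> urllib.request.Request:
    """Build a PUT request carrying metrics in the plain text exposition format."""
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")


def push_forever(gateway: str, job: str,
                 read_metrics: Callable[[], Dict[str, float]],
                 interval_s: float, stop: threading.Event) -> None:
    """Push the current metric values every interval_s seconds until stop is set."""
    while not stop.wait(interval_s):
        request = build_push_request(gateway, job, read_metrics())
        try:
            urllib.request.urlopen(request, timeout=5).close()
        except OSError:
            # Gateway unreachable or returned an error: skip this round
            # and try again on the next tick.
            pass
```

You would run `push_forever` in a daemon thread, for example `threading.Thread(target=push_forever, args=("http://localhost:9091", "my_job", read_metrics, 15.0, stop), daemon=True).start()`, where the gateway address and job name are placeholders.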
Second, when using it with Django, for example, you have to be careful about how you handle the multiprocessing that uWSGI/gunicorn does, see [2]. It has bitten me at least once.
Compare that to the push model, where I can just push metrics to statsd_exporter [3] directly and be done with it. But support for StatsD is lacking both in terms of frameworks (everyone seems to be migrating to native clients...) and functionality (you have to do labeling basically manually [4]).
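By "just push" I mean something like the sketch below: one UDP datagram in the plain StatsD counter format (`name:value|c`), fire and forget. The function names are made up for this example, and the default host and port are assumptions about where a statsd_exporter (or any StatsD listener) happens to be running; adjust them to your setup.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in the plain StatsD line format."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 9125) -> None:
    """Send one counter increment over UDP; no reply is expected or awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))
```

Calling `send_counter("app.errors")` works the same from a web worker, a cron job, or a long-running daemon, which is the convenience I'm comparing against. Note that the plain format carries no labels, hence the manual labeling complaint.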
To sum up: Prometheus is really great when it works, but once you go off-track (intentionally or not, see Django [2]) you see that it's all an undiscovered and immature landscape.
[0] https://github.com/prometheus/pushgateway
-
-
[2] https://github.com/korfuri/django-prometheus/blob/master/doc...
-
-