veneur
opentelemetry-specificatio
veneur | opentelemetry-specificatio | |
---|---|---|
2 | 7 | |
1,714 | - | |
0.1% | - | |
3.5 | - | |
about 1 month ago | - | |
Go | ||
MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
veneur
-
OpenTelemetry in 2023
This was the idea behind Stripe's Veneur project - spans, logs, and metrics all in the same format, "automatically" rolling up cardinality as needed - which I thought was cool but also that it would be very hard to get non-SRE developers on board with when I saw a talk about it a few years ago.
https://github.com/stripe/veneur
-
Launch HN: Opstrace (YC S19) – open-source Datadog
One pain point with Prometheus is that is has relatively weak support for quantiles, histograms, and sets[1]:
- Histograms require manually specifying the distribution of your data, which is time-consuming, lossy, and can introduce significant error bands around your quantile estimates.
- Quantiles calculated via the Prometheus "summary" feature are specific to a given host, and not aggregatable, which is almost never what you want (you normally want to see e.g. the 95th percentile value of request latency for all servers of a given type, or all servers within a region). Quantiles can be calculated from histograms instead, but that requires a well-specified histogram and can be expensive at query time.
- As far as I know, Prometheus doesn't have any explicit support for unique sets. You can compute this at query time, but persisting and then querying high-cardinality data in this way is expensive.
Understanding the distribution of your data (rather than just averages) is arguably the most important feature you want from a monitoring dashboard, so the weak support for quantiles is very limiting.
Veneur[2] addresses these use-cases for applications that use DogStatsD[3] by using clever data structures for approximate histograms[4] and approximate sets[5], but I believe its integration with Prometheus is limited and currently only one-way - there is a CLI app to poll Prometheus metrics and push them into Veneur, but there's no output sink for Veneur to write to Prometheus (or expose metrics for a Prometheus instance to poll).
It would be extremely useful to have something similar for Prometheus, either by integrating with Veneur or implementing those data structures as an extension to Prometheus.
[1] https://prometheus.io/docs/practices/histograms/
[2] https://github.com/stripe/veneur
[3] https://docs.datadoghq.com/developers/dogstatsd/
[4] https://github.com/stripe/veneur#approximate-histograms
[5] https://github.com/stripe/veneur#approximate-sets
opentelemetry-specificatio
-
Migrating to OpenTelemetry
Sure, happy to provide more specifics!
Our main issue was the lack of a synchronous gauge. The officially supported asynchronous API of registering a callback function to report a gauge metric is very different from how we were doing things before, and would have required lots of refactoring of our code. Instead, we wrote a wrapper that exposes a synchronous-like API: https://gist.github.com/yolken-airplane/027867b753840f7d15d6....
It seems like this is a common feature request across many of the SDKs, and it's in the process of being fixed in some of them (https://github.com/open-telemetry/opentelemetry-specificatio...)? I'm not sure what the plans are for the golang SDK specifically.
Another, more minor issue, is the lack of support for "constant" attributes that are applied to all metrics. We use these to identify the app, among other use cases, so we added wrappers around the various "Add", "Record", "Observe", etc. calls that automatically add these. (It's totally possible that this is supported and I missed it, in which case please let me know!).
Overall, the SDK was generally well-written and well-documented, we just needed some extra work to make the interfaces more similar to the ones were were using before.
-
OpenTelemetry in 2023
Two problems with OpenTelemetry:
1. It doesn't know what the hell it is. Is it a semantic standard? Is a protocol? It is a facade? What layer of abstraction does it provide? Answer: All of the above! All the things! All the layers!
2. No one from OpenTelemetry has actually tried instrumenting a library. And if they have, they haven't the first suggestion on how instrumenters should actually use metrics, traces, and logs. Do you write to all three? To one? I asked this question two years ago, not a single response. [1]
[1] https://github.com/open-telemetry/opentelemetry-specificatio...
-
Go standard library: structured, leveled logging
That's why you have otel logging: https://github.com/open-telemetry/opentelemetry-specificatio...
-
Monarch: Google’s Planet-Scale In-Memory Time Series Database
There are a large amount of subtle tradeoffs around the bucketing scheme (log, vs. log-linear, base) and memory layout (sparse, dense, chunked) the amount of configurability in the histogram space (circllhist, DDSketch, HDRHistogram, ...). A good overview is this discussion here:
https://github.com/open-telemetry/opentelemetry-specificatio...
As for the circllhist: There are no knobs to turn. It uses base 10 and two decimal digits of precision. In the last 8 years I have not seen a single use-case in the operational domain where this was not appropriate.
-
OpenTelemetry
A good place to look at is the milestones on GitHub: https://github.com/open-telemetry/opentelemetry-specificatio...
Logging is still experimental in the spec. Metrics API is feature freeze and the protocol is stable, so it's more on language SDKs to stabilize their implementations. This is a focus for several of them right now.
What are some alternatives?
opstrace - The Open Source Observability Distribution
skywalking - APM, Application Performance Monitoring System
cortex - A horizontally scalable, highly available, multi-tenant, long term Prometheus.
SLF4J - Simple Logging Facade for Java
Cortex - Cortex: a Powerful Observable Analysis and Active Response Engine
opentelemetry-specification - Specifications for OpenTelemetry
loki - Like Prometheus, but for logs.
semantic-conventions - Defines standards for generating consistent, accessible telemetry across a variety of domains
influxdb-apply - Define InfluxDB users and databases with a yaml file.
jvm-serializers - Benchmark comparing serialization libraries on the JVM
b3-propagation - Repository that describes and sometimes implements B3 propagation
zipkin-api - Zipkin's language independent model and HTTP Api Definitions