OpenTelemetry in 2023

tempo

7 3,630 9.7 Go

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.

> It's easy to add Jaeger to your local dev stack so you can have tracing while developing.
Tempo can be spun up with docker compose using a local disk for ephemeral storage/querying: https://github.com/grafana/tempo/blob/main/example/docker-co...
Maybe this meets your needs?
> Jaeger is easier to setup/manage and has a better interface than Grafana/Tempo
What do you enjoy about the Jaeger interface? Perhaps it's a gap in Tempo we can improve.

signoz

310 16,886 9.9 TypeScript

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool

Thanks for mentioning SigNoz, I am one of the maintainers at SigNoz and would love your feedback on how we can improve it further.
If anyone wants to check our project, here’s our GitHub repo - https://github.com/SigNoz/signoz

oteps

4 316 5.3 Makefile

OpenTelemetry Enhancement Proposals

Oh nice, thank you (and also solumos) for the links! It looks like oteps/pull/171 (merged June 2023) expanded and superseded the opentelemetry-proto/pull/346 PR (closed Jul 2022) [0]. The former resulted in merging OpenTelemetry Enhancement Proposal 156 [1], with some interesting results especially for 'Phase 2' where they implemented columnar storage end-to-end (see the Validation section [2]):
* For univariate time series, OTel Arrow is 2 to 2.5 times better in terms of bandwidth reduction ... and the end-to-end speed is 3.1 to 11.2 times faster
* For multivariate time series, OTel Arrow is 3 to 7 times better in terms of bandwidth reduction ... Phase 2 has [not yet] been .. estimated but similar results are expected.
* For logs, OTel Arrow is 1.6 to 2 times better in terms of bandwidth reduction ... and the end-to-end speed is 2.3 to 4.86 times faster
* For traces, OTel Arrow is 1.7 to 2.8 times better in terms of bandwidth reduction ... and the end-to-end speed is 3.37 to 6.16 times faster
[0]: https://github.com/open-telemetry/opentelemetry-proto/pull/3...
[1]: https://github.com/open-telemetry/oteps/blob/main/text/0156-...
[2]: https://github.com/open-telemetry/oteps/blob/main/text/0156-...
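To make the row-versus-column idea behind these numbers a little more concrete, here is a minimal Python sketch. It is not the OTel Arrow encoding: the record shape, the helper names (`to_columns`, `to_rows`, `compressed_size`) and the choice of JSON plus zlib are all assumptions made purely for illustration. It only shows what pivoting row-oriented telemetry into columns looks like and lets you compare compressed sizes on your own data.

```python
import json
import zlib


def to_columns(rows):
    """Pivot a list of same-shaped dicts into one list per field."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def to_rows(columns):
    """Inverse of to_columns: rebuild the list of per-record dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


def compressed_size(obj):
    """Size in bytes of the JSON encoding after zlib compression."""
    return len(zlib.compress(json.dumps(obj).encode("utf-8")))


# Hypothetical univariate time series: one record per data point.
rows = [
    {"metric": "cpu.usage", "host": "web-1", "ts": 1693180800 + i, "value": i % 7}
    for i in range(1000)
]
columns = to_columns(rows)

print("row-oriented bytes:   ", compressed_size(rows))
print("column-oriented bytes:", compressed_size(columns))
```

The sketch assumes every record has the same fields. Any size difference it prints depends on the data and the compressor, so it should not be read as a reproduction of the figures quoted above.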

opentelemetry-proto

8 524 8.0 Makefile

OpenTelemetry protocol (OTLP) specification and Protobuf definitions

opentelemetry-go

127 4,765 9.6 Go

OpenTelemetry Go API and SDK

https://opentelemetry.io
> OpenTelemetry is a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
You can absolutely categorize telemetry into these high-level categories, true. But the specifics of how that data is captured, exported, collected, queried, etc. are necessarily unique to each programming language, backend system, organization, etc.
That's because telemetry data is always larger than the original data it represents: a production request will be of some well-defined size, but the metadata about that request is potentially infinite. Consequently, the main design constraint for telemetry systems is always efficiency.
Efficiency requires specialization, which is in direct tension with features that generalize over backends and tools, e.g.
> Traces, Metrics, Logs -- Create and collect telemetry data from your services and software, then forward them to a variety of analysis tools.
and features that generalize over languages, e.g.
> Drop-In Instrumentation -- OpenTelemetry integrates with popular libraries and frameworks such as Spring, ASP.NET Core, Express, Quarkus, and more! Installation and integration can be as simple as a few lines of code.
I think OTel treats these goals -- which are very valuable to end users!! -- as inviolable core requirements, and then does whatever is necessary to implement them, even if the resulting code is unsound, or inefficient, or incoherent.

Grafana

379 60,279 10.0 TypeScript

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

Grafana seems to be an option? It handles metrics, logs and traces. I don't know what storage costs look like if you are self-hosting, though. https://grafana.com/

terraform-aws-jaeger

1 8 10.0 HCL

Terraform module for Jaeger

It's really not that intense. I basically set up my last co's telemetry infrastructure all by myself, using terraform, otel-python, jaeger, and AWS elasticsearch.
This TF project does most of the heavy lifting: https://github.com/telia-oss/terraform-aws-jaeger

openobserve

38 9,368 9.9 Rust

🚀 10x easier, 🚀 140x lower storage cost, 🚀 high performance, 🚀 petabyte scale - Elasticsearch/Splunk/Datadog alternative for 🚀 (logs, metrics, traces, RUM, Error tracking, Session replay).

I guess you could take a look at this: https://openobserve.ai/
It's in Rust to add some HN catnip.

opentelemetry-js

16 2,456 9.4 TypeScript

OpenTelemetry JavaScript Client

> OpenTelemetry is a marketing-driven project, designed by committee, implemented naively and inefficiently, and guided by the primary goal of allowing Fortune X00 CTOs to tick off some boxes on their strategy roadmap documents.
I'm the founder of highlight.io. On the consumer side as a company, we've seen a lot of value from OTEL; we've used it to build out language support for quite a few customers at this point, and the community is very receptive.
Here's an example of us putting up a change: https://github.com/open-telemetry/opentelemetry-js/pull/4049
Do you mind sharing why you think no-one should be using it? Some reasoning would be nice.

opentelemetry-js-contrib

8 597 9.5 TypeScript

OpenTelemetry instrumentation for JavaScript modules

[2] https://github.com/open-telemetry/opentelemetry-js-contrib/t...

jaeger-tempo

1 4 10.0 Go

Discontinued Tempo Proxy with Jaeger

Jaeger can use multiple backends for storage, including Tempo, so it's not an either/or situation.
I'm fairly sure there was an official Grafana-provided Jaeger gRPC plugin for Tempo, but can't easily find it, only this one: https://github.com/flitnetics/jaeger-tempo

proposal-async-context

3 449 6.3 HTML

Async Context for JavaScript

You can follow [0] which is currently stage 2 to fix this
[0]: https://github.com/tc39/proposal-async-context

VictoriaMetrics

97 10,826 9.9 Go

VictoriaMetrics: fast, cost-effective monitoring solution and time series database

You shouldn't unless you want to use the new open source standard for telemetry. You won't benefit from simplicity or performance improvements. It would be quite the opposite. You can see the actual cost of OpenTelemetry adoption here [0].
But if you ever decide to go down this path, VictoriaMetrics supports the OpenTelemetry protocol for metrics [1].
[0] https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2570
[1] https://docs.victoriametrics.com/Single-server-VictoriaMetri...

self-hosted

28 7,264 9.0 Shell

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept

> What should people use?
I recall Apache Skywalking being pretty good, especially for smaller/medium scale projects: https://skywalking.apache.org/
The architecture is simple, the performance is adequate, it doesn't make you spend days configuring it and it even supports various different data stores: https://skywalking.apache.org/docs/main/v9.0.0/en/setup/back...
The problems with it are that it isn't super popular (although has agents for most popular stacks), the docs could be slightly better and I recall them also working on a new UI so there is a little bit of churn: https://skywalking.apache.org/downloads/
Still better versus some of the other options when you need something that just works instead of spending a lot of time configuring something (even when that something might be superior in regards to the features): https://github.com/getsentry/self-hosted/blob/master/docker-...
Sentry is just the first thing that comes to mind (OpenTelemetry also isn't simpler due to how much it tries to do), but compare its complexity to Skywalking: https://github.com/apache/skywalking/blob/master/docker/dock...
I wish there was more self-hosted software like that out there, enough to address certain concerns in a simple way on day 1 and leave branching out to more complex options like OpenTelemetry once you have a separate team for that and the cash is rolling in.

skywalking

23 23,269 9.5 Java

APM, Application Performance Monitoring System

aws-otel-lambda

4 133 8.9 HCL

AWS Distro for OpenTelemetry - AWS Lambda

OpenTelemetry is being pushed as a replacement for AWS X-Ray SDKs by AWS, but it's in such a broken state for Lambda right now. A 200-500% performance penalty for using it is insane[1][2].
[1]: https://github.com/aws-observability/aws-otel-lambda/issues/...

opentelemetry-lambda

8 243 9.3 Go

Create your own Lambda Layer in each OTel language using this starter code. Add the Lambda Layer to your Lambda Function to get tracing with OpenTelemetry.
opentelemetry-specification

99 3,602 9.2 Makefile

Specifications for OpenTelemetry

Two problems with OpenTelemetry:
1. It doesn't know what the hell it is. Is it a semantic standard? Is it a protocol? Is it a facade? What layer of abstraction does it provide? Answer: All of the above! All the things! All the layers!
2. No one from OpenTelemetry has actually tried instrumenting a library. And if they have, they haven't offered the first suggestion on how instrumenters should actually use metrics, traces, and logs. Do you write to all three? To one? I asked this question two years ago and got not a single response. [1]
[1] https://github.com/open-telemetry/opentelemetry-specificatio...

community

7 693 9.0 Python

OpenTelemetry community content (by open-telemetry)

1. Agreed. It's the sink and the house attached to it, and the docs are thin and confusing as a result.
2. I had a similar experience to you. I wanted to implement a simple heartbeat in our app to get an idea of usage numbers. This is surprisingly not possible, which greatly confuses me given the name of the project. The low engagement on my question put me off and I abandoned my OpenTelemetry planning completely. [1][2]
[1] https://github.com/open-telemetry/community/discussions/1598

semantic-conventions

1 182 9.6 Roff

Defines standards for generating consistent, accessible telemetry across a variety of domains

[2] https://github.com/open-telemetry/semantic-conventions/issue...

proposal-explicit-resource-management

22 698 6.1 JavaScript

ECMAScript Explicit Resource Management

In addition to this, there is the new (stage 3, even!) explicit resource management proposal [0], supported by TypeScript >= 5.2 [1].
Though I agree that async context is generally a better fit for this, the resource management proposal should be good for telemetry around objects that have defined lifetime semantics, which is a step in the right direction that you can use today.
[0]: https://github.com/tc39/proposal-explicit-resource-managemen...
[1]: https://www.totaltypescript.com/typescript-5-2-new-keyword-u...

veneur

2 1,714 3.5 Go

A distributed, fault-tolerant pipeline for observability data

This was the idea behind Stripe's Veneur project - spans, logs, and metrics all in the same format, "automatically" rolling up cardinality as needed - which I thought was cool but also that it would be very hard to get non-SRE developers on board with when I saw a talk about it a few years ago.
https://github.com/stripe/veneur

odigos

40 3,020 9.8 Go

Distributed tracing without code changes. 🚀 Instantly monitor any application using OpenTelemetry and eBPF

Disclaimer: I am one of the maintainers
Many comments complain about the complexity of using OpenTelemetry. I recommend checking out Odigos, an open-source project that makes working with OpenTelemetry much easier: https://github.com/keyval-dev/odigos
We combine OpenTelemetry and eBPF to instantly generate distributed traces without any code changes.

b3-propagation

3 514 2.7

Repository that describes and sometimes implements B3 propagation

I've been playing with OTEL for a while, with a few backends like Jaeger and Zipkin, and am trying to figure out a way to perform end to end timing measurements across a graph of services triggered by any of several events.
Consider this scenario: There is a collection of services that talk to one another, and not all use HTTP. Say agent A0 makes a connection to agent A1, this is observed by service S0 which triggers service S1 to make calls to S2 and S3, which propagate elsewhere and return answers.
If we limit the scope of this problem to services explicitly making HTTP calls to other services, we can easily use the Propagators API [1] and use X-B3 headers [2] to propagate the trace context (trace ID, span ID, parent span ID) across this graph, from the origin through to the destination and back. This allows me to query the metrics collector (Jaeger or Zipkin) using this trace ID, look at the timestamps originating at the various services and do a T_end - T_start to determine the overall time taken by one call for a round trip across all the related services.
However, this breaks when a subset of these functions cannot propagate the B3 trace IDs for various reasons (e.g., a service is watching a specific state and acts when the state changes). I've been looking into OTEL and other related non-OTEL ways to capture metrics, but it appears there's not much research into this area though it does not seem like a unique or new problem.
Has anyone here looked at this scenario, and have you had any luck with OTEL or other mechanisms to get results?
[1] https://opentelemetry.io/docs/specs/otel/context/api-propaga...
[2] https://github.com/openzipkin/b3-propagation
[3] https://www.w3.org/TR/trace-context/
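As a rough illustration of the HTTP-only case described above, the following Python sketch injects and extracts the B3 multi-header fields by hand and computes T_end - T_start from collected spans. It uses only the standard library rather than the OTEL Propagators API. The helper names (`start_trace`, `inject`, `extract`, `round_trip_micros`) and the dictionary shapes are hypothetical, and having the receiver create a child span is one possible choice, not the only one B3 allows.

```python
import secrets

# Header names from the B3 multi-header format.
B3_TRACE_ID = "X-B3-TraceId"
B3_SPAN_ID = "X-B3-SpanId"
B3_PARENT_SPAN_ID = "X-B3-ParentSpanId"
B3_SAMPLED = "X-B3-Sampled"


def new_span_id():
    """Random 64-bit span ID as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def start_trace():
    """Context for a root span: new 128-bit trace ID, no parent."""
    return {
        "trace_id": secrets.token_hex(16),
        "span_id": new_span_id(),
        "parent_span_id": None,
    }


def inject(ctx, headers):
    """Copy the current span context into outgoing HTTP headers."""
    headers[B3_TRACE_ID] = ctx["trace_id"]
    headers[B3_SPAN_ID] = ctx["span_id"]
    if ctx["parent_span_id"]:
        headers[B3_PARENT_SPAN_ID] = ctx["parent_span_id"]
    headers[B3_SAMPLED] = "1"


def extract(headers):
    """Read the caller's context and derive a child span context from it."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "trace_id": lowered[B3_TRACE_ID.lower()],
        "span_id": new_span_id(),
        "parent_span_id": lowered[B3_SPAN_ID.lower()],
    }


def round_trip_micros(spans):
    """T_end - T_start over all spans that share one trace ID."""
    return max(s["end"] for s in spans) - min(s["start"] for s in spans)


# Service S1 calls S2: S1 injects, S2 extracts and continues the same trace.
s1_ctx = start_trace()
outgoing = {}
inject(s1_ctx, outgoing)
s2_ctx = extract(outgoing)
```

The duration helper assumes the timestamps reported by different services are comparable, which in practice depends on clock synchronization. It also does nothing for the harder case in the comment, where a service reacts to a state change and never receives the headers at all.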

trace-context-w3c

11 4 0.0 C#

W3C Trace Context purpose of and what kind of problem it came to solve.

zipkin-api-example

1 9 10.0 Go

Example of how to use the OpenApi/Swagger api spec

Yes, I really agree, and I've gone through the same pain, but try using the alternatives that claim to be better because they have OpenAPI specifications [1]
The example shows you how to use the swagger tool, parse the OpenAPI spec [2], auto-generate GoLang glue code, call __one__ of those auto-generated functions and log a trace.
However, there is zero documentation, zero other examples, and I'm left scratching my head whether there's even one person in the world using this approach. I eventually ended up just directly using the service APIs [3] via REST calls.
OTEL is painful, but the alternatives are no better :( I really wish there were more interest in this space, since SLO and SLI measurements are becoming increasingly important.
[1] https://github.com/openzipkin/zipkin-api-example
[2] https://github.com/openzipkin/zipkin-api/blob/master/zipkin2...
[3] https://zipkin.io/zipkin-api/#/

zipkin-api

1 59 3.4 Thrift

Zipkin's language independent model and HTTP Api Definitions

docs

6 625 8.4 SCSS

Prometheus documentation: content and static site generator (by prometheus)

The Prometheus text exposition format is the de facto standard in monitoring. It would be great to build an official observability standard on top of it. This format is much easier to debug and understand than OpenTelemetry for metrics. It is also more efficient, e.g. it requires less network bandwidth and less CPU for transfer than OTel for metrics.
[1] https://github.com/prometheus/docs/blob/main/content/docs/in...
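For readers who have not seen the format, here is a small Python sketch that renders a counter in the Prometheus text exposition format. The metric name, label values and the helpers `escape_label_value` and `render_counter` are made up for illustration. The sketch escapes label values but leaves out other details of the format, such as HELP text escaping and timestamps.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter family; samples is a list of (labels, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


text = render_counter(
    "http_requests_total",
    "Total number of HTTP requests.",
    [
        ({"method": "get", "code": "200"}, 1027),
        ({"method": "post", "code": "400"}, 3),
    ],
)
print(text)
# Prints:
# # HELP http_requests_total Total number of HTTP requests.
# # TYPE http_requests_total counter
# http_requests_total{code="200",method="get"} 1027
# http_requests_total{code="400",method="post"} 3
```

The output is plain lines of text that can be read or grepped directly, which is the debuggability the comment is pointing at.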

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Show HN: Autometrics – open-source observability stack
1 project | news.ycombinator.com | 5 Dec 2023
Show HN: Autometrics Explorer – A Contextual UI for Prometheus
1 project | news.ycombinator.com | 17 Aug 2023
autometrics: easily add metrics to any function -- and jump to live Prometheus charts directly from your IDE (links with automatically customized PromQL queries are inserted into each function's doc comments)
3 projects | /r/rust | 2 Feb 2023
Show HN: OneUptime – open-source Datadog Alternative
7 projects | news.ycombinator.com | 2 Apr 2024
All you need is Wide Events, not "Metrics, Logs and Traces"
7 projects | news.ycombinator.com | 27 Feb 2024

OpenTelemetry in 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Opentelemetry Observability Metrics Prometheus Monitoring
Post date: 28 Aug 2023
