Launch HN: Odigos (YC W23) – Instant distributed tracing for Kubernetes clusters

odigos

Distributed tracing without code changes. 🚀 Instantly monitor any application using OpenTelemetry and eBPF

Hi HN! We’re Eden and Ari, co-founders of Odigos (https://github.com/keyval-dev/odigos). Odigos is an open-source project that lets you instantly generate distributed traces for your applications. It works alongside existing monitoring tools and does not require any code changes.
Our earlier experiences with monitoring tools were frustrating. When monitoring a distributed system with multiple microservices, we found ourselves spending way too much time trying to locate the specific microservice at the root of a problem. For example, we once spent hours debugging an application that we suspected was causing high latency, only to find out that the actual problem was rooted in a completely different application.
Then we learned about distributed tracing, which solves exactly this problem. Unlike metrics or logs, which capture a single data point in time within a single application, a distributed trace follows a request as it propagates through a distributed environment by tagging it with a unique ID. This lets developers see the context of each request and understand how their distributed applications work.
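To make "tagging a request with a unique ID" concrete, here is a minimal Python sketch of the W3C Trace Context `traceparent` header, the format OpenTelemetry propagates by default. The `SpanContext` type and the helper functions are hypothetical and hand-rolled for illustration; this is not how Odigos or the OpenTelemetry SDKs implement propagation.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span in one trace
    span_id: str   # 16 hex chars, unique per span (per hop)
    sampled: bool = True


def new_root() -> SpanContext:
    """Start a new trace at the edge of the system (hypothetical helper)."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """A child span keeps the parent's trace ID and gets a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize as version-traceid-parentid-flags, e.g. '00-<32>-<16>-01'."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def from_traceparent(header: str) -> SpanContext:
    """Parse an incoming header; raises ValueError if it is malformed."""
    version, trace_id, span_id, flags = header.split("-")
    if version != "00" or len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError(f"malformed traceparent: {header!r}")
    # Bit 0 of the flags byte is the "sampled" flag.
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

Each service parses the incoming header, starts a child span with `child_of`, and sends `to_traceparent(child)` on its own outgoing calls, so every span of the request ends up sharing one trace ID.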
The downside is that it is difficult to implement. Unlike metrics or logs, distributed tracing delivers its value only once it is implemented across multiple applications. If even one of your applications does not produce traces, context propagation breaks at that hop and the value of the traces drops significantly.
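A toy model shows why a single gap matters. It assumes each service either forwards the incoming trace ID downstream (instrumented) or drops it (uninstrumented), in which case the next instrumented service has to start a fresh trace. The `handle` and `fragments` functions are hypothetical and only illustrate the argument above.

```python
import secrets
from typing import Optional


def handle(chain: list[bool]) -> list[str]:
    """Simulate one request flowing through a chain of services.

    chain[i] is True if service i is instrumented, i.e. it reads the incoming
    trace ID and forwards it downstream. An uninstrumented service drops the
    header, so the next instrumented service must start a new trace.
    Returns the trace ID recorded by each service ("" for uninstrumented ones,
    which record nothing).
    """
    incoming: Optional[str] = None
    recorded: list[str] = []
    for instrumented in chain:
        if instrumented:
            trace_id = incoming or secrets.token_hex(16)  # reuse or start new
            recorded.append(trace_id)
            incoming = trace_id  # propagate to the next hop
        else:
            recorded.append("")
            incoming = None  # the trace context is lost at this hop
    return recorded


def fragments(chain: list[bool]) -> int:
    """Number of disconnected traces a single request ends up split into."""
    return len({t for t in handle(chain) if t})
```

With four services and one uninstrumented hop in the middle, `fragments([True, True, False, True])` returns 2: one request now shows up as two unrelated traces.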
We manually implemented distributed tracing for multiple companies, but found it a challenge to coordinate all the development teams to instrument their applications in order to achieve a complete distributed trace. Once the implementation was finished, we saw great value and fixed production issues much faster. But partial implementation wasn’t worth much.
We set out to find a way to automate this process. We knew how to do most of it, but the trickiest part was automatically instrumenting programs written in compiled languages (like Go). If we could solve that, we would be able to automate the entire process of generating distributed traces. While researching, we realized that eBPF, a technology that allows the Linux kernel to load external programs and run them within the kernel, could be used to build automatic instrumentation for compiled languages. That was the final piece of the puzzle, and with it we were able to develop Odigos.
Odigos first discovers all your running applications, then detects the programming language of each one and auto-instruments it accordingly, using eBPF and OpenTelemetry. It also deploys collectors that buffer, filter, and deliver data to your chosen monitoring tool, and autoscales them according to the amount of traffic. This automation lets developers get distributed traces within minutes, instead of the months that a manual implementation can take.
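The collector stage (filter, buffer, deliver) can be sketched as a small in-process pipeline. This is a hypothetical model for illustration only: Odigos builds its collectors on OpenTelemetry Collector components, and the `Span` and `BatchingExporter` names below are assumptions that belong to neither project. Autoscaling is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Span:
    name: str
    service: str
    duration_ms: float


def _keep_all(span: Span) -> bool:
    return True


@dataclass
class BatchingExporter:
    """Toy collector pipeline: filter -> buffer -> deliver in batches."""
    export: Callable[[list[Span]], None]     # delivers one batch to the backend
    keep: Callable[[Span], bool] = _keep_all  # filter predicate
    max_batch: int = 512                      # flush once this many spans are buffered
    _buffer: list[Span] = field(default_factory=list, init=False, repr=False)

    def receive(self, span: Span) -> None:
        if not self.keep(span):
            return  # dropped by the filter stage
        self._buffer.append(span)
        if len(self._buffer) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        """Deliver whatever is buffered (a real collector also flushes on a timer)."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.export(batch)
```

Batching trades a little latency for far fewer network calls to the backend, which is why buffering sits between the filter and the exporter.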
Automatic instrumentation across programming languages is not a trivial task, especially when dealing with static binaries (like the ones produced by the Go compiler). We built multiple mechanisms to make sure we inject the relevant headers in a secure and stable way. We developed a system that tracks functions and structs across different versions of open-source libraries. In addition, we developed a system that performs userspace memory management in eBPF. As a result, Odigos is the only solution that is able to automatically generate distributed traces for compiled languages like Go and Rust. While other solutions require users to be experts in OpenTelemetry or eBPF, our solution does not require prior knowledge of observability technologies.
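One way to picture "tracking functions and structs across library versions": an eBPF probe attached to a compiled Go binary reads struct fields directly from process memory, so it needs each field's byte offset, and that offset can change when a library release adds or reorders fields. The sketch below assumes a per-version offset table and picks the newest entry at or below the library version found in the binary. The table, its values, and the `field_offset` helper are made up for illustration and do not describe Odigos's actual data.

```python
from bisect import bisect_right

# Hypothetical offsets table. For each (struct, field), a sorted list of
# (first library version with this layout, byte offset of the field).
# The values are invented for illustration only.
OFFSETS: dict[tuple[str, str], list[tuple[tuple[int, ...], int]]] = {
    ("Request", "Method"): [((1, 0, 0), 0), ((1, 4, 0), 8)],
    ("Request", "URL"): [((1, 0, 0), 16), ((1, 4, 0), 24)],
}


def parse_version(v: str) -> tuple[int, ...]:
    """'v1.4.2' -> (1, 4, 2); assumes plain numeric semver without suffixes."""
    return tuple(int(p) for p in v.lstrip("v").split("."))


def field_offset(struct: str, field: str, version: str) -> int:
    """Byte offset of struct.field in the given library version.

    Picks the entry for the newest listed version <= the requested one, on the
    assumption that the layout only changes at the versions listed.
    """
    entries = OFFSETS[(struct, field)]
    target = parse_version(version)
    idx = bisect_right([ver for ver, _ in entries], target) - 1
    if idx < 0:
        raise LookupError(f"{struct}.{field} has no known layout for {version}")
    return entries[idx][1]
```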
Our solution can be installed on any Kubernetes cluster with a single command. Once installed, we detect the programming language of every running application and apply the relevant instrumentation. For JIT-compiled languages (Java and .NET) and interpreted languages (JavaScript and Python), we deploy the OpenTelemetry instrumentation. For compiled languages (Go, Rust, C), we deploy our eBPF-based instrumentation. All of this is abstracted from the user, who only has to (1) select any or all of the target applications and (2) select a backend to send the monitoring data to.
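The language-to-instrumentation dispatch described above reduces to a lookup table. The mapping below mirrors the languages listed in this post; the `Strategy` enum and the `choose` and `plan` names are hypothetical, and language detection itself (which the post does not describe) is assumed to have already happened.

```python
from enum import Enum


class Strategy(Enum):
    OTEL_AGENT = "OpenTelemetry auto-instrumentation"
    EBPF = "eBPF-based instrumentation"


# Mapping taken from the post: JIT-compiled and interpreted languages get the
# OpenTelemetry instrumentation, compiled languages get eBPF.
STRATEGY_BY_LANGUAGE = {
    "java": Strategy.OTEL_AGENT,        # JIT-compiled
    "dotnet": Strategy.OTEL_AGENT,      # JIT-compiled
    "javascript": Strategy.OTEL_AGENT,  # interpreted
    "python": Strategy.OTEL_AGENT,      # interpreted
    "go": Strategy.EBPF,                # compiled
    "rust": Strategy.EBPF,              # compiled
    "c": Strategy.EBPF,                 # compiled
}


def choose(language: str) -> Strategy:
    """Pick an instrumentation strategy for an already-detected language."""
    try:
        return STRATEGY_BY_LANGUAGE[language.lower()]
    except KeyError:
        raise ValueError(f"no instrumentation available for {language!r}") from None


def plan(selected: dict[str, str]) -> dict[str, Strategy]:
    """Map each user-selected application (name -> language) to a strategy."""
    return {app: choose(lang) for app, lang in selected.items()}
```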
In May 2022, we released our first open-source project: automatic instrumentation for Go applications, based on eBPF. We later donated this project to the OpenTelemetry community and it is currently being developed as part of the Go Automatic Instrumentation SIG.
We are big believers in open standards; therefore, the instrumentation and collectors used by Odigos are all based on open-source projects developed by the OpenTelemetry community. This also lets us stay vendor-agnostic.
Currently we are focused on building our open-source project. There are no pricing or paid features as of yet, but in the future, we are planning to offer a managed version of Odigos that will include enterprise features.
If you're interested in learning more, check out our docs (https://docs.odigos.io), watch a demo video (https://www.youtube.com/watch?v=9d36AmVtuGU), and visit our website (https://odigos.io).
We’d love to hear your experiences with tracing and monitoring distributed applications and anything else you’d like to share!

opentelemetry-go-instrumentation

OpenTelemetry auto-instrumentation for Go applications (by keyval-dev)

The BPF instrumentation is quite cool! I wonder if uprobes have a performance impact. Does it roughly compare to a single syscall?
https://github.com/keyval-dev/opentelemetry-go-instrumentati...

containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).

Naturally, some integrations are out of your hands, AWS Fargate being one (https://github.com/aws/containers-roadmap/issues/1027). However, if you could get integrations up and running with the likes of Fargate, Fly.io, Render.com, etc., that'd be amazing.

hubble-otel

Hubble adaptor for OpenTelemetry

Looks cool! Great to see entrants into this space.
How does this compare with Cilium? It looks like they do OT tracing (https://github.com/cilium/hubble-otel), but it's not native/core. Is that the main distinction?

opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java

We are actually able to handle the long tail of tracing by leveraging the amazing open-source community. For languages like Java, we use the automatic instrumentation created by the OpenTelemetry community, which is really great and supports a ton of libraries. You can see a list of supported libraries here: https://github.com/open-telemetry/opentelemetry-java-instrum...

pixie

Instant Kubernetes-Native Application Observability

Congratulations on the launch, and thank you for choosing an awesome license!
For an unrelated reason, today I was reminded about Pixie (https://news.ycombinator.com/item?id=25375170 and https://news.ycombinator.com/item?id=31687978 and https://github.com/pixie-io/pixie#readme), which describes itself as an eBPF Kubernetes observability tool and is also Apache-licensed.
I suspect the difference may be your aspiration to move beyond just Kubernetes, but I wondered if that's the biggest difference between your project and theirs? Or maybe it's C++ versus Go?