Top 23 Observability Open-Source Projects
-
> Like… has anyone done a Jepsen-like stress test on rsyslogd and shared the results? I’ve half-assedly looked before and not been able to find anything.
I've not used rsyslogd specifically, but I don't see how you'd have any issues with the log volume you described.
[1] https://github.com/netdata/netdata/tree/master/src/crates/ne...
[2] https://learn.netdata.cloud/docs/logs/systemd-journal-logs/s...
-
langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Project mention: Three Budget-Guardrail Failure Modes That Matter More Than Model Quality (May 2026) | dev.to | 2026-05-19
Source: https://github.com/langfuse/langfuse/issues/12614 (open, updated 2026-05-14)
-
signoz
SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
-
MLflow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
For example, this can be done using MLflow in Python:
-
Project mention: Apache SkyWalking: APM system designed for microservices, cloud native | news.ycombinator.com | 2025-08-20
-
Project mention: War Story: Debugging a Kafka 4.0 Consumer Lag Spike During a Product Launch Using Cilium 1.17 and Datadog 2026 | dev.to | 2026-04-28
This adds less than 2% overhead to your node’s CPU usage but exposes 14 Kafka-specific eBPF metrics that are critical for debugging lag. We’ve found that 72% of Kafka 4.0 lag incidents we’ve responded to in 2026 stem from node-level network policy issues that only eBPF can detect. If you’re using a different CNI, you can still use Cilium’s standalone eBPF probe https://github.com/cilium/cilium/tree/v1.17.2/contrib/kafka-probe to get these metrics without replacing your entire CNI. Always validate that kafka.heartbeat_drops_total is 0 in staging before every launch.
-
Project mention: Scaling Shopify Webhooks to Handle Millions of Events: A Practical Guide | dev.to | 2026-06-04
Distributed Tracing with Jaeger
-
Project mention: volnux VS Prefect - a user suggested alternative | libhunt.com/r/volnux | 2025-11-19
-
Project mention: We Cut Log Costs by 35% Using Vector 0.30 and Loki 3.0: Lessons from a 3-Month Tuning | dev.to | 2026-05-04
We evaluated three alternatives: ClickHouse for log storage, Fluent Bit for log collection, and the Vector (https://github.com/vectordotdev/vector) + Loki (https://github.com/grafana/loki) stack. ClickHouse had great query performance but required manual index management, which would add operational overhead. Fluent Bit was lightweight but lacked the transform capabilities we needed to mask PII and drop low-value logs. Vector and Loki stood out: Vector is a Rust-based agent with 1/10th the memory footprint of Filebeat, and Loki is designed for cost-efficient log storage with a query model that aligns with how our team actually debugs (using labels, not full-text search).
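The transform step mentioned above (masking PII and dropping low-value logs before they reach storage) can be illustrated with a minimal Python sketch. This is not Vector's own transform language or the quoted article's configuration; the field names (`level`, `message`), the choice of `debug`/`trace` as "low-value", and the email-only masking rule are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumption: only email addresses count as PII in this sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: these levels are the "low-value" logs worth dropping.
LOW_VALUE_LEVELS = {"debug", "trace"}


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Drop low-value events and mask email addresses in the rest."""
    for event in events:
        if str(event.get("level", "")).lower() in LOW_VALUE_LEVELS:
            continue  # dropped events never reach storage
        masked = dict(event)  # copy so the caller's event is not mutated
        masked["message"] = EMAIL_RE.sub(
            "<redacted-email>", str(event.get("message", ""))
        )
        yield masked


if __name__ == "__main__":
    sample = [
        {"level": "DEBUG", "message": "cache hit for key 42"},
        {"level": "info", "message": "login ok for alice@example.com"},
    ]
    for out in transform(sample):
        print(out)
```

Doing the drop before the mask keeps the regex work off events that are discarded anyway, which is the same ordering one would likely want in an agent-side pipeline.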
-
Self-Hosting-Guide
Self-Hosting Guide. Learn all about locally hosting (on premises & private web servers) and managing software applications by yourself or your organization. Including Cloud, LLMs, WireGuard, Automation, Home Assistant, and Networking.
"Self-Hosting Guide" - GitHub repository by mikeroyal documenting self-hosted alternatives to cloud services. Available at: https://github.com/mikeroyal/Self-Hosting-Guide
-
openobserve
Open source observability platform for logs, metrics, traces, frontend monitoring, pipelines and LLM observability. A sophisticated, simple and highly performant alternative to Datadog, Splunk, and Elasticsearch with 140x lower storage costs and single binary deployment.
Project mention: Traceway: MIT-licensed observability stack you can self-host in ~90s | news.ycombinator.com | 2026-05-11
You should take a look at https://github.com/openobserve/openobserve - Extremely performant and simple full-stack observability solution?
-
Distributed tracing as a discipline is older than most engineers writing about it think. Google's 2010 Dapper paper, by Sigelman and colleagues, is the canonical reference; Twitter open-sourced Zipkin in 2012 as a Dapper-inspired implementation, and Uber open-sourced Jaeger in 2017 on similar lineage. For most of the 2010s, however, the operational reality was vendor-specific: each APM (Datadog, New Relic, AppDynamics, Dynatrace) shipped its own SDK, and instrumenting an application meant choosing a vendor and accepting that the instrumentation work was, structurally, lock-in.
-
Project mention: VictoriaMetrics VS arc - a user suggested alternative | libhunt.com/r/VictoriaMetrics | 2026-04-26
-
kubesphere
The container platform tailored for Kubernetes multi-cloud, datacenter, and edge management ⎈ 🖥 ☁️
Project mention: kubesphere VS kite - a user suggested alternative | libhunt.com/r/kubesphere | 2025-07-31
-
Everything he lists is solved by effect-ts [1] bar, obviously, the language support.
[1] https://effect.website/
-
thanos
Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
Thank you thanos-io: https://github.com/thanos-io/thanos/issues/8381#issuecomment...
-
kubeshark
eBPF-powered network observability for Kubernetes. Indexes L4/L7 traffic with full K8s context, decrypts TLS without keys. Queryable by AI agents via MCP and humans via dashboard.
-
howtheysre
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
-
hyperdx
Resolve production issues, fast. An open source observability platform unifying session replays, logs, metrics, traces and errors powered by ClickHouse and OpenTelemetry.
-
Project mention: VoltAgent Just Asked Us to Build Their Guardrail Provider Interface. Here Is What We Shipped. | dev.to | 2026-03-20
VoltAgent is a 6.8k-star TypeScript framework for building AI agents. They already had InputGuardrail and OutputGuardrail types — handler functions that run before and after model calls.
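The pattern described here (handler functions that run before and after a model call) can be sketched generically. This is Python rather than VoltAgent's TypeScript, and it does not reproduce VoltAgent's `InputGuardrail` or `OutputGuardrail` types; every name below (`run_with_guardrails`, `block_secrets`, `truncate_output`, `fake_model`) is hypothetical and exists only to show the control flow.

```python
from typing import Callable, List

# Hypothetical type: a guardrail takes text and returns (possibly modified)
# text, or raises ValueError to block the call.
Guardrail = Callable[[str], str]


def run_with_guardrails(
    prompt: str,
    model: Callable[[str], str],
    input_guardrails: List[Guardrail],
    output_guardrails: List[Guardrail],
) -> str:
    """Run input guardrails, then the model, then output guardrails."""
    for guard in input_guardrails:
        prompt = guard(prompt)  # may rewrite the prompt or raise
    response = model(prompt)
    for guard in output_guardrails:
        response = guard(response)  # may rewrite the response or raise
    return response


def block_secrets(prompt: str) -> str:
    """Illustrative input guardrail: reject prompts that mention API_KEY."""
    if "API_KEY" in prompt:
        raise ValueError("prompt appears to contain a secret")
    return prompt


def truncate_output(response: str, limit: int = 20) -> str:
    """Illustrative output guardrail: cap the response length."""
    return response[:limit]


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(run_with_guardrails("hi", fake_model, [block_secrets], [truncate_output]))
```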
Observability related posts
-
AgentSight: System-wide AI agent tracing and monitoring with eBPF
-
Show HN: RePlaya – self-hosted browser session replay with live tailing
-
Jaeger Tracing Explained: How Distributed Tracing Works
-
riskkernel VS opentrace - a user suggested alternative
2 projects | 1 Jun 2026
-
opentrace VS riskkernel - a user suggested alternative
2 projects | 1 Jun 2026
-
Why Your Logs Are Useless Without Traces
-
What is an AI SRE? Definition, Capabilities, and 2026 Buyer's Lens
-
Index
What are some of the best open-source Observability projects? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Netdata | 79,077 |
| 2 | langfuse | 28,520 |
| 3 | signoz | 27,194 |
| 4 | MLflow | 26,292 |
| 5 | skywalking | 24,821 |
| 6 | cilium | 24,441 |
| 7 | jaeger | 22,856 |
| 8 | Prefect | 22,528 |
| 9 | vector | 21,985 |
| 10 | kibana | 21,127 |
| 11 | Self-Hosting-Guide | 20,182 |
| 12 | openobserve | 19,128 |
| 13 | zipkin | 17,437 |
| 14 | VictoriaMetrics | 17,104 |
| 15 | kubesphere | 16,953 |
| 16 | effect | 14,520 |
| 17 | thanos | 14,092 |
| 18 | nightingale | 13,054 |
| 19 | kubeshark | 11,922 |
| 20 | pyroscope | 11,477 |
| 21 | howtheysre | 9,725 |
| 22 | hyperdx | 9,575 |
| 23 | voltagent | 9,367 |