Show HN: Langfuse – Open-source observability and analytics for LLM apps

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • langfuse

    🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

  • fern

    🌿 Stripe-level SDKs and Docs for your API

  • Hi HN! Langfuse is OSS observability and analytics for LLM applications (repo: https://github.com/langfuse/langfuse, 2 min demo: https://langfuse.com/video; try it yourself: https://langfuse.com/demo)

    Langfuse makes capturing and viewing LLM calls (execution traces) a breeze. On top of this data, you can analyze the quality, cost and latency of LLM apps.
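
    As a rough illustration of analyzing cost and latency on top of captured LLM calls, here is a minimal Python sketch. It is not Langfuse's data model: the `LLMCall` record, its fields, and the per-1K-token prices are hypothetical placeholders chosen only for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-1K-token prices in USD; real prices depend on the model and provider.
PROMPT_PRICE_PER_1K = 0.03
COMPLETION_PRICE_PER_1K = 0.06


@dataclass
class LLMCall:
    """One captured generation (hypothetical schema, not Langfuse's)."""
    user_id: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


def cost_usd(call: LLMCall) -> float:
    """Estimate the cost of a single call from its token counts."""
    return (call.prompt_tokens * PROMPT_PRICE_PER_1K
            + call.completion_tokens * COMPLETION_PRICE_PER_1K) / 1000


def summarize_by_user(calls: list[LLMCall]) -> dict[str, dict[str, float]]:
    """Aggregate call count, total tokens, total cost and mean latency per user."""
    grouped: dict[str, list[LLMCall]] = defaultdict(list)
    for call in calls:
        grouped[call.user_id].append(call)
    summary = {}
    for user_id, user_calls in grouped.items():
        summary[user_id] = {
            "calls": len(user_calls),
            "total_tokens": sum(c.prompt_tokens + c.completion_tokens for c in user_calls),
            "cost_usd": round(sum(cost_usd(c) for c in user_calls), 6),
            "mean_latency_ms": sum(c.latency_ms for c in user_calls) / len(user_calls),
        }
    return summary


calls = [
    LLMCall("alice", 1000, 500, 800.0),
    LLMCall("alice", 2000, 0, 1200.0),
    LLMCall("bob", 100, 100, 300.0),
]
print(summarize_by_user(calls))
```

    The same grouping works for a whole project by using a constant key instead of `user_id`.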

    When GPT-4 dropped, we started building LLM apps – a lot of them! [1, 2] But they all suffered from the same issue: it’s hard to ensure quality in 100% of cases, or even to get a clear view of user behavior. Initially, we logged all prompts/completions to our production database to understand what works and what doesn’t. We soon realized we needed more context, more data and better analytics to sustainably improve our apps, so we started building a homegrown tool.

    Our first task was to track and view what is going on in production: what user input is provided, how prompt templates or vector DB requests behave, and which steps of an LLM chain fail. We built async SDKs and a slick frontend that renders chains as nested steps, which is a good way to look at LLM logic ‘natively’. Then we added some basic analytics (pre-built dashboards) to understand token usage and quality over time, for the entire project or for single users.
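
    To show what rendering a chain "in a nested way" can mean, here is a minimal sketch that models a trace as a tree of observations and prints it depth-first with indentation. The `Observation` type and the step names are hypothetical, not the actual Langfuse schema.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """A hypothetical trace node: a span (e.g. retrieval) or an LLM generation."""
    name: str
    duration_ms: float
    children: list["Observation"] = field(default_factory=list)


def render(node: Observation, depth: int = 0) -> list[str]:
    """Return one indented line per observation, parents before their children."""
    lines = [f"{'  ' * depth}{node.name} ({node.duration_ms:.0f} ms)"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


trace = Observation("answer-question", 1450.0, [
    Observation("vector-db-search", 120.0),
    Observation("llm-chain", 1300.0, [
        Observation("render-prompt-template", 5.0),
        Observation("gpt-4-completion", 1290.0),
    ]),
])
print("\n".join(render(trace)))
```

    Keeping parent/child links in the data (rather than a flat log) is what lets a UI show which step of a chain failed or was slow.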

    Under the hood, we use the T3 stack (TypeScript, Next.js, Prisma, tRPC, Tailwind, NextAuth), which lets us move fast and makes it easy to contribute to our repo. The SDKs are heavily influenced by the design of the PostHog SDKs [3], which have stable implementations of async network requests. Converting OpenAPI specs to boilerplate Python code was a surprisingly inconvenient experience, so we ended up using Fern [4] for it. We’re fans of Tailwind + shadcn/ui + tremor.so for the speed and flexibility they give us when building tables and dashboards.

    Our SDKs run fully asynchronously and make network requests in the background. We did our best to keep the impact on application performance to a minimum: we never block the main execution path.
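
    The non-blocking pattern described above (enqueue on the caller's thread, send from a background worker) can be sketched as follows. This is an illustrative assumption in the spirit of the PostHog-style design mentioned in the post, not the Langfuse SDK: `BackgroundBatcher`, `send_batch`, `max_batch_size` and `flush_interval_s` are hypothetical names, and `send_batch` stands in for the real HTTP request.

```python
import queue
import threading
from typing import Any, Callable


class BackgroundBatcher:
    """Hypothetical non-blocking event client (not the Langfuse SDK).

    track() only enqueues; a daemon thread drains the queue and hands batches
    to send_batch, which in a real SDK would perform the network request.
    """

    def __init__(self, send_batch: Callable[[list[dict[str, Any]]], None],
                 max_batch_size: int = 50, flush_interval_s: float = 0.5) -> None:
        self._queue: "queue.Queue[dict[str, Any]]" = queue.Queue()
        self._send_batch = send_batch
        self._max_batch_size = max_batch_size
        self._flush_interval_s = flush_interval_s
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def track(self, event: dict[str, Any]) -> None:
        # Never blocks the caller: put() on an unbounded queue returns immediately.
        self._queue.put(event)

    def _next_batch(self) -> list[dict[str, Any]]:
        # Wait up to flush_interval_s for a first event, then take whatever else is ready.
        batch: list[dict[str, Any]] = []
        try:
            batch.append(self._queue.get(timeout=self._flush_interval_s))
            while len(batch) < self._max_batch_size:
                batch.append(self._queue.get_nowait())
        except queue.Empty:
            pass
        return batch

    def _run(self) -> None:
        while True:
            batch = self._next_batch()
            if not batch:
                continue
            try:
                self._send_batch(batch)
            except Exception:
                # A failed upload must not crash the worker or the host application.
                pass
            finally:
                for _ in batch:
                    self._queue.task_done()

    def flush(self) -> None:
        """Block until every tracked event has been handed to send_batch."""
        self._queue.join()


sent: list[list[dict[str, Any]]] = []
client = BackgroundBatcher(sent.append, max_batch_size=2, flush_interval_s=0.05)
for i in range(5):
    client.track({"type": "generation", "id": i})
client.flush()
print(sum(len(batch) for batch in sent))  # 5
```

    Batching by size or time keeps the number of requests low, and `flush()` gives short-lived processes (scripts, serverless functions) a way to drain the queue before exiting.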

    We've made two engineering decisions we've felt uncertain about: using a Postgres database, and using Looker Studio for the analytics MVP. Supabase (our Postgres host) performs well at our scale and integrates seamlessly into our tech stack. We will need to move to an OLAP database soon, and we are debating whether we need to start batching ingestion and whether we can keep using Vercel. Any experience you could share would be helpful!

    Integrating Looker Studio got us to our first analytics charts in half a day. As it is not open source and does not fit our UI/UX, we are looking to replace it with an OSS solution that can flexibly generate charts and dashboards. We’ve had a look at Lightdash and would be happy to hear your thoughts.

    We’re borrowing our OSS business model from PostHog/Supabase: self-hosting is easy, some features are reserved for enterprise (we have no plans for these yet), and there is a paid version as a managed cloud service. Right now, all of our code is available under a permissive license (MIT).

    Next, we’re going deep on analytics. For quality specifically, we will build out model-based evaluations and labeling to be able to cluster traces by scores and use cases.
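
    A minimal sketch of grouping traces by score and use case, assuming each trace is a dict with hypothetical `id`, `use_case` and `output` keys. `toy_score` is only a placeholder for a model-based evaluator (for example an LLM judge), and the bucket thresholds are arbitrary; this is a plain group-by, not a clustering algorithm.

```python
from collections import defaultdict
from typing import Callable


def bucket(score: float, edges: tuple[float, ...] = (0.33, 0.66)) -> str:
    """Map a score in [0, 1] to a coarse label (thresholds are arbitrary)."""
    if score < edges[0]:
        return "low"
    if score < edges[1]:
        return "medium"
    return "high"


def group_traces(traces: list[dict],
                 score_fn: Callable[[dict], float]) -> dict[tuple[str, str], list[str]]:
    """Score each trace and group trace ids by (use_case, score bucket)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for trace in traces:
        groups[(trace["use_case"], bucket(score_fn(trace)))].append(trace["id"])
    return dict(groups)


def toy_score(trace: dict) -> float:
    # Placeholder for a model-based evaluation: empty output scores 0,
    # longer output scores higher, capped at 1.
    return 0.0 if not trace["output"] else min(1.0, len(trace["output"]) / 100)


traces = [
    {"id": "t1", "use_case": "summarize", "output": ""},
    {"id": "t2", "use_case": "summarize", "output": "x" * 80},
    {"id": "t3", "use_case": "qa", "output": "x" * 50},
    {"id": "t4", "use_case": "summarize", "output": "x" * 120},
]
print(group_traces(traces, toy_score))
```

    Swapping `toy_score` for a real evaluator leaves the grouping unchanged, which is the point of keeping scoring and grouping separate.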

    Looking forward to hearing your thoughts and discussion – we’ll be in the comments. Thanks!

    [1] https://learn-from-ai.com/

    [2] https://www.loom.com/share/5c044ca77be44ff7821967834dd70cba

    [3] https://posthog.com/docs/libraries

    [4] https://buildwithfern.com/

  • clickhouse_knowledge_base

    The Tinybird ClickHouse Knowledge Base

  • However, for anyone reading this: Tinybird uses ClickHouse under the hood and has created a knowledge base (https://github.com/tinybirdco/clickhouse_knowledge_base). I will browse it to learn more.

  • opentelemetry-instrument-openai-py

    OpenTelemetry instrumentation for the OpenAI Python library

  • Makes sense! If you're curious, I added an auto-instrumentation library for OpenAI's Python client here: https://github.com/cartermp/opentelemetry-instrument-openai-...

    The main challenge I see is that there's no standard for LLM inputs/outputs (let alone retrieval APIs!), so any kind of automatic instrumentation will need a bunch of adapters. I suppose LangChain helps here, but even then, with so many folks ripping it out for production, you're still in the same place.

    Happy to collaborate on any design thinking for how to incorporate OTel support!
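
    The adapter problem described in this comment can be sketched as a registry of per-provider normalizers that map each response into one common record, which an instrumentation layer could then attach to a span. The response shapes below are assumptions modeled on OpenAI chat completions and Anthropic Messages responses serialized as plain dicts; `NormalizedGeneration`, `ADAPTERS` and `normalize` are hypothetical names, and no OpenTelemetry API is used here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class NormalizedGeneration:
    """Provider-agnostic record an instrumentation layer could emit as span attributes."""
    provider: str
    output_text: str
    input_tokens: int
    output_tokens: int


def from_openai_chat(resp: dict[str, Any]) -> NormalizedGeneration:
    # Assumed shape, modeled on an OpenAI chat completion serialized to a dict.
    return NormalizedGeneration(
        provider="openai",
        output_text=resp["choices"][0]["message"]["content"],
        input_tokens=resp["usage"]["prompt_tokens"],
        output_tokens=resp["usage"]["completion_tokens"],
    )


def from_anthropic_messages(resp: dict[str, Any]) -> NormalizedGeneration:
    # Assumed shape, modeled on an Anthropic Messages response serialized to a dict.
    text = "".join(b["text"] for b in resp["content"] if b["type"] == "text")
    return NormalizedGeneration(
        provider="anthropic",
        output_text=text,
        input_tokens=resp["usage"]["input_tokens"],
        output_tokens=resp["usage"]["output_tokens"],
    )


# Every additional provider needs its own entry here: this is the adapter burden.
ADAPTERS: dict[str, Callable[[dict[str, Any]], NormalizedGeneration]] = {
    "openai": from_openai_chat,
    "anthropic": from_anthropic_messages,
}


def normalize(provider: str, resp: dict[str, Any]) -> NormalizedGeneration:
    """Dispatch to the registered adapter, failing loudly for unknown providers."""
    try:
        adapter = ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"no adapter registered for provider {provider!r}") from None
    return adapter(resp)
```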

Related posts

  • Fern: Toolkit to generate SDKs and Docs for your API

    1 project | news.ycombinator.com | 3 Apr 2024
  • Fern: Beautiful SDKs and Docs for Your API

    1 project | news.ycombinator.com | 30 Oct 2023
  • Can't bring myself to produce code any more, any ideas to help?

    1 project | /r/ExperiencedDevs | 31 May 2021
  • What are the most promising novel distributed ledger consensus and/or sharding mechanisms?

    1 project | /r/CryptoTechnology | 15 Apr 2021
  • Build some good documentation habits in your team with ADRs (Architecture Decision Records) and their go-to tool: Log4brains

    1 project | /r/ExperiencedDevs | 15 Jan 2021