Ask HN: How are you testing your LLM applications?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • langsmith-cookbook

  • cofounder of LangChain here - LangSmith is actually completely independent of LangChain. We have some documentation here (https://docs.smith.langchain.com/tracing/tracing-faq#how-do-...) on this, as well as a few more detailed cookbooks (https://github.com/langchain-ai/langsmith-cookbook/tree/main... for tracing, https://github.com/langchain-ai/langsmith-cookbook/tree/main... for testing)

    We are actually revamping our docs as we speak, with a particular emphasis on using it WITHOUT LangChain - that is absolutely a direction we are leaning into

  • agenta

    The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.

  • I am biased, but I would use a platform and not roll your own solution. You will tend to underestimate the depth of capabilities needed for an eval framework.

    Now for solutions, shameless plug here: we are building an open-source platform for experimenting with and evaluating complex LLM apps (https://github.com/agenta-ai/agenta). We offer automatic evaluators as well as human annotation capabilities. Currently, we only provide testing before deployment, but we plan to add post-production evaluations as well.

    Other tools I would look at in the space are promptfoo (also open-source, more dev oriented), humanloop (one of the most feature-complete tools in the space, but more enterprise oriented and costly), and vellum (YC company, more focused on product teams).
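
To make the idea of an "automatic evaluator" from the comment above concrete, here is a minimal sketch in plain Python. It does not use the API of agenta, promptfoo, or any other tool mentioned in the thread; the names `EvalCase`, `exact_match`, `evaluate`, and `fake_app` are hypothetical, and `fake_app` is a stand-in for a real LLM call. It only illustrates the basic loop of running test cases through an app and scoring the outputs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One test case: a prompt and the answer we expect (hypothetical type)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 if output equals expected, ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(
    app: Callable[[str], str],
    cases: List[EvalCase],
    evaluator: Callable[[str, str], float] = exact_match,
) -> float:
    """Run every case through the app and return the mean evaluator score."""
    if not cases:
        raise ValueError("need at least one test case")
    scores = [evaluator(app(case.prompt), case.expected) for case in cases]
    return sum(scores) / len(scores)


def fake_app(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned answers for the demo."""
    answers = {"Capital of France?": "Paris", "2 + 2?": "5"}
    return answers.get(prompt, "")


if __name__ == "__main__":
    cases = [
        EvalCase("Capital of France?", "paris"),
        EvalCase("2 + 2?", "4"),
    ]
    print(f"mean score: {evaluate(fake_app, cases):.2f}")
```

Exact match is the simplest possible evaluator; real setups usually need fuzzier scoring (similarity metrics, rubric-based grading, human annotation), which is part of why the commenter warns against underestimating the work involved.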

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • langfuse VS agenta - a user suggested alternative

    2 projects | 22 Nov 2023
  • langchain VS agenta - a user suggested alternative

    2 projects | 22 Nov 2023
  • 🤖 Agenta: Open-Source Platform for LLM Prompt Engineering, Evaluation, and Deployment

    1 project | /r/opensource | 2 Sep 2023
  • Show HN: Knit – A Better LLM Playground

    3 projects | news.ycombinator.com | 8 Aug 2023
  • Top Open Source Prompt Engineering Guides & Tools 🔧🏗️🚀

    5 projects | dev.to | 2 May 2024