Sharing learnings from evaluating Million+ LLM responses


uptrain

Mentions: 34 · Stars: 1,999 · Activity: 9.6 · Language: Python

UpTrain is an open-source, unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.

b. Task-Dependent: Tonality match with the given persona, creativity, interestingness, etc. Your prompt can play a big role here.
3. Evaluating Reasoning Capabilities: Includes dimensions like logical correctness (reaching the right conclusions), logical robustness (staying consistent under minor input changes), logical efficiency (taking the shortest solution path), and common sense understanding (grasping common concepts). Beyond prompting techniques such as chain-of-thought (CoT), there is little you can do here; performance depends primarily on the LLM you choose.
4. Custom Evaluations: Many applications require metrics tailored to their specific needs, for example adherence to custom guidelines or checks for specific keywords.
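Custom evaluations of the kind described in point 4 are often simple rule-based checks. The following is a rough sketch in plain Python, not UpTrain's API: the function names, the `CheckResult` structure, the 0-to-1 scoring convention, and the example guidelines (a customer-support bot that must mention "refund" and "30 days") are all hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one custom check (hypothetical structure, not UpTrain's)."""
    name: str
    score: float  # 1.0 = fully passes, 0.0 = fails
    explanation: str


def check_required_keywords(response: str, keywords: list[str]) -> CheckResult:
    # Whole-word, case-insensitive match. The \b boundaries assume each keyword
    # starts and ends with a word character (e.g. "refund", "30 days").
    missing = [
        k for k in keywords
        if not re.search(rf"\b{re.escape(k)}\b", response, re.IGNORECASE)
    ]
    # Partial credit: fraction of required keywords that were found.
    score = 1.0 - len(missing) / len(keywords) if keywords else 1.0
    explanation = "all keywords present" if not missing else "missing: " + ", ".join(missing)
    return CheckResult("required_keywords", score, explanation)


def check_banned_phrases(response: str, phrases: list[str]) -> CheckResult:
    # Case-insensitive substring match; any hit fails the check.
    lowered = response.lower()
    found = [p for p in phrases if p.lower() in lowered]
    explanation = "no banned phrases" if not found else "found: " + ", ".join(found)
    return CheckResult("banned_phrases", 0.0 if found else 1.0, explanation)


def check_max_words(response: str, limit: int) -> CheckResult:
    # Example guideline: keep answers within a word budget (whitespace-split count).
    n = len(response.split())
    return CheckResult("max_words", 1.0 if n <= limit else 0.0, f"{n} words (limit {limit})")


def run_custom_checks(response: str) -> list[CheckResult]:
    # Hypothetical guidelines for a customer-support bot.
    return [
        check_required_keywords(response, ["refund", "30 days"]),
        check_banned_phrases(response, ["guarantee", "legal advice"]),
        check_max_words(response, 50),
    ]


if __name__ == "__main__":
    for result in run_custom_checks("We guarantee a refund, no questions asked."):
        print(f"{result.name}: {result.score:.2f} ({result.explanation})")
```

Running this on the sample response flags the missing "30 days" keyword and the banned word "guarantee", while the length check passes. Returning a graded score per check, rather than a single pass/fail flag, makes it easier to aggregate results across many responses.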
You can read the full blog post here (https://uptrain.ai/blog/how-to-evaluate-your-llm-applications). Hope you find it useful. I am one of the developers of UpTrain, an open-source package to evaluate LLM applications (https://github.com/uptrain-ai/uptrain).
Would love to get feedback from the HN community.


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Integrating Spade: Synthesizing Assertions for LLMs into My OSS Project

2 projects | news.ycombinator.com | 23 Jan 2024
Show HN: UpTrain (YC W23) – open-source tool to evaluate LLM response quality

1 project | news.ycombinator.com | 22 Aug 2023
Introducing UpTrain - Open-source LLM evaluator 🔎

1 project | /r/LanguageTechnology | 13 Jul 2023
Launching UpTrain - an open-source LLM testing tool to help check the performance of your LLM applications

1 project | /r/datascience | 5 Jul 2023
[P] A Practical Guide to Enhancing Models for Custom Use-cases

1 project | /r/MachineLearning | 5 Apr 2023


This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
concept-drift data-drift edge-cases Machine Learning ml-observability
Post date: 1 Nov 2023
