Sharing learnings from evaluating Million+ LLM responses

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • uptrain

    UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

  • b. Task Dependent: Tonality match with the given persona, creativity, interestingness, etc. Your prompt can play a big role here.

    3. Evaluating Reasoning Capabilities: Includes dimensions like logical correctness (right conclusions), logical robustness (consistent with minor input changes), logical efficiency (shortest solution path), and common sense understanding (grasping common concepts). One can't do much here beyond prompting techniques like CoT; the outcome depends primarily on the LLM chosen.

    4. Custom Evaluations: Many applications require customized metrics tailored to their specific needs. You may want to check adherence to custom guidelines, the presence of certain keywords, etc.

    You can read the full blog here (https://uptrain.ai/blog/how-to-evaluate-your-llm-applications). Hope you find it useful. I am one of the developers of UpTrain, an open-source package to evaluate LLM applications (https://github.com/uptrain-ai/uptrain).

    Would love to get feedback from the HN community.
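
As a rough illustration of the custom evaluations described in item 4 above (keyword checks and adherence to a guideline), here is a minimal plain-Python sketch. It does not use UpTrain's API; the names `CheckResult`, `check_required_keywords`, `check_max_words`, and `evaluate`, as well as the word-limit guideline itself, are assumptions invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one hypothetical custom check."""
    name: str
    passed: bool
    detail: str


def check_required_keywords(response: str, keywords: list[str]) -> CheckResult:
    """Pass only if every keyword appears in the response (case-insensitive)."""
    text = response.lower()
    missing = [kw for kw in keywords if kw.lower() not in text]
    return CheckResult("required_keywords", not missing, f"missing: {missing}")


def check_max_words(response: str, limit: int) -> CheckResult:
    """Example guideline (an assumption): the response must stay within a word limit."""
    count = len(response.split())
    return CheckResult("max_words", count <= limit, f"{count} words, limit {limit}")


def evaluate(response: str, keywords: list[str], limit: int) -> tuple[float, list[CheckResult]]:
    """Run all checks and return the fraction that passed plus the details."""
    results = [
        check_required_keywords(response, keywords),
        check_max_words(response, limit),
    ]
    score = sum(r.passed for r in results) / len(results)
    return score, results


if __name__ == "__main__":
    answer = "Our refund policy allows returns within 30 days."
    score, results = evaluate(answer, keywords=["refund", "30 days"], limit=5)
    for r in results:
        print(r.name, "PASS" if r.passed else "FAIL", r.detail)
    print("score:", score)
```

Rule-based checks like these are cheap and deterministic, so they can complement the model-graded dimensions (tonality, reasoning) listed earlier rather than replace them.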

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Integrating Spade: Synthesizing Assertions for LLMs into My OSS Project

    2 projects | news.ycombinator.com | 23 Jan 2024
  • Show HN: UpTrain (YC W23) – open-source tool to evaluate LLM response quality

    1 project | news.ycombinator.com | 22 Aug 2023
  • Introducing UpTrain - Open-source LLM evaluator 🔎

    1 project | /r/LanguageTechnology | 13 Jul 2023
  • Launching UpTrain - an open-source LLM testing tool to help check the performance of your LLM applications

    1 project | /r/datascience | 5 Jul 2023
  • [P] A Practical Guide to Enhancing Models for Custom Use-cases

    1 project | /r/MachineLearning | 5 Apr 2023