GPT4All outscores GPT-3.5 on new hallucination metric

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • haltt4llm

    This project is an attempt to create a common metric for testing LLMs' progress in eliminating hallucinations, which are the most serious current obstacle to the widespread adoption of LLMs for many real-world purposes.

  • Hmm, but if I'm reading this code correctly, an answer is also marked correct if the text of the correct option appears anywhere in the output, even if other, incorrect options also appear.

    https://github.com/manyoso/haltt4llm/blob/main/take_test.py#...

    So the above answer would have been marked correct had it not said "doubled its" rather than "double it".

    Without seeing the log of answers marked correct, I'm skeptical that GPT4All, which seems to produce rambling prose for all of its incorrect answers, is actually picking one of the multiple-choice options the rest of the time. It seems a model could score 100% "correct" just by repeating back all five options.

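The scoring loophole described in the comment above can be illustrated with a small Python sketch. It does not reproduce haltt4llm's take_test.py: the function names, the example question, and the stricter "exactly one option mentioned" rule are hypothetical, chosen only to show why a plain substring check rewards an answer that repeats every option.

```python
# Hypothetical sketch; NOT the actual haltt4llm scoring code.


def lenient_is_correct(output: str, correct_text: str) -> bool:
    """Mark correct if the correct option's text appears anywhere in the output."""
    return correct_text.lower() in output.lower()


def strict_is_correct(output: str, options: dict, correct_key: str) -> bool:
    """Mark correct only if the correct option is the ONLY option text mentioned."""
    text = output.lower()
    # Collect the keys of every option whose text shows up in the model output.
    mentioned = {key for key, opt in options.items() if opt.lower() in text}
    return mentioned == {correct_key}


# Hypothetical five-option question whose correct answer is "A".
options = {"A": "Paris", "B": "London", "C": "Berlin", "D": "Madrid", "E": "Rome"}
shotgun = "It might be Paris, London, Berlin, Madrid or Rome."
focused = "The capital of France is Paris."

print(lenient_is_correct(shotgun, options["A"]))  # True: repeating all options "passes"
print(strict_is_correct(shotgun, options, "A"))   # False
print(strict_is_correct(focused, options, "A"))   # True
```

The stricter rule is still naive: an option whose text is a substring of another option would be counted twice, so a real grader might instead ask the model for an explicit option letter and parse only that.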
