haltt4llm
This project is an attempt to create a common metric for testing LLMs' progress in eliminating hallucinations, which are currently the most serious obstacle to the widespread adoption of LLMs for many real purposes.
Hmm, but if I'm reading this code correctly, an answer is also marked correct if the text of the correct answer appears anywhere in the output, even if other, incorrect answers appear alongside it.
https://github.com/manyoso/haltt4llm/blob/main/take_test.py#...
So the above answer would have been marked correct were it not for the fact that it said "doubled its" rather than "double it".
Without seeing the log of answers marked correct, I'm skeptical that GPT4All, which seems to produce rambling prose for all of its incorrect answers, is actually picking one of the multiple-choice options the rest of the time. It seems like a model could get 100% 'correct' just by repeating back all five options.
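To make the concern concrete, here is a minimal sketch of the substring-style grading the comment describes. This is an assumed reconstruction of the flaw, not the actual take_test.py code; the function name and the example question are hypothetical:

```python
# Hypothetical sketch of substring-based grading (assumed logic, not the
# actual take_test.py implementation): an answer counts as correct if the
# correct option's text appears anywhere in the model's output.

def is_marked_correct(model_output: str, correct_answer: str) -> bool:
    # Naive check: the correct answer merely has to appear as a substring.
    return correct_answer.lower() in model_output.lower()

options = ["Paris", "London", "Berlin", "Madrid", "Rome"]
correct = "Paris"

# A terse, genuinely correct answer passes, as expected:
print(is_marked_correct("The answer is Paris.", correct))  # True

# But a rambling response that repeats every option back also passes,
# even though it never commits to a single choice:
rambling = "It might be Paris, or London, or Berlin, or Madrid, or Rome."
print(is_marked_correct(rambling, correct))  # True
```

Under this grading scheme, a model that simply echoes all five options for every question would score 100%, which is why a log of the answers marked correct would be needed to trust the reported numbers.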