- hallucination-leaderboard
  Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
This is typical of so much work in the field: they pick and choose which models to compare against, and on which benchmarks. If this model were truly great, they would be comparing it against Claude 2 and GPT-4 across a range of benchmarks. Instead they compare against PaLM 2, which is a weak model in a lot of tests (https://venturebeat.com/ai/google-bard-fails-to-deliver-on-i....) and prone to hallucination (https://github.com/vectara/hallucination-leaderboard).