AI leaderboards are no longer useful. It's time to switch to Pareto curves

vllm

Mentions: 31 | Stars: 19,672 | Activity: 9.9 | Language: Python

A high-throughput and memory-efficient inference and serving engine for LLMs

I guess the root cause of my claim is that OpenAI won't tell us whether GPT-3.5 is a mixture-of-experts (MoE) model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism came from floating-point (FPU) rounding, and that GPT-4's MoE architecture amplified the effect. But if GPT-3.5 is also MoE, then that reasoning is just wrong.
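To make the "MoE amplifies rounding noise" idea concrete, here is a toy sketch assuming top-1 routing and two made-up scalar experts. `route_top1`, `moe_layer`, and the expert functions are hypothetical and say nothing about how GPT-4 is actually built; the point is only that a gate-score shift the size of a rounding error can switch which expert runs, so the output jumps instead of drifting slightly.

```python
def route_top1(gate_scores):
    """Top-1 MoE routing: return the index of the highest gate score."""
    return max(range(len(gate_scores)), key=lambda i: gate_scores[i])


def moe_layer(x, gate_scores, experts):
    """Run scalar activation x through the single expert the router picks."""
    return experts[route_top1(gate_scores)](x)


# Two hypothetical experts that compute quite different functions.
experts = [lambda x: 2.0 * x, lambda x: 3.0 - x]

x = 1.5
gates = [0.5001, 0.5000]
nudged = [0.5001, 0.5000 + 0.0002]  # tiny shift, e.g. from a different reduction order

print(moe_layer(x, gates, experts))   # expert 0 -> 3.0
print(moe_layer(x, nudged, experts))  # expert 1 -> 1.5
```

In a dense layer, a nudge that small would move the output by a comparably small amount; with hard routing it changes the result from 3.0 to 1.5.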
What makes this especially tricky is that small models really are 100% deterministic at temp=0, because the gaps between their token likelihoods are too coarse for floating-point rounding to change the top choice. I had thought GPT-3.5 was big enough that some of its token probabilities were separated by less than the FPU's precision. But that's probably wrong.
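A toy illustration of the "coarse vs. fine-grained" point: temperature-0 decoding is an argmax over logits, and floating-point addition is not associative, so summing the same contributions in a different order (as a parallel GPU reduction might) can shift a logit slightly. This sketch simulates a float16 accumulator with Python's `struct` module; the helper names and contribution values are made up for illustration, not taken from any real model.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_sum(values):
    """Left-to-right sum with a float16 accumulator (rounds after every add)."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def greedy_pick(logits):
    """temperature=0 decoding: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical contributions that add up to two candidate tokens' logits.
token_a = [1024.0, 0.5, 0.5, 0.5, 0.5]  # exact sum: 1026
token_b_close = [1024.0, 1.0]           # exact sum: 1025 (near-tie with A)
token_b_far = [900.0, 100.0]            # exact sum: 1000 (clear gap)

for label, order in [("forward", token_a), ("reversed", token_a[::-1])]:
    a = half_sum(order)
    close = greedy_pick([a, half_sum(token_b_close)])
    far = greedy_pick([a, half_sum(token_b_far)])
    print(f"{label:8s} A={a:.1f} near-tie pick={close} clear-gap pick={far}")
```

With these numbers the forward order gives A = 1024 (each +0.5 rounds away) and picks token 1, while the reversed order gives A = 1026 and picks token 0. Against the far-away token B (1000), both orders pick token 0. Real kernels round differently, so treat this as a picture of the mechanism rather than a measurement.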
On the other hand, it's not just GPT. vllm currently has floating-point problems that significantly affect the determinism of any model run on it (https://github.com/vllm-project/vllm/issues/966), and one suggested fix is upcasting to float32. So it's possible that GPT-3.5 uses an especially low-precision float format to save on compute costs, and that this is what introduces the nondeterminism.
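As a rough illustration of why upcasting to float32 helps, this sketch keeps float16 inputs but compares a float16 running sum against a float32 one, both simulated with Python's `struct` module. `reduce_sum` and the input values are hypothetical, and this is not vllm's actual kernel code. Float32 still rounds, so upcasting shrinks the window for order-dependent results rather than closing it.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum(values, accumulate):
    """Sum float16 inputs left to right, rounding the running total with
    `accumulate` (to_half or to_single) after every addition."""
    acc = 0.0
    for v in values:
        acc = accumulate(acc + to_half(v))
    return acc


contribs = [1024.0, 0.5, 0.5, 0.5, 0.5]  # hypothetical float16 inputs
for name, acc_fn in [("float16 accumulator", to_half), ("float32 accumulator", to_single)]:
    fwd = reduce_sum(contribs, acc_fn)
    rev = reduce_sum(contribs[::-1], acc_fn)
    print(f"{name}: forward={fwd} reversed={rev} same={fwd == rev}")
```

Here the float16 accumulator gives 1024.0 or 1026.0 depending on order, while the float32 accumulator gives 1026.0 both ways, because it has enough mantissa bits to keep each 0.5 step.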
Sadly, I don't have the money[1] to actually run a test that could falsify any of this. It seems like it would make a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
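A minimal sketch of what such a test might look like, assuming you can wrap whatever model client you use in a function that takes a prompt and returns its temperature-0 output as a list of tokens. `generate`, `determinism_report`, and both fake models are hypothetical; no real API is called. Running the same report against a dense and an MoE model, or against one model served in float16 and in float32, is the kind of comparison that could falsify the guesses above.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional


def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Index of the first token where two runs differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def determinism_report(generate: Callable[[str], List[str]], prompt: str,
                       trials: int = 20) -> Dict[str, object]:
    """Call a temperature-0 `generate` function repeatedly on one prompt and
    summarise how much the outputs disagree."""
    runs = [list(generate(prompt)) for _ in range(trials)]
    counts = Counter(tuple(r) for r in runs)
    splits = [d for d in (first_divergence(runs[0], r) for r in runs[1:]) if d is not None]
    return {
        "distinct_outputs": len(counts),
        "most_common_share": counts.most_common(1)[0][1] / trials,
        "earliest_divergence": min(splits) if splits else None,
    }


# Hypothetical stand-ins for a real temp=0 API call.
def steady_model(prompt: str) -> List[str]:
    return ["the", "cat", "sat"]


_rng = random.Random(0)


def flaky_model(prompt: str) -> List[str]:
    # Simulates a near-tied second token that flips roughly 30% of the time.
    return ["the", "cat" if _rng.random() < 0.7 else "dog", "sat"]


print(determinism_report(steady_model, "Once upon a time"))
print(determinism_report(flaky_model, "Once upon a time"))
```

The steady model reports a single distinct output; the flaky one reports two, diverging at token index 1. Tracking where runs first diverge, not just whether they differ, helps separate a single near-tie from broader noise.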

Related posts

VLLM Sacrifices Accuracy for Speed
1 project | news.ycombinator.com | 23 Jan 2024

Easy, fast, and cheap LLM serving for everyone
1 project | news.ycombinator.com | 17 Dec 2023

vllm
1 project | news.ycombinator.com | 15 Dec 2023

Mixtral Expert Parallelism
1 project | news.ycombinator.com | 15 Dec 2023

Mixtral of Experts
4 projects | news.ycombinator.com | 11 Dec 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Gpt llm Pytorch llmops Mlops
Post date: 30 Apr 2024
