BIG-Bench-Hard Alternatives

Similar projects and alternatives to BIG-Bench-Hard

llm-humaneval-benchmarks

Mentions: 10 | Stars: 83 | Activity: 4.9 | Language: Jupyter Notebook

code-eval

Mentions: 5 | Stars: 349 | Activity: 8.0 | Language: Python

Run evaluations on LLMs using the human-eval benchmark

visqol

Mentions: 2 | Stars: 617 | Activity: 3.1 | Language: C++

Perceptual Quality Estimator for speech and audio

llm-humaneval-ben

Mentions: 2 | Stars: - | Activity: -

NOTE: The mention count for each project is the number of mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a better alternative to BIG-Bench-Hard or a closer similarity to it.


BIG-Bench-Hard reviews and mentions

Posts with mentions or reviews of BIG-Bench-Hard. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-27.

LLaMA2 Chat 70B outperformed ChatGPT
5 projects | news.ycombinator.com | 27 Jul 2023

It depends on the eval, but I think it's fair to say that it's close. Here are the AGI Eval results organized into a table with averages (I also included the new Hermes Llama2 13B model): https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
It beats out ChatGPT in every category except SAT-Math. We definitely need harder benchmarks.
So far, there's BIG-Bench Hard (https://github.com/suzgunmirac/BIG-Bench-Hard) and the just-published Advanced Reasoning Benchmark (https://arb.duckai.org/).

Bard is dreadful at solving rhyming word puzzles
1 project | news.ycombinator.com | 21 Mar 2023

Even the best Google models seem to be lagging behind OpenAI's on reasoning tasks at the moment; see the graphs at https://github.com/suzgunmirac/BIG-Bench-Hard

Stats

Basic BIG-Bench-Hard repo stats

Stars: 365
Activity: 0.0
Last Commit: about 1 year ago

suzgunmirac/BIG-Bench-Hard is an open-source project licensed under the MIT License, which is an OSI-approved license.
