BIG-Bench-Hard

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them (by suzgunmirac)

BIG-Bench-Hard Alternatives

Similar projects and alternatives to BIG-Bench-Hard

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. A higher number therefore indicates a better BIG-Bench-Hard alternative or a more similar project.

BIG-Bench-Hard reviews and mentions

Posts with mentions or reviews of BIG-Bench-Hard. We have used some of these posts to build our list of alternatives and similar projects. The most recent post was on 2023-07-27.
  • LLaMA2 Chat 70B outperformed ChatGPT
    5 projects | news.ycombinator.com | 27 Jul 2023
    It depends on the eval, but I think it's fair to say that it's close. Here are the AGI Eval results organized into a table w/ averages (I also put in the new Hermes LLama2 13B model): https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...

    It beats out ChatGPT in every category except SAT-Math. We definitely need harder benchmarks.

    So far, there's BIG-Bench Hard https://github.com/suzgunmirac/BIG-Bench-Hard and the just-published Advanced Reasoning Benchmark https://arb.duckai.org/

  • Bard is dreadful at solving rhyming word puzzles
    1 project | news.ycombinator.com | 21 Mar 2023
    Even the best Google models seem to be lagging behind the OpenAI ones on reasoning tasks at the moment - see the graphs at https://github.com/suzgunmirac/BIG-Bench-Hard

Stats

Basic BIG-Bench-Hard repo stats
Mentions: 2
Stars: 365
Activity: 0.0
Last commit: about 1 year ago

suzgunmirac/BIG-Bench-Hard is an open source project licensed under MIT License which is an OSI approved license.

