Dataset of MMLU results broken down by task

This page summarizes the projects mentioned and recommended in the original post on /r/datasets.

  • lm-evaluation-harness

    A framework for few-shot evaluation of language models.

  • I am primarily looking for results of running the MMLU evaluation on modern large language models. I have found some data at https://github.com/EleutherAI/lm-evaluation-harness/tree/master/results and will be asking the maintainers if and when they can provide any additional data.

  • helm

    Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287). (by stanford-crfm)

  • Looking at their GitHub repo, it also seems that the MMLU result comes from just those 5 tasks rather than all of them: https://github.com/stanford-crfm/helm/issues/1335

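The concern raised above, that a reported MMLU score may cover only a few tasks rather than all of them, is easier to see with per-task numbers in hand. The following is a minimal sketch, not code from either project: `macro_average` is a hypothetical helper, the accuracy values are made-up placeholders rather than real results, and the subset chosen is arbitrary.

```python
from statistics import mean


def macro_average(task_accuracy, tasks=None):
    """Unweighted mean accuracy over the selected tasks (all tasks if None)."""
    if tasks is None:
        selected = list(task_accuracy.values())
    else:
        selected = [task_accuracy[name] for name in tasks]
    if not selected:
        raise ValueError("no tasks selected")
    return mean(selected)


# Placeholder per-task accuracies (NOT real results), keyed by MMLU subject name.
per_task = {
    "abstract_algebra": 0.25,
    "anatomy": 0.5,
    "astronomy": 0.75,
    "business_ethics": 0.5,
}

# Score over every task in the table.
all_tasks_score = macro_average(per_task)

# Score over an arbitrary subset, which can differ noticeably from the full score.
subset_score = macro_average(per_task, ["abstract_algebra", "anatomy"])

print(f"all tasks: {all_tasks_score:.3f}")
print(f"subset:    {subset_score:.3f}")
```

Keeping results broken down by task, as the original post asks for, lets a reader recompute either figure and check which set of tasks a headline number was based on.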

