instruct-eval VS lm-evaluation-harness

Compare instruct-eval and lm-evaluation-harness and see how they differ.

instruct-eval

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. (by declare-lab)
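
The snippet below is a minimal illustrative sketch of the kind of held-out, multiple-choice evaluation instruct-eval automates, written directly against the Hugging Face transformers API rather than instruct-eval's own entry point; the model, prompt format, and scoring rule here are assumptions for demonstration.

    # Illustrative sketch only; NOT instruct-eval's API. The model,
    # prompt format, and scoring rule are assumptions.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    question = "Which planet is known as the Red Planet?"
    choices = ["Venus", "Mars", "Jupiter", "Saturn"]
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"{letter}. {c}" for letter, c in zip("ABCD", choices))
        + "\nAnswer:"
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=4)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True).strip()
    # Score by exact match against the gold letter ("B") to get accuracy.
    print(prediction)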

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models. (by bigscience-workshop)
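
For reference, a few-shot run through the harness's Python entry point looks roughly like the sketch below. It assumes the EleutherAI-lineage API (lm_eval.evaluator.simple_evaluate) that the bigscience fork inherits; backend names and arguments vary across versions and forks, so treat this as illustrative.

    # Illustrative sketch; backend names and arguments differ across
    # harness versions and forks.
    from lm_eval import evaluator

    results = evaluator.simple_evaluate(
        model="hf-causal",             # Hugging Face causal-LM backend
        model_args="pretrained=gpt2",  # any autoregressive checkpoint
        tasks=["hellaswag"],           # benchmark task(s) to run
        num_fewshot=5,                 # in-context examples per prompt
    )
    print(results["results"])          # per-task metrics, e.g. accuracy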

                  instruct-eval         lm-evaluation-harness
Mentions          6                     1
Stars             466                   91
Stars growth      3.0%                  -
Activity          8.0                   3.7
Last commit       2 months ago          about 1 year ago
Language          Python                Python
License           Apache License 2.0    MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we track.

instruct-eval

Posts with mentions or reviews of instruct-eval. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-23.

lm-evaluation-harness

Posts with mentions or reviews of lm-evaluation-harness. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-19.
  • Stability AI Launches the First of Its StableLM Suite of Language Models
    24 projects | news.ycombinator.com | 19 Apr 2023
    Yeah, although looks like it currently has some issues with coqa: https://github.com/EleutherAI/lm-evaluation-harness/issues/2...

    There's also the bigscience fork, but I ran into even more problems (although I didn't try too hard) https://github.com/bigscience-workshop/lm-evaluation-harness

    And there's https://github.com/EleutherAI/lm-eval2/ (not sure if it's just starting over w/ a new repo or what?) but it has limited tests available

What are some alternatives?

When comparing instruct-eval and lm-evaluation-harness, you can also consider the following projects:

StableLM - StableLM: Stability AI Language Models

awesome-totally-open-chatgpt - A list of totally open alternatives to ChatGPT

flash-attention - Fast and memory-efficient exact attention

geov - The GeoV model is a large language model designed by Georges Harik that uses Rotary Positional Embeddings with Relative distances (RoPER). We have shared a pre-trained 9B-parameter model.

txtinstruct - 📚 Datasets and models for instruction-tuning

Emu - Emu Series: Generative Multimodal Models from BAAI

AlpacaDataCleaned - Alpaca dataset from Stanford, cleaned and curated

lm-eval2

lm-evaluation-harness - A framework for few-shot evaluation of language models.