[R] LMFlow Benchmark: An Automatic Evaluation Framework for Open-Source LLMs

lm-evaluation-harness

Mentions: 34 | Stars: 5,211 | Activity: 9.9 | Language: Python

A framework for few-shot evaluation of language models.

Here, we use EleutherAI's LM evaluation harness repository (https://github.com/EleutherAI/lm-evaluation-harness) to get QA accuracy results. We also evaluate every model's negative log-likelihood (NLL) on these datasets, with the questions as contexts and the answers as output sentences.
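To make the NLL setup in the quote above concrete, here is a minimal sketch of scoring only the answer tokens while conditioning on the question tokens. It is an illustration under stated assumptions, not LMFlow's or the harness's actual code: the function names `log_softmax` and `answer_nll`, the plain-list logits layout, and the toy inputs are all hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def answer_nll(
    logits: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    context_len: int,
) -> float:
    """Sum of -log p(token) over the answer tokens only.

    Assumed layout (hypothetical, the usual next-token convention):
    token_ids = question tokens followed by answer tokens, and
    logits[t] scores the prediction of token_ids[t + 1].
    The first `context_len` tokens are the question and are not scored.
    """
    if not 0 < context_len < len(token_ids):
        raise ValueError("context_len must leave at least one answer token")
    if len(logits) != len(token_ids) - 1:
        raise ValueError("expected one logits row per predicted token")

    total = 0.0
    for t in range(context_len, len(token_ids)):
        # The row at t - 1 predicts the token at position t.
        log_probs = log_softmax(logits[t - 1])
        total -= log_probs[token_ids[t]]
    return total


# Toy check: a model that is uniform over a 4-token vocabulary.
toy_tokens = [0, 1, 2, 3]            # 2 question tokens, 2 answer tokens
toy_logits = [[0.0] * 4 for _ in range(3)]
toy_nll = answer_nll(toy_logits, toy_tokens, context_len=2)
```

With uniform predictions each answer token costs log 4, so the toy NLL is 2 log 4. In a real run the logits would come from the model being evaluated, and lower NLL means the model assigns higher probability to the reference answers.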

LMFlow

Mentions: 10 | Stars: 8,042 | Activity: 9.6 | Language: Python

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

LMFlow: https://github.com/OptimalScale/LMFlow

giskard

Mentions: 7 | Stars: 3,164 | Activity: 10.0 | Language: Python

🐢 Open-Source Evaluation & Testing framework for LLMs and ML models

This is super interesting! Thanks for sharing. We're also working on this research field from an open-source angle (https://github.com/Giskard-AI/giskard)

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Anomaly Detection with FiftyOne and Anomalib
4 projects | dev.to | 6 May 2024

May 8, 2024 AI, Machine Learning and Computer Vision Meetup
2 projects | dev.to | 1 May 2024

Voxel51 Is Hiring AI Researchers and Scientists — What the New Open Science Positions Mean
1 project | dev.to | 26 Apr 2024

Machine Learning and AI Beyond the Basics Book
1 project | news.ycombinator.com | 16 Apr 2024

Show HN: Evaluate LLM-based RAG Applications with automated test set generation
1 project | news.ycombinator.com | 11 Apr 2024

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
Tags: Machine Learning, ChatGPT, Artificial Intelligence, Deep Learning, MLOps
Post date: 9 May 2023
