[R] LMFlow Benchmark: An Automatic Evaluation Framework for Open-Source LLMs

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • lm-evaluation-harness

    A framework for few-shot evaluation of language models.

  • Here, we use EleutherAI's LM evaluation harness repository (https://github.com/EleutherAI/lm-evaluation-harness) to obtain QA accuracy results. We also evaluate each model's negative log-likelihood (NLL) on these datasets, using the questions as contexts and the answers as output sentences.

  • LMFlow

    An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

  • LMFlow: https://github.com/OptimalScale/LMFlow

  • giskard

    🐢 Open-Source Evaluation & Testing framework for LLMs and ML models

  • This is super interesting! Thanks for sharing. We're also working on this research field from an open-source angle (https://github.com/Giskard-AI/giskard)
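
The NLL evaluation quoted in the lm-evaluation-harness entry above (questions as contexts, answers as output sentences) can be illustrated with a minimal sketch. This is not LMFlow's or the harness's actual code: the `answer_nll` function, the `LogProbFn` interface, and the uniform toy model are all hypothetical names introduced here for illustration. It assumes NLL is summed over answer tokens only, each conditioned on the question plus the preceding answer tokens.

```python
import math
from typing import Callable, Sequence

# Hypothetical scoring interface (an assumption, not a real library API):
# given the tokens seen so far and a candidate next token, return
# log p(next_token | prefix). A real language model would supply this.
LogProbFn = Callable[[Sequence[str], str], float]


def answer_nll(
    context: Sequence[str], answer: Sequence[str], log_prob: LogProbFn
) -> float:
    """Sum of -log p(token | context + earlier answer tokens) over the answer."""
    prefix = list(context)
    total = 0.0
    for token in answer:
        # Only answer tokens contribute; context tokens are conditioning only.
        total -= log_prob(prefix, token)
        prefix.append(token)
    return total


def uniform_log_prob(prefix: Sequence[str], token: str) -> float:
    """Toy stand-in model: every token has probability 1/4."""
    return math.log(0.25)


question = ["what", "is", "2", "+", "2", "?"]
answer = ["4", "."]
nll = answer_nll(question, answer, uniform_log_prob)
print(f"answer NLL: {nll:.4f}")  # two answer tokens, so 2 * ln(4)
```

A lower NLL means the model assigns higher probability to the reference answer. Scoring only the answer tokens keeps question length from affecting the result.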

