Evaluation

Top 23 Evaluation Open-Source Projects

  • awesome-semantic-segmentation

    :metal: awesome-semantic-segmentation

  • govaluate

    Arbitrary expression evaluation for golang

  • write-you-a-haskell

    Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

    Project mention: A decade of developing a programming language | news.ycombinator.com | 2023-11-14

    I highly recommend https://github.com/sdiehl/write-you-a-haskell as it is very developer-friendly. It’s not complete, but it really gets the gears turning and will set you up for writing your own Hindley-Milner style type checker.

  • klipse

    Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

  • opencompass

    OpenCompass is an LLM evaluation platform supporting a wide range of models (InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.

    Project mention: Show HN: Times faster LLM evaluation with Bayesian optimization | news.ycombinator.com | 2024-02-13

    Fair question.

    "Evaluate" refers to the phase after training that checks whether the training went well.

    Usually the flow goes training -> evaluation -> deployment (what you called inference). This project is aimed at evaluation. Evaluation can be slow (it might even be slower than training if you're fine-tuning on a small, domain-specific subset)!

    So there are [quite](https://github.com/microsoft/promptbench) [a](https://github.com/confident-ai/deepeval) [few](https://github.com/openai/evals) [frameworks](https://github.com/EleutherAI/lm-evaluation-harness) working on evaluation. However, all of them are quite slow, because LLMs are slow if you don't have infinite money. [This](https://github.com/open-compass/opencompass) one tries to speed things up by parallelizing across multiple computers, but none of them takes advantage of the fact that many evaluation queries might be similar; they all try to evaluate on every given query. And that's where this project might come in handy.
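
    The naive per-query flow described in that comment can be sketched in a few lines of Python. Everything here is a hypothetical stand-in (`call_model`, the toy dataset, the scorer), not the API of OpenCompass or any other listed framework:

    ```python
    # Minimal sketch of a naive evaluation loop: every query triggers one
    # model call, so cost grows linearly with the dataset. This is why
    # skipping or sharing work across similar queries can pay off.
    def call_model(prompt: str) -> str:
        # Hypothetical stand-in for an expensive LLM call.
        return "The answer is 4."

    def evaluate_naive(dataset, score):
        total = 0.0
        for example in dataset:
            prediction = call_model(example["prompt"])  # one call per query
            total += score(prediction, example["answer"])
        return total / len(dataset)

    dataset = [{"prompt": "What is 2 + 2?", "answer": "4"}]
    print(evaluate_naive(dataset, lambda pred, gold: float(gold in pred)))  # 1.0
    ```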

  • promptbench

    A unified evaluation framework for large language models

    Project mention: Show HN: Times faster LLM evaluation with Bayesian optimization | news.ycombinator.com | 2024-02-13

  • uptrain

    UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root-cause analysis on failure cases, and give insights on how to resolve them.

    Project mention: Evaluation of OpenAI Assistants | dev.to | 2024-04-09

    Currently seeking feedback for the developed tool. Would love it if you could check it out at: https://github.com/uptrain-ai/uptrain/blob/main/examples/assistants/assistant_evaluator.ipynb
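
    For context, a minimal usage sketch based on UpTrain's README; the `EvalLLM`/`Evals` names and the expected data keys are taken from its documentation at the time of writing and may have changed:

    ```python
    # Hedged sketch: run two preconfigured UpTrain checks over one sample
    # (requires `pip install uptrain` and an OpenAI API key).
    from uptrain import EvalLLM, Evals

    data = [{
        "question": "Which is the most popular global sport?",
        "context": "Football is followed by billions of fans worldwide.",
        "response": "Football is the most popular sport, with billions of fans.",
    }]

    eval_llm = EvalLLM(openai_api_key="sk-...")  # placeholder key
    results = eval_llm.evaluate(
        data=data,
        checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_RELEVANCE],
    )
    print(results)  # per-check scores with explanations
    ```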

  • evaluate

    🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
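
    The core workflow is to load a metric, then compute it over predictions and references; a minimal example:

    ```python
    # Load a metric and score predictions against references
    # (requires `pip install evaluate scikit-learn`).
    import evaluate

    accuracy = evaluate.load("accuracy")
    result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
    print(result)  # {'accuracy': 0.75}
    ```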

  • EvalAI

    :cloud: :rocket: :bar_chart: :chart_with_upwards_trend: Evaluating state of the art in AI

  • avalanche

    Avalanche: an End-to-End Library for Continual Learning based on PyTorch.

  • pycm

    Multi-class confusion matrix library in Python

    Project mention: PyCM 4.0 Released: Multilabel Confusion Matrix Support | /r/coolgithubprojects | 2023-06-07
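
    A minimal sketch of the library's central object, a `ConfusionMatrix` built from actual and predicted label vectors:

    ```python
    # Build a multi-class confusion matrix and inspect a few statistics
    # (requires `pip install pycm`).
    from pycm import ConfusionMatrix

    cm = ConfusionMatrix(
        actual_vector=[2, 0, 2, 2, 0, 1],
        predict_vector=[0, 0, 2, 2, 0, 2],
    )
    print(cm)              # matrix plus overall and per-class statistics
    print(cm.Overall_ACC)  # overall accuracy on this toy data: ~0.667
    ```
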
  • LLM-eval-survey

    The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

    Project mention: A Survey on Evaluation of Large Language Models | news.ycombinator.com | 2023-07-18

  • lispy

    Short and sweet LISP editing

    Project mention: Sapling: A highly experimental vi-inspired editor where you edit code, not text | news.ycombinator.com | 2024-02-04

  • alpaca_eval

    An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

    Project mention: UltraLM-13B reaches top of AlpacaEval leaderboard | /r/LocalLLaMA | 2023-06-28

    Alpaca Eval is open source and was developed by the same team that trained the Alpaca model, afaik. It is not like what you said in the other comment.

  • torch-fidelity

    High-fidelity performance metrics for generative models in PyTorch
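
    A hedged sketch of its `calculate_metrics` entry point, per the project's README; the image-folder paths are placeholders:

    ```python
    # Compute ISC, FID, and KID between two image folders
    # (requires `pip install torch-fidelity`; paths are placeholders).
    from torch_fidelity import calculate_metrics

    metrics = calculate_metrics(
        input1="path/to/generated_images",
        input2="path/to/real_images",
        isc=True,  # Inception Score
        fid=True,  # Frechet Inception Distance
        kid=True,  # Kernel Inception Distance
    )
    print(metrics)  # dict mapping metric names to values
    ```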

  • semantic-kitti-api

    SemanticKITTI API for visualizing dataset, processing data, and evaluating results.

  • gval

    Expression evaluation in golang

  • ExpressionEvaluator

    A Simple Math and Pseudo C# Expression Evaluator in One C# File. Can also execute small C#-like scripts.

  • long-form-factuality

    Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".

    Project mention: An Open Source Tool for Multimodal Fact Verification | news.ycombinator.com | 2024-04-06

    Isn't this similar to the DeepMind paper on long-form factuality posted a few days ago?

    https://arxiv.org/abs/2403.18802

    https://github.com/google-deepmind/long-form-factuality/tree...

  • Eval-Expression.NET

    C# Eval Expression | Evaluate, Compile, and Execute C# code and expressions at runtime.

  • simpleeval

    Simple Safe Sandboxed Extensible Expression Evaluator for Python
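
    A minimal example of the library's one-call API; unlike Python's built-in `eval`, attribute access and imports are blocked by default:

    ```python
    # Safely evaluate small untrusted expressions
    # (requires `pip install simpleeval`).
    from simpleeval import simple_eval

    print(simple_eval("21 + 21"))                        # 42
    print(simple_eval("x * y", names={"x": 6, "y": 7}))  # 42
    ```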

  • errant

    ERRor ANnotation Toolkit: Automatically extract and classify grammatical errors in parallel original and corrected sentences.

    Project mention: Given the rise of LLMs, is a toolkit like ERRANT still relevant? | /r/LanguageTechnology | 2023-12-10

    ERRANT automatically annotates parallel English sentences with error type information.
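
    A sketch of the toolkit's Python API as shown in its README; `errant.load("en")` builds a spaCy-backed annotator, so an English spaCy model must be installed:

    ```python
    # Extract and classify edits between an original and a corrected sentence
    # (requires `pip install errant` plus spaCy's en_core_web_sm model).
    import errant

    annotator = errant.load("en")
    orig = annotator.parse("This are gramamtical sentence .")
    cor = annotator.parse("This is a grammatical sentence .")
    for edit in annotator.annotate(orig, cor):
        print(edit)  # e.g. "are" -> "is", classified as R:VERB:SVA
    ```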

  • ranx

    ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

    Project mention: Sparse Vectors in Qdrant: Pure Vector-based Hybrid Search | dev.to | 2024-02-19

    Ranx is a great library for mixing results from different sources.
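
    A minimal sketch of its evaluation API, per the README: `Qrels` holds relevance judgments, `Run` holds system scores, and `evaluate` compares them:

    ```python
    # Score a retrieval run against relevance judgments
    # (requires `pip install ranx`).
    from ranx import Qrels, Run, evaluate

    qrels = Qrels({"q_1": {"doc_a": 1, "doc_b": 2}})
    run = Run({"q_1": {"doc_a": 0.9, "doc_b": 0.8, "doc_c": 0.1}})

    print(evaluate(qrels, run, ["ndcg@5", "mrr"]))  # {'ndcg@5': ..., 'mrr': ...}
    ```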

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-09.

Index

What are some of the best open-source Evaluation projects? This list will help you:

| # | Project | Stars |
|---:|---|---:|
| 1 | awesome-semantic-segmentation | 10,220 |
| 2 | govaluate | 3,529 |
| 3 | write-you-a-haskell | 3,304 |
| 4 | klipse | 3,088 |
| 5 | opencompass | 2,403 |
| 6 | promptbench | 1,954 |
| 7 | uptrain | 1,951 |
| 8 | evaluate | 1,803 |
| 9 | EvalAI | 1,673 |
| 10 | avalanche | 1,654 |
| 11 | pycm | 1,428 |
| 12 | LLM-eval-survey | 1,206 |
| 13 | lispy | 1,183 |
| 14 | alpaca_eval | 1,058 |
| 15 | torch-fidelity | 870 |
| 16 | semantic-kitti-api | 722 |
| 17 | gval | 696 |
| 18 | ExpressionEvaluator | 562 |
| 19 | long-form-factuality | 428 |
| 20 | Eval-Expression.NET | 423 |
| 21 | simpleeval | 420 |
| 22 | errant | 410 |
| 23 | ranx | 325 |