bench

A tool for evaluating LLMs (by arthur-ai)

Bench Alternatives

Similar projects and alternatives to bench

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • LocalAI

    🤖 The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. It can generate text, audio, video, and images, and also offers voice cloning.

  • evals

    49 bench VS evals

    Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

  • litellm

    28 bench VS litellm

    Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)

  • promptfoo

    20 bench VS promptfoo

    Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.

  • ChainForge

    14 bench VS ChainForge

    An open-source visual programming environment for battle-testing prompts to LLMs.

  • GodMode

    7 bench VS GodMode

    AI Chat Browser: Fast, Full webapp access to ChatGPT / Claude / Bard / Bing / Llama2! I use this 20 times a day.

  • fiddler-auditor

    Fiddler Auditor is a tool to evaluate language models.

  • TheoremQA

    The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset

NOTE: The number next to each project counts its mentions in common posts plus user-suggested alternatives. A higher number therefore indicates a better bench alternative or a more similar project.

bench reviews and mentions

Posts with mentions or reviews of bench. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-09.
  • I asked 60 LLMs a set of 20 questions
    10 projects | news.ycombinator.com | 9 Sep 2023
    Thanks for sharing, looks interesting!

    I've actually been using a similar LLM evaluation tool called Arthur Bench: https://github.com/arthur-ai/bench

    Some great scoring methods built in and a nice UI on top of it as well

Stats

Basic bench repo stats
Mentions: 1
Stars: 338
Activity: 8.4
Last commit: 11 days ago

arthur-ai/bench is an open-source project licensed under the MIT License, which is an OSI-approved license.

The primary programming language of bench is TypeScript.
