Bench Alternatives

Similar projects and alternatives to bench

ollama

Mentions: 214 | Stars: 68,151 | Activity: 9.9 | Language: Go

Get up and running with Llama 3, Mistral, Gemma, and other large language models.
LocalAI

Mentions: 83 | Stars: 20,346 | Activity: 9.9 | Language: C++

The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. It can generate text, audio, video, and images, and also offers voice cloning.
evals

Mentions: 49 | Stars: 14,097 | Activity: 9.3 | Language: Python

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
litellm

Mentions: 28 | Stars: 8,907 | Activity: 10.0 | Language: Python

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
promptfoo

Mentions: 20 | Stars: 2,921 | Activity: 9.9 | Language: TypeScript

Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
ChainForge

Mentions: 14 | Stars: 2,032 | Activity: 8.9 | Language: TypeScript

An open-source visual programming environment for battle-testing prompts to LLMs.
GodMode

Mentions: 7 | Stars: 4,041 | Activity: 9.3 | Language: TypeScript

AI Chat Browser: Fast, Full webapp access to ChatGPT / Claude / Bard / Bing / Llama2! I use this 20 times a day.
fiddler-auditor

Mentions: 2 | Stars: 143 | Activity: 8.1 | Language: Python

Fiddler Auditor is a tool to evaluate language models.
TheoremQA

Mentions: 2 | Stars: 152 | Activity: 7.5 | Language: Python

The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a better bench alternative or a higher similarity.

bench reviews and mentions

Posts with mentions or reviews of bench. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-09.

I asked 60 LLMs a set of 20 questions
10 projects | news.ycombinator.com | 9 Sep 2023

Thanks for sharing, looks interesting!
I've actually been using a similar LLM evaluation tool called Arthur Bench: https://github.com/arthur-ai/bench
Some great scoring methods built in and a nice UI on top of it as well