Top 23 Benchmark Open-Source Projects

hyperfine

Mentions: 74 | Stars: 20,020 | Activity: 8.1 | Language: Rust

A command-line benchmarking tool

Project mention: Measuring startup and shutdown overhead of several code interpreters | dev.to | 2024-04-17
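A command-line benchmarking tool like hyperfine runs a command many times and summarizes the wall-clock times, and measuring an interpreter's startup and shutdown overhead amounts to timing an essentially empty program that way. The following is a minimal Python sketch of that idea, not hyperfine's implementation. `bench_command` and its `runs`/`warmup` parameters are hypothetical names. hyperfine itself does more, for example correcting for shell-spawning time and warning about statistical outliers.

```python
import statistics
import subprocess
import sys
import time


def bench_command(cmd, runs=10, warmup=2):
    """Time an external command end to end (simplified, hyperfine-style sketch).

    Warmup runs are executed but discarded so that caches are primed first.
    The command is run directly, without a shell, so no shell-spawn correction is needed.
    Returns (mean, stdev, min, max) of wall-clock time in seconds.
    """
    for _ in range(warmup):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)

    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)

    # statistics.stdev needs at least two samples
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), stdev, min(times), max(times)


if __name__ == "__main__":
    # Startup + shutdown overhead of the current Python interpreter: run an empty program.
    mean, stdev, lo, hi = bench_command([sys.executable, "-c", "pass"], runs=5, warmup=1)
    print(f"python -c pass: {mean * 1000:.1f} ms ± {stdev * 1000:.1f} ms "
          f"(min {lo * 1000:.1f} ms, max {hi * 1000:.1f} ms)")
```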


fashion-mnist

Mentions: 15 | Stars: 11,439 | Activity: 0.0 | Language: Python

An MNIST-like fashion product database and benchmark.

Project mention: Logistic Regression for Image Classification Using OpenCV | news.ycombinator.com | 2023-12-31

In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but you can't explain anything from an image.
Traditional (non-deep-learning) classifiers such as SVMs and random forests perform much better on MNIST, reaching up to 97% accuracy compared with the 88% from logistic regression in this post. Check the original MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#

awesome-semantic-segmentation

Mentions: 1 | Stars: 10,337 | Activity: 0.0

awesome-semantic-segmentation

BenchmarkDotNet

Mentions: 67 | Stars: 10,056 | Activity: 9.2 | Language: C#

Powerful .NET library for benchmarking

Project mention: Stop Guessing, Start Measuring: Transform Your Code with BenchmarkDotnet! | dev.to | 2024-02-13

Let’s look at the first example you see when you open BenchmarkDotNet’s website or GitHub page.

benchmark

Mentions: 19 | Stars: 8,418 | Activity: 8.7 | Language: C++

A microbenchmark support library

Project mention: How can I check the execution time of a program rendered in SFML? | /r/cpp_questions | 2023-12-05

tianshou

Mentions: 8 | Stars: 7,435 | Activity: 9.5 | Language: Python

An elegant PyTorch deep reinforcement learning library.

Project mention: Is it better to not use the Target Update Frequency in Double DQN or depends on the application? | /r/reinforcementlearning | 2023-07-05

The tianshou implementation I found at https://github.com/thu-ml/tianshou/blob/master/tianshou/policy/modelfree/dqn.py is DQN by default.
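For context on the target-update question: in DQN, the bootstrapped target takes the maximum of the target network's Q-values for the next state, while Double DQN lets the online network choose the next action and the target network evaluate it. The target network is refreshed from the online weights every `target_update_freq` steps, named here after the "target update frequency" in the question. The sketch below is plain Python with hypothetical function names, not tianshou's API; treating a frequency of 0 as "no separate target network" is an assumption made for illustration.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action (max over its Q-values)."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network selects the next action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def maybe_sync_target(step, online_params, target_params, target_update_freq):
    """Hard update: copy the online weights into the target network every `target_update_freq` steps.

    Assumption for this sketch: a frequency <= 0 means "no separate target network",
    so the target simply tracks the online weights at every step.
    """
    if target_update_freq <= 0 or step % target_update_freq == 0:
        return dict(online_params)
    return target_params


if __name__ == "__main__":
    q_online_next = [1.0, 3.0]  # online net prefers action 1
    q_target_next = [2.5, 0.5]  # target net rates action 0 higher
    print(dqn_target(1.0, False, q_target_next, gamma=0.5))                        # 2.25
    print(double_dqn_target(1.0, False, q_online_next, q_target_next, gamma=0.5))  # 1.25
```

The toy numbers show where the two differ: DQN's max picks whichever action the target network rates highest, while Double DQN decouples selection from evaluation, which is intended to reduce overestimation bias.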

FrameworkBenchmarks

Mentions: 366 | Stars: 7,391 | Activity: 9.8 | Language: Java

Source for the TechEmpower Framework Benchmarks project

Project mention: Why choose async/await over threads? | news.ycombinator.com | 2024-03-25

Neat. Thanks for sharing!
Interestingly, may-minihttp is faring very well in the TechEmpower benchmark [1], for whatever those benchmarks are worth. The code is also surprisingly straightforward [2].
[1] https://www.techempower.com/benchmarks/
[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

web-frameworks

Mentions: 26 | Stars: 6,900 | Activity: 9.8 | Language: PHP

Which is the fastest web framework?
sysbench

Mentions: 4 | Stars: 5,804 | Activity: 2.4 | Language: C

Scriptable database and system performance benchmark
benchmark.js

Mentions: 7 | Stars: 5,488 | Activity: 0.0 | Language: JavaScript

A benchmarking library. As used on jsPerf.com.
mmpose

Mentions: 31 | Stars: 5,025 | Activity: 8.0 | Language: Python

OpenMMLab Pose Estimation Toolbox and Benchmark.
ann-benchmarks

Mentions: 51 | Stars: 4,604 | Activity: 7.7 | Language: Python

Benchmarks of approximate nearest neighbor libraries in Python

Project mention: Using Your Vector Database as a JSON (Or Relational) Datastore | news.ycombinator.com | 2024-04-23

Off the top of my head, pgvector supports only two indexes, and both run in memory only. It doesn't support GPU indexing or disk-based indexing, and it doesn't separate queries from insertions.
Also, several people I've talked to struggle to scale past 100K-1M vectors.
You can also compare the performance yourself: https://ann-benchmarks.com/
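ann-benchmarks compares libraries by trading off how accurate their approximate answers are against how fast they answer (recall versus queries per second). A minimal Python sketch of those two measurements follows, assuming Euclidean distance and a brute-force scan as ground truth; `exact_knn`, `recall_at_k`, and `queries_per_second` are hypothetical helpers, not part of ann-benchmarks.

```python
import math
import time


def exact_knn(query, vectors, k):
    """Ground truth by brute force: indices of the k vectors closest to `query` (Euclidean)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, true_ids):
    """Fraction of the true k nearest neighbours that the approximate search returned."""
    k = len(true_ids)
    return len(set(approx_ids[:k]) & set(true_ids)) / k


def queries_per_second(search, queries):
    """Throughput of a search function: queries answered per wall-clock second."""
    start = time.perf_counter()
    for q in queries:
        search(q)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
    truth = exact_knn((0.0, 0.0), vectors, k=2)  # [0, 1]
    approx = [0, 2]                              # pretend an ANN index returned these
    print(truth, recall_at_k(approx, truth))     # [0, 1] 0.5
    print(queries_per_second(lambda q: exact_knn(q, vectors, 2), [(0.5, 0.5)] * 100))
```

An approximate index is only interesting if it beats the brute-force scan's throughput while keeping recall close to 1.0, which is why such results are usually reported as a curve rather than a single number.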

criterion.rs

Mentions: 30 | Stars: 4,170 | Activity: 6.5 | Language: Rust

Statistics-driven benchmarking library for Rust

Project mention: How to benchmark in Rust with libtest bench | /r/bencher | 2023-12-03

The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.
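Criterion.rs is described as statistics-driven: rather than reporting a single timing, it collects many samples and uses bootstrap resampling to put a confidence interval on its estimate. The core of that idea is sketched below in Python rather than Rust; `sample_times` and `bootstrap_ci` are hypothetical helpers, and Criterion.rs itself does considerably more (warm-up, outlier classification, regression over iteration counts).

```python
import random
import statistics
import time


def sample_times(fn, samples=30, iters_per_sample=100):
    """Collect per-call timings: each sample times a batch of calls and divides by the batch size."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(iters_per_sample):
            fn()
        results.append((time.perf_counter() - start) / iters_per_sample)
    return results


def bootstrap_ci(data, n_resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Resample `data` with replacement, compute the mean of each resample,
    and read the interval bounds off the sorted resampled means.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lo = means[int(alpha * n_resamples)]
    hi = means[int((1 - alpha) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    times = sample_times(lambda: sorted(range(1000, 0, -1)))
    lo, hi = bootstrap_ci(times)
    print(f"mean per call: {statistics.fmean(times) * 1e6:.2f} us, "
          f"95% CI [{lo * 1e6:.2f}, {hi * 1e6:.2f}] us")
```

Reporting an interval instead of a single number makes it possible to tell whether a later run is genuinely faster or just noisy.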

oha

Mentions: 3 | Stars: 3,983 | Activity: 9.4 | Language: Rust

Ohayou (おはよう), an HTTP load generator inspired by rakyll/hey, with TUI animation.

Baichuan2

Mentions: 1 | Stars: 3,936 | Activity: 7.3 | Language: Python

A series of large language models developed by Baichuan Intelligent Technology

Project mention: Baichuan 2 | news.ycombinator.com | 2023-10-12

mmaction2

Mentions: 5 | Stars: 3,916 | Activity: 7.2 | Language: Python

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
coost

Mentions: 15 | Stars: 3,835 | Activity: 8.3 | Language: C++

A tiny boost library in C++11.

Project mention: Write C++ as easy as Golang with coost | news.ycombinator.com | 2023-09-09

yet-another-bench-script

Mentions: 23 | Stars: 3,760 | Activity: 6.9 | Language: Shell

YABS - a simple bash script to estimate Linux server performance using fio, iperf3, & Geekbench

Project mention: YABS: Yet-Another-Bench-Script | news.ycombinator.com | 2024-03-31

awesome-http-benchmark

Mentions: 3 | Stars: 3,209 | Activity: 4.6

HTTP(S) benchmark tools, testing/debugging, and REST APIs (RESTful)

Baichuan-13B

Mentions: 2 | Stars: 2,959 | Activity: 7.3 | Language: Python

A 13B large language model developed by Baichuan Intelligent Technology

Project mention: Baichuan, China's AI | /r/techieHugui | 2023-07-22

Derailed Benchmarks

Mentions: 4 | Stars: 2,921 | Activity: 0.0 | Language: Ruby

Go faster, off the Rails - Benchmarks for your whole Rails app
XcodeBenchmark

Mentions: 62 | Stars: 2,918 | Activity: 8.1 | Language: Swift

XcodeBenchmark measures the compilation time of a large codebase on iMac, MacBook, and Mac Pro

Project mention: 2023 Mac Mini 10 Core M2 Pro – A Beast for Xcode Compilation | news.ycombinator.com | 2024-02-08

opencompass

Mentions: 1 | Stars: 2,603 | Activity: 9.7 | Language: Python

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Project mention: Show HN: Times faster LLM evaluation with Bayesian optimization | news.ycombinator.com | 2024-02-13

Fair question.
Evaluation refers to the phase after training that checks whether the training went well.
Usually the flow goes training -> evaluation -> deployment (what you called inference). This project is aimed at evaluation. Evaluation can be slow (it might even be slower than training if you're fine-tuning on a small domain-specific subset)!
So there are [quite](https://github.com/microsoft/promptbench) [a](https://github.com/confident-ai/deepeval) [few](https://github.com/openai/evals) [frameworks](https://github.com/EleutherAI/lm-evaluation-harness) working on evaluation; however, all of them are quite slow, because LLMs are slow if you don't have infinite money. [This](https://github.com/open-compass/opencompass) one tries to speed things up by parallelizing across multiple computers, but none of them take advantage of the fact that many evaluation queries may be similar; they all evaluate every given query. That's where this project might come in handy.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Benchmark related posts

Umbra: A Disk-Based System with In-Memory Performance [pdf]

3 projects | news.ycombinator.com | 2 May 2024
Using Your Vector Database as a JSON (Or Relational) Datastore

1 project | news.ycombinator.com | 23 Apr 2024
Measuring startup and shutdown overhead of several code interpreters

2 projects | dev.to | 17 Apr 2024
Why SQLite Performance Tuning Made Bencher 1200x Faster

1 project | news.ycombinator.com | 17 Apr 2024
LLM Colosseum

1 project | news.ycombinator.com | 10 Apr 2024
PullRequestBenchmark Challenge: Can AI Replace Your Dev Team?

1 project | news.ycombinator.com | 10 Apr 2024
Evaluate LLMs in Real Time with Street Fighter III

1 project | news.ycombinator.com | 8 Apr 2024

Index

What are some of the best open-source Benchmark projects? This list will help you:

Rank	Project	Stars
1	hyperfine	20,020
2	fashion-mnist	11,439
3	awesome-semantic-segmentation	10,337
4	BenchmarkDotNet	10,056
5	benchmark	8,418
6	tianshou	7,435
7	FrameworkBenchmarks	7,391
8	web-frameworks	6,900
9	sysbench	5,804
10	benchmark.js	5,488
11	mmpose	5,025
12	ann-benchmarks	4,604
13	criterion.rs	4,170
14	oha	3,983
15	Baichuan2	3,936
16	mmaction2	3,916
17	coost	3,835
18	yet-another-bench-script	3,760
19	awesome-http-benchmark	3,209
20	Baichuan-13B	2,959
21	Derailed Benchmarks	2,921
22	XcodeBenchmark	2,918
23	opencompass	2,603
