Top 23 Benchmark Open-Source Projects
-
yet-another-bench-script
YABS - a simple bash script to estimate Linux server performance using fio, iperf3, & Geekbench
-
XcodeBenchmark
XcodeBenchmark measures the compilation time of a large codebase on iMac, MacBook, and Mac Pro
-
opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.
Project mention: Measuring startup and shutdown overhead of several code interpreters | dev.to | 2024-04-17 Check out the official hyperfine GitHub repo
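To make the idea of measuring startup and shutdown overhead concrete, here is a minimal sketch in Python. It is not how hyperfine is implemented; it only illustrates the general approach of running a do-nothing program repeatedly and timing each run. The helper names `time_command` and `summarize` are hypothetical, and the choice of one warmup run is an assumption.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5, warmup=1):
    """Run `cmd` repeatedly and return the wall-clock duration of each run in seconds."""
    # Warmup runs are discarded so that cold caches do not skew the first sample.
    for _ in range(warmup):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return mean, min, max, and sample standard deviation of the timings."""
    return {
        "mean": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
        # stdev needs at least two samples; report 0.0 for a single run.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


if __name__ == "__main__":
    # An interpreter that executes an empty program: the measured time is
    # almost entirely process startup plus shutdown.
    cmd = [sys.executable, "-c", "pass"]
    stats = summarize(time_command(cmd, runs=10))
    print(f"mean {stats['mean'] * 1000:.1f} ms, stdev {stats['stdev'] * 1000:.1f} ms")
```

The timings include the cost of spawning the process from Python, so treat the numbers as a rough upper bound; a dedicated tool such as hyperfine is the better choice for careful comparisons.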
Project mention: Logistic Regression for Image Classification Using OpenCV | news.ycombinator.com | 2023-12-31 In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but you can't explain anything from an image.
Traditional, non-deep-learning classification algorithms such as SVMs and Random Forest perform a lot better on MNIST, reaching up to 97% accuracy compared with the 88% from logistic regression in this post. Check the original MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#
Project mention: Stop Guessing, Start Measuring: Transform Your Code with BenchmarkDotnet! | dev.to | 2024-02-13 Let’s look at the first example you see when you open BenchmarkDotNet’s website or GitHub page.
Project mention: How can I check the execution time of a program rendered in SFML? | /r/cpp_questions | 2023-12-05
Project mention: Is it better to not use the Target Update Frequency in Double DQN or depends on the application? | /r/reinforcementlearning | 2023-07-05 The tianshou implementation I found at https://github.com/thu-ml/tianshou/blob/master/tianshou/policy/modelfree/dqn.py is DQN by default.
Neat. Thanks for sharing!
Interestingly, may-minihttp is faring very well in the TechEmpower benchmark [1], for whatever those benchmarks are worth. The code is also surprisingly straightforward [2].
[1] https://www.techempower.com/benchmarks/
[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...
Project mention: Using Your Vector Database as a JSON (Or Relational) Datastore | news.ycombinator.com | 2024-04-23 Off the top of my head, pgvector only supports 2 indexes, and both run in memory only. They don't support GPU indexing or disk-based indexing, and they don't separate queries from insertions.
Also, different people I've talked to struggle with scale past 100K-1M vectors.
You can also have a look yourself from a performance perspective: https://ann-benchmarks.com/
The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.
Project mention: 2023 Mac Mini 10 Core M2 Pro – A Beast for Xcode Compilation | news.ycombinator.com | 2024-02-08
Project mention: Show HN: Times faster LLM evaluation with Bayesian optimization | news.ycombinator.com | 2024-02-13 Fair question.
Evaluation refers to the phase after training that checks whether the training went well.
Usually the flow goes training -> evaluation -> deployment (what you called inference). This project is aimed at evaluation. Evaluation can be slow (it might even be slower than training if you're finetuning on a small domain-specific subset)!
So there are [quite](https://github.com/microsoft/promptbench) [a](https://github.com/confident-ai/deepeval) [few](https://github.com/openai/evals) [frameworks](https://github.com/EleutherAI/lm-evaluation-harness) working on evaluation. However, all of them are quite slow, because LLMs are slow if you don't have infinite money. [This](https://github.com/open-compass/opencompass) one tries to speed things up by parallelizing across multiple computers, but none of them takes advantage of the fact that many evaluation queries might be similar; they all try to evaluate on every given query. And that's where this project might come in handy.
Benchmark related posts
-
Umbra: A Disk-Based System with In-Memory Performance [pdf]
-
Using Your Vector Database as a JSON (Or Relational) Datastore
-
Measuring startup and shutdown overhead of several code interpreters
-
Why SQLite Performance Tuning Made Bencher 1200x Faster
-
LLM Colosseum
-
PullRequestBenchmark Challenge: Can AI Replace Your Dev Team?
-
Evaluate LLMs in Real Time with Street Fighter III
-
A note from our sponsor - InfluxDB
www.influxdata.com | 7 May 2024
Index
What are some of the best open-source Benchmark projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | hyperfine | 20,020 |
2 | fashion-mnist | 11,439 |
3 | awesome-semantic-segmentation | 10,337 |
4 | BenchmarkDotNet | 10,056 |
5 | benchmark | 8,418 |
6 | tianshou | 7,435 |
7 | FrameworkBenchmarks | 7,391 |
8 | web-frameworks | 6,900 |
9 | sysbench | 5,804 |
10 | benchmark.js | 5,488 |
11 | mmpose | 5,025 |
12 | ann-benchmarks | 4,604 |
13 | criterion.rs | 4,170 |
14 | oha | 3,983 |
15 | Baichuan2 | 3,936 |
16 | mmaction2 | 3,916 |
17 | coost | 3,835 |
18 | yet-another-bench-script | 3,760 |
19 | awesome-http-benchmark | 3,209 |
20 | Baichuan-13B | 2,959 |
21 | Derailed Benchmarks | 2,921 |
22 | XcodeBenchmark | 2,918 |
23 | opencompass | 2,603 |