Top 23 Python Benchmark Projects
You can manually download and extract the dataset files (t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, and train-labels-idx1-ubyte.gz) from here to data/FashionMNIST/raw/.
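Once the four archives are in place, they can be read without any framework. The following is a minimal sketch, assuming the files use the standard IDX layout shared by the MNIST family (a 4-byte header, big-endian dimension sizes, then unsigned-byte data). The helper name `read_idx` is hypothetical and not part of any library.

```python
import gzip
import struct

# IDX data-type code for unsigned bytes, the type used by MNIST-family files.
IDX_UINT8 = 0x08


def read_idx(path):
    """Parse a gzip-compressed IDX file and return (shape, raw_bytes)."""
    with gzip.open(path, "rb") as f:
        # Header: two zero bytes, one data-type byte, one dimension-count byte.
        zero, dtype, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype != IDX_UINT8:
            raise ValueError(f"unexpected IDX header in {path}")
        # Each dimension size is a big-endian unsigned 32-bit integer.
        shape = struct.unpack(f">{ndim}I", f.read(4 * ndim))
        data = f.read()

    # Sanity check: the payload length must match the product of the dimensions.
    expected = 1
    for size in shape:
        expected *= size
    if len(data) != expected:
        raise ValueError(f"{path}: expected {expected} bytes, got {len(data)}")
    return shape, data
```

For example, `read_idx("data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz")` should report a shape of `(10000,)` if the download is intact, and the matching image file a shape of `(10000, 28, 28)`.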
opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Project mention: Show HN: HNSW index for vector embeddings in approx 500 LOC | news.ycombinator.com | 2025-04-08
Looks neat. It would be useful to compare to other implementations: https://ann-benchmarks.com/ -- potentially not just speed, but implementation details that might change recall.
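To make the recall comparison concrete, here is a minimal sketch of one common definition, recall@k: the fraction of the true k nearest neighbours that an approximate index returns. The function names are hypothetical, the ground truth comes from a brute-force scan with squared Euclidean distance, and the exact metric used by ann-benchmarks may differ.

```python
def exact_neighbors(query, points, k):
    """Brute-force ground truth: indices of the k points closest to the query."""
    dists = [
        (sum((q - p) ** 2 for q, p in zip(query, point)), index)
        for index, point in enumerate(points)
    ]
    return [index for _, index in sorted(dists)[:k]]


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours present in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("need at least one ground-truth neighbour")
    found = set(approx_ids[:k])
    return len(truth & found) / len(truth)
```

Two implementations can run at the same speed and still differ here, so reporting recall alongside query time gives a fairer comparison.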
Project mention: Show HN: I Built an AI Software Bot That Creates PRs for GitHub | news.ycombinator.com | 2025-04-25
Copilot sucks in general in my experience; it is decent at finding relevance, but that's about it. I haven't tried this agent mode, so I will look into it. There are many AI agents so far. For AI software development I'm looking towards https://www.swebench.com/ for benchmarking, though RA.AID has yet to be benchmarked because it's quite expensive. But that is a good point: I can highlight that difference, thank you.
OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Project mention: Any* Embedding Model Can Become a Late Interaction Model - If You Give It a Chance! | dev.to | 2024-08-29
The source code for these experiments is open-source and utilizes beir-qdrant, an integration of Qdrant with the BeIR library. While this package is not officially maintained by the Qdrant team, it may prove useful for those interested in experimenting with various Qdrant configurations to see how they impact retrieval quality. All experiments were conducted using Qdrant in exact search mode, ensuring the results are not influenced by approximate search.
benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance. (by pytorch)
Python Benchmark related posts
- How to vibe code for free: Running Qwen3 on your Mac, using MLX
- Show HN: I Built an AI Software Bot That Creates PRs for GitHub
- Show HN: EyesOff – Alerts you when someone peeps at your screen
- JetBrains IDEs Go AI: Coding Agent, Smarter Assistance, Free Tier
- Show HN: HNSW index for vector embeddings in approx 500 LOC
- SWE-bench & SWE-bench Verified Benchmarks
- Google's Gemini 2.5 Pro: Enhanced Reasoning and Coding Features
Index
What are some of the best open-source Benchmark projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | fashion-mnist | 12,163 |
2 | mmpose | 6,411 |
3 | opencompass | 5,292 |
4 | ann-benchmarks | 5,264 |
5 | mmaction2 | 4,586 |
6 | Baichuan2 | 4,127 |
7 | Baichuan-13B | 2,980 |
8 | SWE-bench | 2,912 |
9 | promptbench | 2,610 |
10 | mteb | 2,505 |
11 | InternVideo | 1,855 |
12 | OSWorld | 1,838 |
13 | beir | 1,796 |
14 | logparser | 1,747 |
15 | evalplus | 1,465 |
16 | py-motmetrics | 1,432 |
17 | inference | 1,370 |
18 | pytest-benchmark | 1,304 |
19 | smac | 1,197 |
20 | VBench | 966 |
21 | benchmark | 938 |
22 | ADBench | 933 |
23 | Monocular-Depth-Estimation-Toolbox | 923 |