Python Benchmark

Open-source Python projects categorized as Benchmark

Top 23 Python Benchmark Projects

  1. fashion-mnist

    An MNIST-like fashion product database, with benchmarks.

    Project mention: FashionMNIST in PyTorch | dev.to | 2024-12-08

    You can manually download and extract the dataset (t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz and train-labels-idx1-ubyte.gz) from here to data/FashionMNIST/raw/.

  2. mmpose

    OpenMMLab Pose Estimation Toolbox and Benchmark.

  3. opencompass

    OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

  4. ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

    Project mention: Show HN: HNSW index for vector embeddings in approx 500 LOC | news.ycombinator.com | 2025-04-08

    Looks neat. It would be useful to compare to other implementations: https://ann-benchmarks.com/ -- potentially not just speed, but implementation details that might change recall.

  5. mmaction2

    OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

  6. Baichuan2

    A series of large language models developed by Baichuan Intelligent Technology

  7. Baichuan-13B

    A 13B large language model developed by Baichuan Intelligent Technology

  8. SWE-bench

    SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?

    Project mention: Show HN: I Built an AI Software Bot That Creates PRs for GitHub | news.ycombinator.com | 2025-04-25

    Copilot sucks in general in my experience, but it's decent at finding relevance; that's about it. I haven't tried this agent mode, but will look into it. There are many AI agents so far; for AI software development, I'm looking towards https://www.swebench.com/ for benchmarking. RA.AID has yet to be benchmarked, as it's quite expensive. But that is a good point, and I can highlight that difference. Thank you.

  9. promptbench

    A unified evaluation framework for large language models

  10. mteb

    MTEB: Massive Text Embedding Benchmark

    Project mention: Text Embedding Benchmark (2022) | news.ycombinator.com | 2024-11-04

  11. InternVideo

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

  12. OSWorld

    [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Project mention: Has Anthropic Claude just wiped out an entire industry? | dev.to | 2024-10-27

  13. beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.

    Project mention: Any* Embedding Model Can Become a Late Interaction Model - If You Give It a Chance! | dev.to | 2024-08-29

    The source code for these experiments is open-source and utilizes beir-qdrant, an integration of Qdrant with the BeIR library. While this package is not officially maintained by the Qdrant team, it may prove useful for those interested in experimenting with various Qdrant configurations to see how they impact retrieval quality. All experiments were conducted using Qdrant in exact search mode, ensuring the results are not influenced by approximate search.

  14. logparser

    A machine learning toolkit for log parsing [ICSE'19, DSN'16]

  15. evalplus

    Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

  16. py-motmetrics

    Benchmark multiple object trackers (MOT) in Python

  17. inference

    Reference implementations of MLPerf™ inference benchmarks (by mlcommons)

  18. pytest-benchmark

    pytest fixture for benchmarking code

  19. smac

    SMAC: The StarCraft Multi-Agent Challenge

  20. VBench

    [CVPR2024 Highlight] VBench - We Evaluate Video Generation

  21. benchmark

    TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance. (by pytorch)

    Project mention: PyTorch is dead. Long live Jax | news.ycombinator.com | 2024-08-17

  22. ADBench

    Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.

  23. Monocular-Depth-Estimation-Toolbox

    Monocular Depth Estimation Toolbox based on MMSegmentation.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
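
The fashion-mnist entry mentions downloading the four *-idx?-ubyte.gz files by hand. These files use the IDX format shared with MNIST: a 4-byte magic number (two zero bytes, a type code, and the number of dimensions), then one big-endian 32-bit size per dimension, then the raw values. The sketch below reads such a file with only the standard library. The function names read_idx and images_as_rows are hypothetical, it reads the .gz files directly (swap gzip.open for open if you already extracted them), and it only handles the unsigned-byte type code 0x08 that these four files use.

```python
import gzip
import struct
from math import prod
from typing import List, Tuple


def read_idx(path: str) -> Tuple[Tuple[int, ...], bytes]:
    """Read a gzip-compressed IDX file and return (shape, raw_bytes).

    Hypothetical helper: only the unsigned-byte element type (0x08) is
    supported, which is what the MNIST / Fashion-MNIST files use.
    """
    with gzip.open(path, "rb") as f:
        # Magic number: two zero bytes, a type code, and the number of dimensions.
        zero1, zero2, type_code, ndim = struct.unpack(">BBBB", f.read(4))
        if zero1 != 0 or zero2 != 0:
            raise ValueError("not an IDX file")
        if type_code != 0x08:
            raise ValueError(f"unsupported IDX type code: {type_code:#04x}")
        # One big-endian unsigned 32-bit size per dimension.
        shape = struct.unpack(f">{ndim}I", f.read(4 * ndim))
        data = f.read()
    # Sanity check: payload must hold exactly one byte per element.
    if len(data) != prod(shape):
        raise ValueError("payload size does not match header shape")
    return shape, data


def images_as_rows(shape: Tuple[int, ...], data: bytes) -> List[List[int]]:
    """Split an (n, rows, cols) image payload into n flat lists of pixel values."""
    n, rows, cols = shape
    size = rows * cols
    return [list(data[i * size:(i + 1) * size]) for i in range(n)]
```

For example, reading t10k-labels-idx1-ubyte.gz this way should give a shape of (10000,) with one byte per label (classes 0 to 9), and the matching images file a shape of (10000, 28, 28).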

Python Benchmark related posts

  • How to vibe code for free: Running Qwen3 on your Mac, using MLX

    4 projects | news.ycombinator.com | 1 May 2025
  • Show HN: I Built an AI Software Bot That Creates PRs for GitHub

    1 project | news.ycombinator.com | 25 Apr 2025
  • Show HN: EyesOff – Alerts you when someone peeps at your screen

    1 project | news.ycombinator.com | 19 Apr 2025
  • JetBrains IDEs Go AI: Coding Agent, Smarter Assistance, Free Tier

    1 project | news.ycombinator.com | 16 Apr 2025
  • Show HN: HNSW index for vector embeddings in approx 500 LOC

    4 projects | news.ycombinator.com | 8 Apr 2025
  • SWE-bench & SWE-bench Verified Benchmarks

    1 project | dev.to | 6 Apr 2025
  • Google's Gemini 2.5 Pro: Enhanced Reasoning and Coding Features

    1 project | dev.to | 29 Mar 2025

Index

What are some of the best open-source Benchmark projects in Python? This list will help you:

#     Project                             Stars
1     fashion-mnist                       12,163
2     mmpose                              6,411
3     opencompass                         5,292
4     ann-benchmarks                      5,264
5     mmaction2                           4,586
6     Baichuan2                           4,127
7     Baichuan-13B                        2,980
8     SWE-bench                           2,912
9     promptbench                         2,610
10    mteb                                2,505
11    InternVideo                         1,855
12    OSWorld                             1,838
13    beir                                1,796
14    logparser                           1,747
15    evalplus                            1,465
16    py-motmetrics                       1,432
17    inference                           1,370
18    pytest-benchmark                    1,304
19    smac                                1,197
20    VBench                              966
21    benchmark                           938
22    ADBench                             933
23    Monocular-Depth-Estimation-Toolbox  923
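
The ann-benchmarks and beir entries both touch on the gap between exact and approximate search: the ann-benchmarks comment asks about implementation details "that might change recall", and the beir experiments ran Qdrant in exact search mode to keep approximate search out of the results. A common way to quantify that gap is recall@k, the fraction of the true k nearest neighbours (found by exhaustive search) that an approximate index also returns. The sketch below is a plain-Python illustration, not code from either project; the names exact_knn and recall_at_k are hypothetical, squared Euclidean distance is assumed, and k is assumed not to exceed the number of points.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; ranks neighbours the same as plain Euclidean."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(data: List[Vector], query: Vector, k: int) -> List[int]:
    """Brute-force k nearest neighbours, used as the ground truth for recall."""
    order = sorted(range(len(data)), key=lambda i: squared_l2(data[i], query))
    return order[:k]


def recall_at_k(true_ids: Sequence[int], approx_ids: Sequence[int], k: int) -> float:
    """Fraction of the true top-k ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(true_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / k
```

Averaging recall_at_k over many queries gives the single recall figure that ann-benchmarks plots against queries per second. Exhaustive search costs one distance computation per stored point for every query, which is why it serves as the reference rather than as the index itself.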

