Python Benchmark

Open-source Python projects categorized as Benchmark

Top 23 Python Benchmark Projects

  • fashion-mnist

An MNIST-like fashion product database for benchmarking.

    Project mention: Pre-Trained ML models for labeling retail images? Upload an image of a dress shirt and the labels output are “long sleeve, men’s, button down, collar, formal, dress shirt” or better? | /r/learnmachinelearning | 2023-04-25
  • tianshou

    An elegant PyTorch deep reinforcement learning library.

    Project mention: Is it better to not use the Target Update Frequency in Double DQN or depends on the application? | /r/reinforcementlearning | 2023-07-05

The tianshou implementation I found uses DQN by default.


  • ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

Project mention: How We Made PostgreSQL a Better Vector Database | 2023-09-25

    (Blog author here). Thanks for the question. In this case the index for both DiskANN and pgvector HNSW is small enough to fit in memory on the machine (8GB RAM), so there's no need to touch the SSD. We plan to test on a config where the index size is larger than memory (we couldn't this time due to limitations in ANN benchmarks [0], the tool we use).

    To your question about RAM usage, we provide a graph of index size. When enabling PQ, our new index is 10x smaller than pgvector HNSW. We don't have numbers for HNSWPQ in FAISS yet.


  • mmpose

    OpenMMLab Pose Estimation Toolbox and Benchmark.

    Project mention: RTMPose: The All-In-One Real-time Pose Estimation Solution for R&D | /r/artificial | 2023-03-19

    RTMPose-m achieves 75.8% AP on COCO with 90+ FPS on an Intel i7-11700 CPU and 430+ FPS on an NVIDIA GTX 1660 Ti GPU, and RTMPose-l achieves 67.0% AP on COCO-WholeBody with 130+ FPS.

  • mmaction2

    OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

    Project mention: How good does contextual action recognition get? | /r/computervision | 2023-01-02

MMAction2 has some examples.

  • Baichuan-13B

    A 13B large language model developed by Baichuan Intelligent Technology

Project mention: Baichuan, AI from China | /r/techieHugui | 2023-07-22
  • py-motmetrics

Benchmark multiple object trackers (MOT) in Python


  • logparser

    A machine learning toolkit for log parsing [ICSE'19, DSN'16] (by logpai)

Project mention: A machine learning toolkit for log parsing [ICSE'19, DSN'16] | 2023-09-20
  • pytest-benchmark

    py.test fixture for benchmarking code

Project mention: Investigating Pydantic v2's Bold Performance Claims | 2023-05-17

To test this, we will set up some benchmarks using pytest-benchmark, some sample data with a simple schema, and compare results between Python's dataclass, Pydantic v1, and Pydantic v2.

  • beir

A heterogeneous benchmark for information retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.

Project mention: Benefits of hybrid search | 2023-08-18

    Custom datasets can also be evaluated using this method as specified in this link. This article and the associated benchmarks script can be reused to evaluate what method works best on your data.

  • smac

    SMAC: The StarCraft Multi-Agent Challenge

  • asv

    Airspeed Velocity: A simple Python benchmarking tool with web-based reporting

Project mention: git-appraise – Distributed Code Review for Git | 2023-08-10

    > All these workflows are a derivation of the source in the repository and keeping them close together has a great aesthetic.

    I agree. Version control is a great enabler, so using it to track "sources" other than just code can be useful. A couple of tools I like to use:

    - Artemis, for tracking issues

- ASV, for tracking benchmark results (I use this for non-Python projects via my asv-nix plugin)

  • Monocular-Depth-Estimation-Toolbox

    Monocular Depth Estimation Toolbox based on MMSegmentation.

    Project mention: With the absence of ultrasonics, how will Tesla measure depth when stationary without stereoscopic vision | /r/teslamotors | 2022-10-13

  • pyperformance

    Python Performance Benchmark Suite

    Project mention: Phoronix: PyPerformance benchmark is on average 32% faster on Python 3.11 compared to 3.10 (on a Ryzen 9 5950X) | /r/Python | 2022-10-26

  • py-frameworks-bench

Another benchmark for some Python frameworks

  • benchmark

    TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance. (by pytorch)

Project mention: PyTorch Primitives in WebGPU for the Browser | 2023-05-19

> What's a fair benchmark?

    the absolute golden benchmarks are

  • ADBench

Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2023.

  • mteb

    MTEB: Massive Text Embedding Benchmark

Project mention: AI for AWS Documentation | 2023-07-06

RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (generate artificial questions and store them)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings aren't performing well; use an embedding model that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings.
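The "several embeddings per chunk" idea above can be sketched as an index that stores one vector for the chunk text plus one per hypothetical question, all pointing back to the same chunk. The `embed` function here is a deliberately toy bag-of-words stand-in for a real embedding model, and all names are hypothetical:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: word-count vectors.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ChunkIndex:
    def __init__(self):
        self.entries = []  # (vector, chunk_id) pairs
        self.chunks = {}   # chunk_id -> original text

    def add(self, chunk_id, text, hypothetical_questions):
        self.chunks[chunk_id] = text
        # One vector for the chunk itself and one per hypothetical
        # question, all mapping back to the same chunk.
        for variant in [text, *hypothetical_questions]:
            self.entries.append((embed(variant), chunk_id))

    def query(self, question):
        qv = embed(question)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]))
        return self.chunks[best[1]]
```

Because questions are compared against question-shaped vectors rather than only content vectors, a query can land on the right chunk even when its wording barely overlaps with the chunk text itself.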


  • tape

    Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (by songlab-cal)

  • InternVideo

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

    Project mention: [Demo] Watch Videos with ChatGPT | /r/ChatGPT | 2023-04-19

    Thanks for your interest! If you had any ideas to make the given demo more user-friendly, please do not hesitate to share them with us. We are open to discussing relevant ideas about video foundation models or other topics. We made some progress in these areas (InternVideo, VideoMAE v2, UMT, and more). We believe that user-level intelligent video understanding is on the horizon with the current LLM, computing power, and video data.

  • PDEBench

    PDEBench: An Extensive Benchmark for Scientific Machine Learning

    Project mention: [D] what are the SOTA neural PDE solvers besides FNO? | /r/MachineLearning | 2022-11-22


  • opencv_zoo

    Model Zoo For OpenCV DNN and Benchmarks.

    Project mention: [R] Where to purchase legitimate models (already trained) and datasets? | /r/MachineLearning | 2023-03-02
  • evalplus

EvalPlus for rigorous evaluation of LLM-synthesized code

Project mention: The AI Reproducibility Crisis in GPT-3.5/GPT-4 Research | 2023-08-25

*Further Reading*:

    - GPT-4's decline over time (HackerNews)

    - GPT-4 downgrade discussions (OpenAI Forums)

    - Behavioral changes in ChatGPT (arXiv)

    - Zero-Shot Replication Effort (GitHub)

    - Inconsistencies in GPT-4 HumanEval (GitHub)

    - Early experiments with GPT-4 (arXiv)

    - GPT-4 Technical Report (arXiv)


NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-09-25.

What are some of the best open-source Benchmark projects in Python? This list will help you:

Project Stars
1 fashion-mnist 11,032
2 tianshou 6,698
3 ann-benchmarks 4,051
4 mmpose 4,050
5 mmaction2 3,451
6 Baichuan-13B 2,645
7 py-motmetrics 1,250
8 logparser 1,219
9 pytest-benchmark 1,126
10 beir 1,095
11 smac 903
12 asv 799
13 Monocular-Depth-Estimation-Toolbox 769
14 pyperformance 747
15 py-frameworks-bench 693
16 benchmark 668
17 ADBench 646
18 mteb 601
19 tape 589
20 InternVideo 571
21 PDEBench 476
22 opencv_zoo 413
23 evalplus 351