Python Benchmark

Open-source Python projects categorized as Benchmark

Top 23 Python Benchmark Projects

  • fashion-mnist

    An MNIST-like fashion product database and benchmark.

    Project mention: Logistic Regression for Image Classification Using OpenCV | news.ycombinator.com | 2023-12-31

    In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but you can't explain anything from an image.

    Traditional (non-deep-learning) classification algorithms such as SVMs and Random Forests perform much better on MNIST, reaching up to 97% accuracy compared to the 88% from logistic regression in this post. Check the Fashion-MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#
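    As a rough illustration (not from the original post), here is a minimal scikit-learn sketch of that comparison on Fashion-MNIST; exact accuracies depend on hyperparameters, and the data is subsampled because an RBF SVM is slow on the full 60k training set:

    ```python
    # Hypothetical sketch: logistic regression vs. an RBF SVM on Fashion-MNIST.
    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Fashion-MNIST is published on OpenML under this name.
    X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
    X = X / 255.0  # scale pixel values to [0, 1]

    # Subsample so the SVM finishes in reasonable time for a quick check.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=10_000, test_size=5_000, random_state=0
    )

    for name, clf in [
        ("logistic regression", LogisticRegression(max_iter=1000)),
        ("RBF SVM", SVC(kernel="rbf")),
    ]:
        clf.fit(X_train, y_train)
        print(name, accuracy_score(y_test, clf.predict(X_test)))
    ```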

  • mmpose

    OpenMMLab Pose Estimation Toolbox and Benchmark.

  • ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

    Project mention: Using Your Vector Database as a JSON (Or Relational) Datastore | news.ycombinator.com | 2024-04-23

    Off the top of my head, pgvector only supports two index types, and both run in memory only. It doesn't support GPU indexing or disk-based indexing, and there's no separation between queries and insertions.

    Also, different people I've talked to say they struggle to scale it past 100K-1M vectors.

    You can also have a look yourself from a performance perspective: https://ann-benchmarks.com/
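    For intuition, here is a small self-contained sketch (not from ann-benchmarks itself) of the recall@k metric that https://ann-benchmarks.com/ plots against queries per second; a random corpus subsample stands in for a real ANN index:

    ```python
    # Illustrative sketch: recall@k of an "approximate" search vs. exact brute force.
    import numpy as np

    rng = np.random.default_rng(0)
    corpus = rng.standard_normal((10_000, 128)).astype(np.float32)
    queries = rng.standard_normal((100, 128)).astype(np.float32)
    k = 10

    def knn(q, c, k):
        # Squared L2 distances via ||q - c||^2 = ||q||^2 - 2 q.c + ||c||^2.
        d2 = (q**2).sum(1)[:, None] - 2 * q @ c.T + (c**2).sum(1)[None, :]
        return np.argsort(d2, axis=1)[:, :k]

    truth = knn(queries, corpus, k)  # exact ground-truth neighbors

    # Stand-in for an ANN index: exact search over a random half of the corpus.
    sample = rng.choice(len(corpus), size=len(corpus) // 2, replace=False)
    approx = sample[knn(queries, corpus[sample], k)]  # map back to corpus indices

    # recall@k: fraction of true neighbors the approximate result recovered.
    recall = np.mean([len(set(t) & set(a)) / k for t, a in zip(truth, approx)])
    print(f"recall@{k}: {recall:.2f}")
    ```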

  • mmaction2

    OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

  • Baichuan2

    A series of large language models developed by Baichuan Intelligent Technology

    Project mention: Baichuan 2 | news.ycombinator.com | 2023-10-12

  • opencompass

    OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.

    Project mention: Show HN: Times faster LLM evaluation with Bayesian optimization | news.ycombinator.com | 2024-02-13

    Fair question.

    Evaluation refers to the phase after training that checks whether the training produced a good model.

    Usually the flow goes training -> evaluation -> deployment (what you called inference). This project is aimed at evaluation. Evaluation can be slow (it might even be slower than training if you're fine-tuning on a small domain-specific subset)!

    So there are [quite](https://github.com/microsoft/promptbench) [a](https://github.com/confident-ai/deepeval) [few](https://github.com/openai/evals) [frameworks](https://github.com/EleutherAI/lm-evaluation-harness) working on evaluation; however, all of them are quite slow, because LLMs are slow if you don't have infinite money. [This](https://github.com/open-compass/opencompass) one tries to speed things up by parallelizing across multiple machines, but none of them take advantage of the fact that many evaluation queries might be similar: they all evaluate on every given query. And that's where this project might come in handy.
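    A minimal sketch of the naive evaluation loop the comment describes (the model_answer and dataset names here are hypothetical, not from any of the linked frameworks):

    ```python
    # Naive LLM evaluation loop: one model call per query is the bottleneck.
    def evaluate(model_answer, dataset):
        """dataset: iterable of (query, reference) pairs; model_answer: str -> str."""
        scores = []
        for query, reference in dataset:
            prediction = model_answer(query)        # one LLM call per query
            scores.append(prediction == reference)  # exact-match scoring, for simplicity
        return sum(scores) / len(scores)

    # Random subsampling is the simplest way to trade estimate accuracy for speed;
    # the linked Show HN project instead picks queries adaptively with Bayesian
    # optimization, exploiting the similarity between evaluation queries.
    ```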

  • Baichuan-13B

    A 13B large language model developed by Baichuan Intelligent Technology

  • promptbench

    A unified evaluation framework for large language models

    Project mention: Show HN: Times faster LLM evaluation with Bayesian optimization | news.ycombinator.com | 2024-02-13

  • logparser

    A machine learning toolkit for log parsing [ICSE'19, DSN'16]

    Project mention: Log2row: A tool that detects, extracts templates, and structures logs | news.ycombinator.com | 2023-10-06

    You use GPT-4 to extract log patterns; does it really need an LLM? There are more traditional approaches, such as https://github.com/logpai/logparser
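    For reference, usage loosely follows the Drain demo in the logparser repo; the log format and parameters below are illustrative, and the import path has changed across releases, so check the repo for the exact one:

    ```python
    # Sketch based on the logparser Drain demo; parameters are illustrative.
    from logparser.Drain import LogParser  # older releases: from logparser import Drain

    log_format = '<Date> <Time> <Pid> <Level> <Component>: <Content>'  # HDFS-style logs
    regex = [r'blk_-?\d+', r'(\d+\.){3}\d+(:\d+)?']  # mask block IDs and IP addresses

    parser = LogParser(log_format, indir='logs/', outdir='result/',
                       depth=4, st=0.5, rex=regex)  # depth/st: parse-tree depth, similarity threshold
    parser.parse('HDFS_2k.log')  # writes *_structured.csv and *_templates.csv
    ```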

  • beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.

    Project mention: Any* Embedding Model Can Become a Late Interaction Model - If You Give It a Chance! | dev.to | 2024-08-29

    The source code for these experiments is open-source and utilizes beir-qdrant, an integration of Qdrant with the BeIR library. While this package is not officially maintained by the Qdrant team, it may prove useful for those interested in experimenting with various Qdrant configurations to see how they impact retrieval quality. All experiments were conducted using Qdrant in exact search mode, ensuring the results are not influenced by approximate search.
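    For context, evaluating a dense retriever with BeIR itself looks roughly like the quickstart in its README; the dataset and model below are just examples:

    ```python
    # Rough BeIR quickstart sketch: download a dataset, retrieve, evaluate.
    from beir import util
    from beir.datasets.data_loader import GenericDataLoader
    from beir.retrieval import models
    from beir.retrieval.evaluation import EvaluateRetrieval
    from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

    url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
    data_path = util.download_and_unzip(url, "datasets")
    corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

    model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
    retriever = EvaluateRetrieval(model, score_function="dot")  # exact (brute-force) search
    results = retriever.retrieve(corpus, queries)
    ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
    print(ndcg)
    ```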

  • py-motmetrics

    :bar_chart: Benchmark multiple object trackers (MOT) in Python
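    Typical usage, adapted from the README pattern (the IDs and distances below are made up): accumulate per-frame matches between ground truth and tracker output, then compute MOT metrics:

    ```python
    # py-motmetrics sketch: one frame of GT-to-hypothesis matching, then metrics.
    import motmetrics as mm
    import numpy as np

    acc = mm.MOTAccumulator(auto_id=True)

    # One frame: 2 ground-truth objects, 3 hypotheses, and a GT-by-hypothesis
    # distance matrix (np.nan marks pairs that may not be matched).
    acc.update(
        ['gt1', 'gt2'],            # ground-truth IDs present in this frame
        ['hyp1', 'hyp2', 'hyp3'],  # tracker hypothesis IDs
        [[0.1, np.nan, 0.3],
         [0.5, 0.2, np.nan]],
    )

    mh = mm.metrics.create()
    summary = mh.compute(acc, metrics=['num_frames', 'mota', 'motp'], name='demo')
    print(summary)
    ```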

  • InternVideo

    [ECCV2024] Video Foundation Models & Data for Multimodal Understanding

  • pytest-benchmark

    py.test fixture for benchmarking code

    Project mention: Pinpoint performance regressions with CI-Integrated differential profiling | dev.to | 2023-10-23

    pytest-benchmark
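    A minimal example of the fixture (the function under test is arbitrary); running pytest prints min/mean/stddev timings per benchmarked test:

    ```python
    # pytest-benchmark: the `benchmark` fixture times a callable and returns its result.
    def fib(n: int) -> int:
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def test_fib(benchmark):
        result = benchmark(fib, 20)  # benchmark(func, *args) runs and times func
        assert result == 6765
    ```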

  • evalplus

    Rigorous evaluation of LLM-synthesized code - NeurIPS 2023

  • smac

    SMAC: The StarCraft Multi-Agent Challenge

  • Monocular-Depth-Estimation-Toolbox

    Monocular Depth Estimation Toolbox based on MMSegmentation.

  • asv

    Airspeed Velocity: A simple Python benchmarking tool with web-based reporting
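    A sketch of what asv benchmarks look like (conventionally in benchmarks/benchmarks.py of a project configured with asv.conf.json); asv discovers time_*-prefixed methods and tracks their timings across commits in its web report:

    ```python
    # asv benchmark suite sketch: setup() runs before each timed method.
    class TimeSuite:
        def setup(self):
            self.data = list(range(10_000))

        def time_sum(self):
            sum(self.data)

        def time_sorted(self):
            sorted(self.data, reverse=True)
    ```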

  • pyperformance

    Python Performance Benchmark Suite

    Project mention: Python Performance Benchmark Suite | news.ycombinator.com | 2024-08-31

  • benchmark

    TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance. (by pytorch)

    Project mention: PyTorch is dead. Long live Jax | news.ycombinator.com | 2024-08-17

    If you're the author, unfortunately I have to say that the blog is not well-written -- it's misinformed about some of the claims and has a repugnant click-baity title. You're getting the attention and clicks, but probably losing a lot of trust among people. I didn't engage out of choice, but because of a duty to respond to FUD.

    > > torch.compile is 2 years old, XLA is 7 years old. Compilers take a few years to mature

    > That was one of my major points - I don't think leaning on torch.compile is the best idea. A compiler would inherently place restrictions that you have to work around.

    There are plenty of compilers that place restrictions that you barely notice. gcc, clang, nvcc -- they're fairly flexible, and "dynamic". Adding constraints doesn't mean you have to give up on important flexibility.

    > This is not dynamic, nor flexible - and it flies in the face of torch's core philosophies just so they can offer more performance to the big labs using PyTorch. For various reasons, I dislike pandering to the rich guy instead of being an independent, open-source entity.

    I think this is an assumption you've made largely without evidence. I'm not entirely sure what your point is. The way torch.compile is measured for success publicly (even in the announcement blog post and conference keynote, link https://pytorch.org/get-started/pytorch-2.0/ ) is by measuring on a bunch of popular PyTorch-based GitHub repos in the wild + popular HuggingFace models + the TIMM vision benchmark. They're curated here: https://github.com/pytorch/benchmark . Your claim that it's mainly to favor large labs is pretty puzzling.

    torch.compile is both dynamic and flexible because: 1. it supports dynamic shapes, 2. it allows incremental compilation (you don't need to compile the parts you wish to keep in plain, uncompiled Python, e.g. code using arbitrary Python packages). There is a trade-off between dynamism/flexibility and performance, i.e. more dynamic and flexible means the compiler has less information to extract performance, but that's an acceptable trade-off when you need the flexibility to express your ideas more than you need the speed.

    > XLA's GPU support is great, it's compatible across different hardware, it's optimized and mature. In short, it's a great alternative to the often buggy torch.compile stack - if you fix the torch integration.

    If you are an XLA maximalist, that's fine; I am not. There isn't evidence to prove out either opinion. PyTorch will never be nicely compatible with XLA while XLA has significant constraints that are incompatible with PyTorch's user-experience model. The PyTorch devs have given clear, written-down feedback to the XLA project on what it would take for XLA+PyTorch to get better, and it's been a few years and the XLA project still prioritizes other things.
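    As a concrete illustration of those two properties (a sketch, not from the thread): torch.compile applied incrementally to one function, with dynamic shapes enabled, while the rest of the program stays eager Python:

    ```python
    # torch.compile sketch: incremental compilation of one hot path, dynamic shapes on.
    import torch

    @torch.compile(dynamic=True)  # compile only this function; allow varying input shapes
    def gelu_mlp(x, w1, w2):
        return torch.nn.functional.gelu(x @ w1) @ w2

    w1, w2 = torch.randn(64, 256), torch.randn(256, 64)
    for n in (8, 32, 128):  # different batch sizes without a recompile per shape
        y = gelu_mlp(torch.randn(n, 64), w1, w2)
        print(n, y.shape)
    ```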

  • ADBench

    Official implementation of "ADBench: Anomaly Detection Benchmark" (NeurIPS 2022).

  • PDEBench

    PDEBench: An Extensive Benchmark for Scientific Machine Learning

    Project mention: [P] LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | /r/MachineLearning | 2023-12-11

    LagrangeBench is a machine learning benchmarking library for CFD particle problems based on JAX. It is designed to evaluate and develop learned particle models (e.g. graph neural networks) on challenging physical problems. To our knowledge it's the first benchmark for this specific set of problems. Our work was inspired by the grid-based benchmarks of PDEBench and PDEArena, and we propose it as a Lagrangian alternative.

  • py-frameworks-bench

    Another benchmark for some python frameworks

  • tape

    Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (by songlab-cal)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python Benchmark related posts

  • Python Performance Benchmark Suite

    1 project | news.ycombinator.com | 31 Aug 2024

  • Any* Embedding Model Can Become a Late Interaction Model - If You Give It a Chance!

    2 projects | dev.to | 29 Aug 2024

  • PyTorch is dead. Long live Jax

    2 projects | news.ycombinator.com | 17 Aug 2024

  • Show HN: Open-source LLM provider price comparison

    2 projects | news.ycombinator.com | 14 Aug 2024

  • Show HN: PyBench 2.0 – Python benchmark tool inspired by Geekbench

    1 project | news.ycombinator.com | 22 May 2024

  • Using Your Vector Database as a JSON (Or Relational) Datastore

    1 project | news.ycombinator.com | 23 Apr 2024

  • PullRequestBenchmark Challenge: Can AI Replace Your Dev Team?

    1 project | news.ycombinator.com | 10 Apr 2024

Index

What are some of the best open-source Benchmark projects in Python? This list will help you:

Rank  Project  Stars
1 fashion-mnist 11,619
2 mmpose 5,547
3 ann-benchmarks 4,840
4 mmaction2 4,134
5 Baichuan2 4,072
6 opencompass 3,669
7 Baichuan-13B 2,980
8 promptbench 2,347
9 logparser 1,549
10 beir 1,543
11 py-motmetrics 1,368
12 InternVideo 1,275
13 pytest-benchmark 1,232
14 evalplus 1,133
15 smac 1,066
16 Monocular-Depth-Estimation-Toolbox 897
17 asv 860
18 pyperformance 850
19 benchmark 843
20 ADBench 824
21 PDEBench 714
22 py-frameworks-bench 710
23 tape 635
