Top 23 Python Benchmark Projects
-
Project mention: Pre-Trained ML models for labeling retail images? Upload an image of a dress shirt and the output labels are “long sleeve, men’s, button down, collar, formal, dress shirt” or better? | /r/learnmachinelearning | 2023-04-25
-
Project mention: Is it better to not use the Target Update Frequency in Double DQN or depends on the application? | /r/reinforcementlearning | 2023-07-05
The tianshou implementation I found at https://github.com/thu-ml/tianshou/blob/master/tianshou/policy/modelfree/dqn.py is DQN by default.
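For context, here is a minimal sketch of the knobs that question is about, as they appear when constructing the policy. The exact signature varies across tianshou versions, so treat the arguments as illustrative and verify against the linked dqn.py:

```python
import torch
from tianshou.policy import DQNPolicy

# Placeholder network and optimizer; a real setup would use tianshou's Net
# utilities sized to the environment's observation and action spaces.
net = torch.nn.Linear(4, 2)
optim = torch.optim.Adam(net.parameters(), lr=1e-3)

policy = DQNPolicy(
    model=net,
    optim=optim,
    discount_factor=0.99,
    target_update_freq=500,  # sync the target network every 500 updates; 0 disables it
    is_double=True,          # toggles the Double DQN target computation
)
```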
-
Project mention: How We Made PostgreSQL a Better Vector Database | news.ycombinator.com | 2023-09-25
(Blog author here). Thanks for the question. In this case the index for both DiskANN and pgvector HNSW is small enough to fit in memory on the machine (8GB RAM), so there's no need to touch the SSD. We plan to test on a config where the index size is larger than memory (we couldn't this time due to limitations in ANN benchmarks [0], the tool we use).
To your question about RAM usage, we provide a graph of index size. When enabling PQ, our new index is 10x smaller than pgvector HNSW. We don't have numbers for HNSWPQ in FAISS yet.
-
Project mention: RTMPose: The All-In-One Real-time Pose Estimation Solution for R&D | /r/artificial | 2023-03-19
RTMPose-m achieves 75.8% AP on COCO with 90+ FPS on an Intel i7-11700 CPU and 430+ FPS on an NVIDIA GTX 1660 Ti GPU, and RTMPose-l achieves 67.0% AP on COCO-WholeBody with 130+ FPS.
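If you want to try those models locally, mmpose ships a high-level inferencer in its 1.x API. A rough sketch; the 'human' alias and the result keys follow the mmpose docs but are worth verifying against your installed version:

```python
from mmpose.apis import MMPoseInferencer

# 'human' is a shorthand alias that resolves to a default human pose model
# (an RTMPose variant in recent releases).
inferencer = MMPoseInferencer('human')

# The inferencer returns a generator; each item holds keypoint predictions.
result_generator = inferencer('demo.jpg', show=False)
result = next(result_generator)
print(result['predictions'])
```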
-
mmaction2 (https://github.com/open-mmlab/mmaction2) has some examples.
-
Project mention: A machine learning toolkit for log parsing [ICSE'19, DSN'16] | news.ycombinator.com | 2023-09-20
-
To test this, we will set up some benchmarks using pytest-benchmark, some sample data with a simple schema, and compare results between Python's dataclass, Pydantic v1, and Pydantic v2.
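A minimal sketch of that setup; the schema and test names are invented for illustration, and pytest-benchmark injects the `benchmark` fixture automatically:

```python
from dataclasses import dataclass

import pydantic  # v2; for the v1 comparison, use `pydantic.v1` or an older install


@dataclass
class UserDC:
    id: int
    name: str
    email: str


class UserModel(pydantic.BaseModel):
    id: int
    name: str
    email: str


SAMPLE = {"id": 1, "name": "Ada", "email": "ada@example.com"}


def test_dataclass(benchmark):
    # pytest-benchmark calls the function repeatedly and records timing stats.
    benchmark(lambda: UserDC(**SAMPLE))


def test_pydantic_v2(benchmark):
    benchmark(lambda: UserModel(**SAMPLE))
```

Running `pytest` then prints min/mean/stddev per test, and `--benchmark-compare` can diff saved runs.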
-
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Custom datasets can also be evaluated using this method, as described in the linked documentation. This article and the associated benchmark script can be reused to evaluate which method works best on your data.
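For reference, the canonical BEIR evaluation loop looks roughly like this; the dataset and model names are examples from the repo's README, so check there for the current API:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download one of the packaged datasets; a custom dataset just needs the same
# corpus/queries/qrels layout on disk.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="cos_sim")
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```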
-
> All these workflows are a derivation of the source in the repository and keeping them close together has a great aesthetic.
I agree. Version control is a great enabler, so using it to track "sources" other than just code can be useful. A couple of tools I like to use:
- Artemis, for tracking issues http://www.chriswarbo.net/blog/2017-06-14-artemis.html
- ASV, for tracking benchmark results https://github.com/airspeed-velocity/asv (I use this for non-Python projects via my asv-nix plugin http://www.chriswarbo.net/projects/nixos/asv_benchmarking.ht... )
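For anyone unfamiliar with ASV: benchmarks are plain Python methods discovered by name prefix, along the lines of this minimal sketch (file and class names are arbitrary):

```python
# benchmarks/bench_sorting.py
class TimeSuite:
    """ASV times `time_*` methods; `setup` runs before each measurement."""

    def setup(self):
        self.data = list(range(10_000))[::-1]

    def time_builtin_sort(self):
        sorted(self.data)

    def peakmem_copy(self):
        # `peakmem_*` methods record peak memory usage instead of wall time.
        self.data.copy()
```

`asv run` stores results per commit, and `asv publish` renders the history as a static site.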
-
Project mention: With the absence of ultrasonics, how will Tesla measure depth when stationary without stereoscopic vision | /r/teslamotors | 2022-10-13
Pretty picture: https://github.com/zhyever/monocular-depth-estimation-toolbox
-
Project mention: Phoronix: PyPerformance benchmark is on average 32% faster on Python 3.11 compared to 3.10 (on a Ryzen 9 5950X) | /r/Python | 2022-10-26
PyPerformance benchmark: https://github.com/python/pyperformance
-
benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance. (by pytorch)
> What's a fair benchmark?
The absolute gold-standard benchmarks are https://github.com/pytorch/benchmark
-
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (generate artificial questions and store them)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata); see the sketch after this list
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings don't perform well; use an embedding model that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
1 https://github.com/underlines/awesome-marketing-datascience/...
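A rough sketch of the hypothetical-question idea from the list above. `generate_questions` is a hypothetical stand-in for an LLM call, and the model name is just one example of a multilingual retrieval model; both are assumptions, not part of any linked project:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")

def generate_questions(chunk: str) -> list[str]:
    # Hypothetical: in practice, prompt an LLM for questions this chunk answers.
    return [f"What is described in: {chunk[:60]}?"]

def index_chunk(chunk: str, metadata: dict) -> list[dict]:
    # Store one record for the raw content vector plus one per generated
    # question, so question-shaped queries can match question-shaped vectors.
    records = [{"text": chunk, "embedding": model.encode(chunk), "metadata": metadata}]
    for q in generate_questions(chunk):
        records.append({"text": chunk, "embedding": model.encode(q), "metadata": metadata})
    return records
```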
-
tape
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (by songlab-cal)
-
InternVideo
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
Thanks for your interest! If you have any ideas to make the given demo more user-friendly, please do not hesitate to share them with us. We are open to discussing relevant ideas about video foundation models or other topics. We have made some progress in these areas (InternVideo, VideoMAE v2, UMT, and more). We believe that user-level intelligent video understanding is on the horizon with current LLMs, computing power, and video data.
-
Project mention: [D] what are the SOTA neural PDE solvers besides FNO? | /r/MachineLearning | 2022-11-22
try https://github.com/pdebench/pdebench
-
Project mention: [R] Where to purchase legitimate models (already trained) and datasets? | /r/MachineLearning | 2023-03-02
-
Project mention: The AI Reproducibility Crisis in GPT-3.5/GPT-4 Research | news.ycombinator.com | 2023-08-25
*Further Reading*:
- [GPT-4's decline over time (HackerNews)](https://news.ycombinator.com/item?id=36786407)
- [GPT-4 downgrade discussions (OpenAI Forums)](https://community.openai.com/t/gpt-4-has-been-severely-downg...)
- [Behavioral changes in ChatGPT (arXiv)](https://arxiv.org/abs/2307.09009)
- [Zero-Shot Replication Effort (Github)](https://github.com/emrgnt-cmplxty/zero-shot-replication)
- [Inconsistencies in GPT-4 HumanEval (Github)](https://github.com/evalplus/evalplus/issues/15)
- [Early experiments with GPT-4 (arXiv)](https://arxiv.org/abs/2303.12712)
- [GPT-4 Technical Report (arXiv)](https://arxiv.org/abs/2303.08774)
-
Python Benchmark-related posts
- How We Made PostgreSQL a Better Vector Database
- Why my favourite API is a zipfile on the European Central Bank's website
- A LLM+OLAP Solution
- Vector Search with OpenAI Embeddings: Lucene Is All You Need
- The AI Reproducibility Crisis in GPT-3.5/GPT-4 Research
- Benefits of hybrid search
- Vector Dataset benchmark with 1536/768 dim data
-
Index
What are some of the best open-source Benchmark projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | fashion-mnist | 11,032 |
2 | tianshou | 6,698 |
3 | ann-benchmarks | 4,051 |
4 | mmpose | 4,050 |
5 | mmaction2 | 3,451 |
6 | Baichuan-13B | 2,645 |
7 | py-motmetrics | 1,250 |
8 | logparser | 1,219 |
9 | pytest-benchmark | 1,126 |
10 | beir | 1,095 |
11 | smac | 903 |
12 | asv | 799 |
13 | Monocular-Depth-Estimation-Toolbox | 769 |
14 | pyperformance | 747 |
15 | py-frameworks-bench | 693 |
16 | benchmark | 668 |
17 | ADBench | 646 |
18 | mteb | 601 |
19 | tape | 589 |
20 | InternVideo | 571 |
21 | PDEBench | 476 |
22 | opencv_zoo | 413 |
23 | evalplus | 351 |