Top 23 Python Inference Projects
- **DeepSpeed**: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- **server**: The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
- **adversarial-robustness-toolbox**: Adversarial Robustness Toolbox (ART), a Python library for machine learning security covering evasion, poisoning, extraction, and inference attacks, for both red and blue teams.
- **optimum**: 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools.
- **transformer-deploy**: Efficient, scalable, enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
- **uform**: Pocket-sized multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video; up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
- **GenossGPT**: One API for all LLMs, private or public (Anthropic, Llama V2, GPT-3.5/4, Vertex, GPT4All, Hugging Face ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.
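The core idea behind a unify-all-LLMs layer like this is an adapter pattern: each provider is wrapped behind one common interface, and a single entry point dispatches by model name. A minimal sketch (hypothetical names; not GenossGPT's actual API):

```python
from abc import ABC, abstractmethod


class LLMBackend(ABC):
    """Common interface every provider adapter implements."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class EchoBackend(LLMBackend):
    """Stand-in for a real provider (OpenAI, Anthropic, a local model, ...)."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Registry mapping model names to backend adapters.
BACKENDS = {"echo": EchoBackend()}


def complete(prompt: str, model: str = "echo") -> str:
    """One entry point; the `model` string selects the backend."""
    return BACKENDS[model].complete(prompt)


print(complete("hello"))  # echo: hello
```

Swapping providers then means changing one string, which is what "replace OpenAI GPT with any LLM in one line" amounts to in practice.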
- **filetype.py**: A small, dependency-free, fast Python package to infer binary file types by checking magic-number signatures.
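The magic-number approach is simple enough to sketch in a few lines: compare a file's leading bytes against known signatures. This is illustrative only and does not mirror filetype.py's real API:

```python
from typing import Optional

# A few well-known file signatures (magic numbers).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def guess_mime(data: bytes) -> Optional[str]:
    """Return a MIME type if the buffer starts with a known signature."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


print(guess_mime(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))  # image/png
```

Because only the first few bytes are read, this stays fast and works without the file extension being present or correct.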
- **pinferencia**: Python + Inference, a model deployment library in Python. The simplest model inference server ever.
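"Simplest model inference server" libraries generally reduce to a register-then-serve pattern: a model is any callable registered under a name, and each request looks it up and calls it. A pure-Python sketch of that pattern (hypothetical names, not Pinferencia's real API):

```python
# Registry of named "models" (any callable will do).
MODELS = {}


def register(name):
    """Decorator that registers a callable as a servable model."""
    def wrap(fn):
        MODELS[name] = fn
        return fn
    return wrap


@register("sum")
def sum_model(data):
    return sum(data)


def predict(name, data):
    """Roughly what an HTTP handler would do for POST /models/<name>/predict."""
    return {"model": name, "prediction": MODELS[name](data)}


print(predict("sum", [1, 2, 3]))  # {'model': 'sum', 'prediction': 6}
```

A real server wraps `predict` in an HTTP framework and adds serialization and batching, but the dispatch logic is essentially this.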
Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06
DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30
I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT: there are currently floating-point difficulties in vllm that significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 uses an especially low-precision float, introducing nondeterminism as a side effect of saving on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
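The floating-point mechanism this comment appeals to is easy to reproduce: addition is not associative, so any change in reduction order (parallel kernels, batch-size-dependent splits) can flip low-order bits of logits even at temp=0. A self-contained demonstration (in float64; lower precision only makes it worse):

```python
# Floating-point addition is not associative: summing the same three
# numbers in a different order gives different results, because the
# small term is absorbed before the large terms cancel.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # large terms cancel first, then +1.0 -> 1.0
right = a + (b + c)  # 1.0 is absorbed into -1e16 first       -> 0.0

print(left, right, left == right)
```

When such a sum feeds a softmax over near-tied token probabilities, a different summation order can change which token wins the argmax.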
Project mention: Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow | dev.to | 2024-04-29
Faster-whisper (https://github.com/SYSTRAN/faster-whisper)
Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23
Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02
Shout out to Hugging Face's Optimum, which made it easier to quantize models.
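The quantization that Optimum makes convenient boils down to mapping float weights into a small integer range with a scale factor. A toy symmetric int8 sketch of the idea (illustrative only; real toolchains handle calibration, per-channel scales, and operator fusion):

```python
def quantize(xs):
    """Map floats into [-127, 127] with a single symmetric scale."""
    scale = (max(abs(x) for x in xs) or 1.0) / 127.0  # guard all-zero input
    q = [round(x / scale) for x in xs]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the int values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize(weights)
approx = dequantize(q, scale)

print(q)       # integers within int8 range
print(approx)  # close to the original weights
```

The win is that int8 weights take a quarter of the memory of float32 and run on fast integer kernels, at the cost of the small rounding error visible in `approx`.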
Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25
Question: any good on-device-size image embedding models?
I tried https://github.com/unum-cloud/uform, which I do like, especially as they also support languages other than English. Any recommendations for alternatives?
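Downstream, multimodal embedding models like UForm are typically used by encoding both text and images into the same vector space and ranking by cosine similarity. A minimal sketch of that scoring step (the vectors here are made up for illustration):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


text_vec  = [0.1, 0.9, 0.2]  # embedding of a caption
img_vec_a = [0.1, 0.8, 0.3]  # embedding of a matching image
img_vec_b = [0.9, 0.1, 0.0]  # embedding of an unrelated image

print(cosine(text_vec, img_vec_a) > cosine(text_vec, img_vec_b))  # True
```

Real pipelines get `text_vec` and the image vectors from the model's text and image encoders; the ranking logic is exactly this comparison, usually accelerated by a vector index.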
Project mention: Drop-in replacement for the OpenAI API based on open source LLMs | news.ycombinator.com | 2024-01-17
Check out Hidet [1]. Not as well funded, but delivers Python based ML acceleration with GPU support (unlike Mojo).
[1] https://github.com/hidet-org/hidet
Python Inference related posts
- Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
- CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data
- Multimodal Embeddings for JavaScript, Swift, and Python
- FLaNK AI - April 22, 2024
- Hugging Face reverts the license back to Apache 2.0
- Apple Explores Home Robotics as Potential 'Next Big Thing'
- Using Groq to Build a Real-Time Language Translation App
Index
What are some of the best open-source Inference projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | ColossalAI | 37,951 |
| 2 | DeepSpeed | 32,834 |
| 3 | vllm | 18,931 |
| 4 | faster-whisper | 9,014 |
| 5 | text-generation-inference | 7,938 |
| 6 | server | 7,384 |
| 7 | adversarial-robustness-toolbox | 4,483 |
| 8 | torch2trt | 4,403 |
| 9 | open_model_zoo | 3,957 |
| 10 | AutoGPTQ | 3,806 |
| 11 | deepsparse | 2,881 |
| 12 | optimum | 2,174 |
| 13 | DeepSpeed-MII | 1,662 |
| 14 | transformer-deploy | 1,622 |
| 15 | budgetml | 1,333 |
| 16 | BERT-NER | 1,182 |
| 17 | uform | 894 |
| 18 | GenossGPT | 738 |
| 19 | hidet | 615 |
| 20 | filetype.py | 610 |
| 21 | pinferencia | 558 |
| 22 | fastT5 | 540 |
| 23 | emlearn | 424 |