Top 23 Python Inference Projects

ColossalAI

Mentions: 42 | Stars: 37,951 | Activity: 9.7 | Python

Making large AI models cheaper, faster and more accessible

Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

DeepSpeed

Mentions: 51 | Stars: 32,834 | Activity: 9.8 | Python

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
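
As a rough illustration of the offloading mentioned above, here is a minimal sketch of a DeepSpeed ZeRO stage 3 configuration expressed as a Python dict. The key names follow DeepSpeed's documented config schema, but the batch size and the `/local_nvme` path are hypothetical placeholders, and a real setup would need tuning for the hardware at hand.

```python
import json

# Sketch of a DeepSpeed config that offloads state away from GPU memory.
# "/local_nvme" is a hypothetical mount point, not a value from the source.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer state in host RAM.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Spill model parameters to NVMe storage.
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

# Serialize to JSON, the format DeepSpeed config files usually take.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

Switching `offload_param.device` to `"cpu"` would keep parameters in RAM instead of on NVMe.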

vllm

Mentions: 31 | Stars: 18,931 | Activity: 9.9 | Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
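
The floating-point effect the comment describes can be sketched in a few lines. This is a toy illustration, not a reproduction of the linked vllm issue: the magnitudes are deliberately exaggerated, and the logits and token names are hypothetical. It shows that the same terms summed in a different order can give different results, and that a near-tie between two candidates can then flip a greedy (temp=0) choice.

```python
import math

# Floating-point addition is not associative: the same three terms
# summed in a different order give different results.
terms = [1e16, 1.0, -1e16]
left_to_right = (terms[0] + terms[1]) + terms[2]  # 1.0 is absorbed by 1e16
reordered = (terms[0] + terms[2]) + terms[1]      # cancellation happens first


def greedy_pick(logit_b):
    """Pick the highest-scoring token, as greedy decoding at temp=0 would."""
    # Hypothetical logits: "a" is fixed, "b" comes from an accumulated sum.
    logits = {"a": 0.5, "b": logit_b}
    return max(logits, key=logits.get)


pick_run_1 = greedy_pick(left_to_right)
pick_run_2 = greedy_pick(reordered)

# math.fsum tracks exact partial sums; it stands in here for the general
# idea of accumulating at higher precision.
stable = math.fsum(terms)

print(left_to_right, reordered, pick_run_1, pick_run_2, stable)
```

In a real model the differences are far smaller, so a flip only happens when two token probabilities are nearly tied, which is consistent with the comment's observation that coarse likelihoods stay deterministic.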

faster-whisper

Mentions: 23 | Stars: 9,014 | Activity: 8.1 | Python

Faster Whisper transcription with CTranslate2

Project mention: Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow | dev.to | 2024-04-29

Faster-whisper (https://github.com/SYSTRAN/faster-whisper)

text-generation-inference

Mentions: 29 | Stars: 7,938 | Activity: 9.6 | Python

Large Language Model Text Generation Inference

Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

server

Mentions: 24 | Stars: 7,384 | Activity: 9.5 | Python

The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

Project mention: FLaNK Weekly 08 Jan 2024 | dev.to | 2024-01-08

adversarial-robustness-toolbox

Mentions: 8 | Stars: 4,483 | Activity: 9.7 | Python

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
torch2trt

Mentions: 5 | Stars: 4,403 | Activity: 7.6 | Python

An easy to use PyTorch to TensorRT converter
open_model_zoo

Mentions: 5 | Stars: 3,957 | Activity: 8.6 | Python

Pre-trained Deep Learning models and demos (high quality and extremely fast)

Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06

AutoGPTQ

Mentions: 19 | Stars: 3,806 | Activity: 9.3 | Python

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Project mention: Setting up LLAMA2 70B Chat locally | /r/developersIndia | 2023-08-18

deepsparse

Mentions: 21 | Stars: 2,881 | Activity: 9.5 | Python

Sparsity-aware deep learning inference runtime for CPUs

Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse

optimum

Mentions: 8 | Stars: 2,174 | Activity: 9.5 | Python

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02

Shout out to Huggingface's Optimum – which made it easier to quantize models.

DeepSpeed-MII

Mentions: 6 | Stars: 1,662 | Activity: 8.6 | Python

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
transformer-deploy

Mentions: 8 | Stars: 1,622 | Activity: 6.8 | Python

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
budgetml

Mentions: 4 | Stars: 1,333 | Activity: 0.0 | Python

Deploy an ML inference service on a budget in less than 10 lines of code.
BERT-NER

Mentions: 1 | Stars: 1,182 | Activity: 0.0 | Python

Pytorch-Named-Entity-Recognition-with-BERT
uform

Mentions: 8 | Stars: 894 | Activity: 9.2 | Python

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25

Question: are there any good on-device-sized image embedding models?
I tried https://github.com/unum-cloud/uform, which I do like, especially since it also supports languages other than English. Any recommendations for alternatives?

GenossGPT

Mentions: 1 | Stars: 738 | Activity: 8.7 | Python

One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLMs in your app with one line.

Project mention: Drop-in replacement for the OpenAI API based on open source LLMs | news.ycombinator.com | 2024-01-17

hidet

Mentions: 3 | Stars: 615 | Activity: 8.8 | Python

An open-source, efficient deep learning framework/compiler, written in Python.

Project mention: karpathy/llm.c | news.ycombinator.com | 2024-04-08

Check out Hidet [1]. Not as well funded, but delivers Python based ML acceleration with GPU support (unlike Mojo).
[1] https://github.com/hidet-org/hidet

filetype.py

Mentions: 1 | Stars: 610 | Activity: 4.2 | Python

Small, dependency-free, fast Python package that infers binary file types by checking their magic-number signatures
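
To show what checking a magic-number signature means in practice, here is a minimal standard-library sketch. It is not filetype.py's implementation or API; the `SIGNATURES` table and the `guess_mime` and `guess_file` helpers are hypothetical names introduced for illustration, covering only a handful of well-known formats.

```python
from typing import Optional

# Leading byte sequences ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def guess_mime(header: bytes) -> Optional[str]:
    """Return a MIME type if the leading bytes match a known signature."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


def guess_file(path: str) -> Optional[str]:
    """Read only the first few bytes of a file and guess its type."""
    with open(path, "rb") as f:
        return guess_mime(f.read(16))
```

Because only the first bytes are read, the check stays fast even for large files and does not depend on the file extension.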
pinferencia

Mentions: 21 | Stars: 558 | Activity: 0.0 | Python

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
fastT5

Mentions: 5 | Stars: 540 | Activity: 0.0 | Python

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
emlearn

Mentions: 5 | Stars: 424 | Activity: 9.2 | Python

Machine Learning inference engine for Microcontrollers and Embedded devices

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Inference related posts

Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow

7 projects | dev.to | 29 Apr 2024
CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data

1 project | news.ycombinator.com | 25 Apr 2024
Multimodal Embeddings for JavaScript, Swift, and Python

1 project | news.ycombinator.com | 25 Apr 2024
FLaNK AI-April 22, 2024

28 projects | dev.to | 22 Apr 2024
Hugging Face reverts the license back to Apache 2.0

1 project | news.ycombinator.com | 8 Apr 2024
Apple Explores Home Robotics as Potential 'Next Big Thing'

3 projects | news.ycombinator.com | 4 Apr 2024
Using Groq to Build a Real-Time Language Translation App

3 projects | dev.to | 5 Apr 2024

Index

What are some of the best open-source Inference projects in Python? This list will help you:

	Project	Stars
1	ColossalAI	37,951
2	DeepSpeed	32,834
3	vllm	18,931
4	faster-whisper	9,014
5	text-generation-inference	7,938
6	server	7,384
7	adversarial-robustness-toolbox	4,483
8	torch2trt	4,403
9	open_model_zoo	3,957
10	AutoGPTQ	3,806
11	deepsparse	2,881
12	optimum	2,174
13	DeepSpeed-MII	1,662
14	transformer-deploy	1,622
15	budgetml	1,333
16	BERT-NER	1,182
17	uform	894
18	GenossGPT	738
19	hidet	615
20	filetype.py	610
21	pinferencia	558
22	fastT5	540
23	emlearn	424
