Python Inference

Open-source Python projects categorized as Inference

Top 23 Python Inference Projects

  • ColossalAI

    Making large AI models cheaper, faster and more accessible

    Project mention: ColossalChat: An Open-Source Solution for Cloning ChatGPT with a RLHF Pipeline | 2023-04-04

    > open-source a complete RLHF pipeline ... based on the LLaMA pre-trained model

    I've gotten to where when I see "open source AI" I now know it's "well, except for $some_other_dependencies"

    Anyway: and (Apache 2) can save you some heartache at least

  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

    Project mention: April 2023 | /r/dailyainews | 2023-06-02

    DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

  • nebuly

    The next-generation platform to monitor and optimize your AI costs in one place 🚀

    Project mention: What are you building with LLMs? I'm writing an article about what people are building with LLMs | /r/programming | 2023-03-27

    Hi everyone. I’m the creator of ChatLLaMA, an open-source framework to train LLMs with limited resources. There has been amazing usage of LLMs these days, from chatbots that retrieve information about a company’s products to cooking assistants for traditional dishes, and much more. And you? What are you building, or what would you love to build, with LLMs? Let me know and I’ll share the article about your stories soon. Cheers

  • server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

    Project mention: Single RTX 3080 or two RTX 3060s for deep learning inference? | /r/computervision | 2023-04-12

    For inference of CNNs, memory should really not be an issue. If it is, that is a software engineering problem, not a hardware issue. FP16 or INT8 for weights is fine, and weight size won’t increase due to the high resolution. During inference, memory used for hidden layer tensors can be reused as soon as the last consumer layer has been processed. You are likely using something designed for training to run inference, which blows up the memory requirement, or, if you are using TensorRT or something like that, you need to make sure that each task does not load its own copy of the library code into the GPU. Maybe look at
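As a rough illustration of the point about weight precision, the sketch below estimates the memory taken by the weights alone at FP32, FP16, and INT8. The 25-million-parameter count and the helper name `weight_memory_mib` are assumptions chosen for illustration, not figures from the post; activations and library overhead are not counted.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Approximate memory needed to hold the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# Hypothetical CNN size (assumed for illustration only).
NUM_PARAMS = 25_000_000

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_mib(NUM_PARAMS, dtype):.1f} MiB")
```

The estimate depends only on the parameter count and the precision, which is why input resolution does not change it; resolution affects activation memory instead.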

  • torch2trt

    An easy to use PyTorch to TensorRT converter

  • adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

    Project mention: [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? | /r/MachineLearning | 2023-01-22

  • faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Does openai whisper works on termux ? | /r/termux | 2023-05-26

    Since then I figured out live transcription and also how to get faster-whisper running. I still need to write things down in detail at some point, though.

  • text-generation-inference

    Large Language Model Text Generation Inference

    Project mention: Falcon 40B LLM which beats Llama license changed to Apache 2.0 | 2023-05-31

    For fast inference, Hugging Face cofounder Thom Wolf recommends their text-generation-inference library

  • deepsparse

    Inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application

    Project mention: [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? | /r/MachineLearning | 2022-10-28

    For 1), what is the easiest way to speed up inference (assume only PyTorch and primarily GPU but also some CPU)? I have been using ONNX and Torchscript but there is a bit of a learning curve and sometimes it can be tricky to get the model to actually work. Is there anything else worth trying? I am enthused by things like TorchDynamo (although I have not tested it extensively) due to its apparent ease of use. I also saw the post yesterday about Kernl using (OpenAI) Triton kernels to speed up transformer models which also looks interesting. Are things like SageMaker Neo or NeuralMagic worth trying? My only reservation with some of these is they still seem to be pretty model/architecture specific. I am a little reluctant to put much time into these unless I know others have had some success first.

  • transformer-deploy

    Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

    Project mention: [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? | /r/MachineLearning | 2022-10-28

    For 2), I am aware of a few options. Triton Inference Server is an obvious one, as is the ‘transformer-deploy’ version from LDS. My only reservation here is that they require model compilation or are architecture specific. I am aware of others like Bento, Ray Serve and TorchServe. Ideally I would have something that allows any (PyTorch) model to be used without the extra compilation effort (or at least makes it optional) and has some conveniences: ease of use, easy deployment, easy hosting of multiple models, and some dynamic batching. Anyway, I am really interested to hear people's experience here as I know there are now quite a few options! Any help is appreciated! Disclaimer - I have no affiliation or connection with the libraries or companies listed here. These are just the ones I know of. Thanks in advance.

  • budgetml

    Deploy a ML inference service on a budget in less than 10 lines of code.

  • optimum

    🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

    Project mention: [D] Is ML doomed to end up closed-source? | /r/MachineLearning | 2023-03-21

    Optimum to accelerate inference of transformers with hardware optimization



  • DeepSpeed-MII

    MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

    Project mention: Stable Diffusion plus DeepSpeed | /r/StableDiffusion | 2023-04-12

    Small, dependency-free, fast Python package to infer binary file types checking the magic numbers signature

  • pinferencia

    Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

    Project mention: Show HN: Pinferencia, Deploy Your AI Models with Pretty UI and REST API | 2022-07-04

  • AutoGPTQ

    An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

    Project mention: Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API | /r/LocalLLaMA | 2023-06-01

    Instead of integrating GPTQ-for-LLaMa, use AutoGPTQ.

  • hidet

    An open-source efficient deep learning framework/compiler, written in Python.

    Project mention: Hidet: A Deep Learning Compiler for Efficient Model Serving | 2023-04-28

    Hey @bructhemoose2 can you file an issue, we will try to fix it ASAP:

  • fastT5

    ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.

    Project mention: Speeding up T5 | /r/LanguageTechnology | 2023-01-22

    I've tried but it works with CPU only.

  • sparktorch

    Train and run PyTorch models on Apache Spark.

  • model_analyzer

    Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.

    Project mention: [P] Benchmarking some PyTorch Inference Servers | /r/MachineLearning | 2023-01-22

  • emlearn

    Machine Learning inference engine for Microcontrollers and Embedded devices

    Project mention: EleutherAI announces it has become a non-profit | 2023-03-02

    > My big gripe, and for obvious reasons, is that we need to step away from cloud-based inference, and it doesn't seem like anyone's working on that.

    I think there are steps being taken in this direction (check out [1] and [2] for interesting lightweight transpile / ad-hoc training projects) but there is a lack of centralized community for these constrained problems.


  • graphsignal-python

    Graphsignal Python tracer

    Project mention: Show HN: Python Monitoring for LLMs, OpenAI, Inference, GPUs | 2023-04-04

    We've built it for apps that use LLMs and other ML models. The lightweight Python agent autoinstruments OpenAI, LangChain, Banana, and other APIs and frameworks. Basically by adding one line of code you'll be able to monitor and analyze latency, errors, compute and costs. Profiling using CProfile, PyTorch Kineto or Yappi can be enabled if code-level statistics are necessary.

    Here is a short demo screencast for a LangChain/OpenAI app:

    In terms of data privacy, we only send metadata and statistics, so no raw data, such as prompts or images, leaves your app.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-06-02.

What are some of the best open-source Inference projects in Python? This list will help you:

 #  Project                          Stars
 1  ColossalAI                       30,025
 2  DeepSpeed                        25,390
 3  nebuly                            8,152
 4  server                            5,418
 5  torch2trt                         3,967
 6  adversarial-robustness-toolbox    3,711
 7  faster-whisper                    2,237
 8  text-generation-inference         1,592
 9  deepsparse                        1,491
10  transformer-deploy                1,399
11  budgetml                          1,312
12  optimum                           1,200
13  BERT-NER                          1,123
14  DeepSpeed-MII                       740
15                                      526
16  pinferencia                         525
17  AutoGPTQ                            517
18  hidet                               438
19  fastT5                              436
20  sparktorch                          297
21  model_analyzer                      258
22  emlearn                             256
23  graphsignal-python                  171