Python Inference

Open-source Python projects categorized as Inference

Top 23 Python Inference Projects

  • ColossalAI

    Making large AI models cheaper, faster and more accessible

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

    DeepSpeed can handle parallelism concerns, and can even offload data and model weights to RAM or even NVMe (!). I'm surprised I don't see this project used more.
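
    For context, a minimal sketch of what the DeepSpeed inference path looks like, assuming a CUDA GPU and an illustrative Hugging Face checkpoint (mp_size and kernel injection are the classic flags; newer releases also accept a config dict):

        # Hedged sketch: model name and settings are illustrative.
        import torch
        import deepspeed
        from transformers import AutoModelForCausalLM, AutoTokenizer

        name = "gpt2"  # swap in your own checkpoint
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name)

        # Wrap for inference: fp16 weights, single GPU, fused kernels where available
        engine = deepspeed.init_inference(
            model,
            mp_size=1,
            dtype=torch.float16,
            replace_with_kernel_inject=True,
        )

        inputs = tokenizer("DeepSpeed makes inference", return_tensors="pt").to("cuda")
        outputs = engine.module.generate(**inputs, max_new_tokens=32)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))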

  • vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

  • Project mention: Best LLM Inference Engines and Servers to Deploy LLMs in Production | dev.to | 2024-06-05

    GitHub repository: https://github.com/vllm-project/vllm
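
    vLLM's offline API is only a few lines; a minimal sketch (the model name is illustrative):

        # Hedged sketch: any Hugging Face causal LM that vLLM supports works here.
        from vllm import LLM, SamplingParams

        llm = LLM(model="facebook/opt-125m")
        params = SamplingParams(temperature=0.8, max_tokens=64)

        # generate() batches prompts and manages the KV cache via PagedAttention
        outputs = llm.generate(["The best inference engine is"], params)
        for out in outputs:
            print(out.outputs[0].text)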

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26

    I've been using this:

    https://github.com/bugbakery/transcribee

    It's noticeably a work in progress, but it does the job and has a nice UI for editing transcriptions, speakers, etc.

    It's running on the CPU for me; it would be nice to have something that can make use of a 4 GB Nvidia GPU, which faster-whisper is actually able to do [1].

    https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
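
    A minimal faster-whisper sketch; the int8 compute type is what makes a small (~4 GB) GPU workable, per [1] (model size and audio path are illustrative):

        # Hedged sketch: model size, device, and file name are illustrative.
        from faster_whisper import WhisperModel

        # int8 keeps memory low enough for small GPUs
        model = WhisperModel("small", device="cuda", compute_type="int8")

        segments, info = model.transcribe("audio.mp3", beam_size=5)
        print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
        for segment in segments:
            print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")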

  • text-generation-inference

    Large Language Model Text Generation Inference

  • Project mention: Best LLM Inference Engines and Servers to Deploy LLMs in Production | dev.to | 2024-06-05

    GitHub repository: https://github.com/huggingface/text-generation-inference
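
    TGI runs as a standalone server (typically via Docker); a minimal client sketch using huggingface_hub, assuming a server is already listening locally (URL and prompt are illustrative):

        # Hedged sketch: assumes a TGI server was started separately, e.g. with
        # docker run ... ghcr.io/huggingface/text-generation-inference --model-id <model>
        from huggingface_hub import InferenceClient

        client = InferenceClient("http://localhost:8080")  # endpoint URL is illustrative
        print(client.text_generation("What is inference?", max_new_tokens=64))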

  • server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

  • Project mention: Best LLM Inference Engines and Servers to Deploy LLMs in Production | dev.to | 2024-06-05
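
    A minimal HTTP client sketch against a running Triton server; the model name and tensor names below are hypothetical placeholders for whatever your model repository defines:

        # Hedged sketch: "my_model", "INPUT0", and "OUTPUT0" are hypothetical names.
        import numpy as np
        import tritonclient.http as httpclient

        client = httpclient.InferenceServerClient(url="localhost:8000")

        data = np.random.rand(1, 16).astype(np.float32)
        inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
        inp.set_data_from_numpy(data)

        result = client.infer(model_name="my_model", inputs=[inp])
        print(result.as_numpy("OUTPUT0"))
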
  • adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
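
    A minimal evasion sketch mirroring ART's getting-started pattern on a scikit-learn classifier (dataset, kernel, and attack strength are illustrative):

        # Hedged sketch: dataset, kernel, and eps are illustrative.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.svm import SVC
        from art.estimators.classification import SklearnClassifier
        from art.attacks.evasion import FastGradientMethod

        X, y = load_iris(return_X_y=True)
        X = X.astype(np.float32)

        model = SVC(C=1.0, kernel="linear").fit(X, y)  # plain scikit-learn fit
        classifier = SklearnClassifier(model=model)    # wrap for ART

        attack = FastGradientMethod(estimator=classifier, eps=0.3)
        X_adv = attack.generate(x=X)                   # craft adversarial inputs

        print("clean accuracy:", model.score(X, y))
        print("adversarial accuracy:", model.score(X_adv, y))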

  • torch2trt

    An easy-to-use PyTorch-to-TensorRT converter
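
    A minimal conversion sketch, close to the project's README (model and input shape are illustrative):

        # Hedged sketch: model and input size are illustrative.
        import torch
        from torch2trt import torch2trt
        from torchvision.models.alexnet import alexnet

        model = alexnet(pretrained=True).eval().cuda()
        x = torch.ones((1, 3, 224, 224)).cuda()  # example input fixes the TensorRT shapes

        model_trt = torch2trt(model, [x])         # builds a TensorRT engine
        print(torch.max(torch.abs(model(x) - model_trt(x))))  # sanity-check outputs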

  • open_model_zoo

    Pre-trained Deep Learning models and demos (high quality and extremely fast)

  • Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06
  • AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

  • Project mention: Setting up LLAMA2 70B Chat locally | /r/developersIndia | 2023-08-18
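
    A minimal sketch for loading an already-quantized GPTQ checkpoint (the repo id is illustrative):

        # Hedged sketch: the checkpoint id is illustrative.
        from transformers import AutoTokenizer
        from auto_gptq import AutoGPTQForCausalLM

        repo = "TheBloke/Llama-2-7B-Chat-GPTQ"
        tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
        model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0")

        inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
        print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
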
  • deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

  • Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

    Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
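
    A minimal DeepSparse pipeline sketch; with no explicit model_path the library pulls a default sparsified model from the SparseZoo (task name and input text are illustrative):

        # Hedged sketch: task and input text are illustrative.
        from deepsparse import Pipeline

        pipeline = Pipeline.create(task="sentiment-analysis")
        print(pipeline("Sparse inference on CPUs is surprisingly fast"))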

  • optimum

    🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

  • Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02

    Shout-out to Hugging Face's Optimum, which made it easier to quantize models.
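
    For instance, a minimal sketch of exporting a model to ONNX and running it with ONNX Runtime through Optimum (the model name is illustrative; quantization follows a similar path via ORTQuantizer):

        # Hedged sketch: the model name is illustrative.
        from optimum.onnxruntime import ORTModelForSequenceClassification
        from transformers import AutoTokenizer, pipeline

        name = "distilbert-base-uncased-finetuned-sst-2-english"
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = ORTModelForSequenceClassification.from_pretrained(name, export=True)

        clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
        print(clf("Optimum made this model faster"))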

  • DeepSpeed-MII

    MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
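
    A minimal non-persistent pipeline sketch, following the README pattern (the model name is illustrative):

        # Hedged sketch: the model name is illustrative.
        import mii

        pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
        responses = pipe(["DeepSpeed-MII is"], max_new_tokens=64)
        print(responses[0])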

  • transformer-deploy

    Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

  • budgetml

    Deploy an ML inference service on a budget in less than 10 lines of code.

  • BERT-NER

    Pytorch-Named-Entity-Recognition-with-BERT

  • uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

  • Project mention: Recapping the AI, Machine Learning and Data Science Meetup - May 30, 2024 | dev.to | 2024-06-04

    UForm: Pocket-Sized Multimodal AI for Content Understanding and Generation

  • GenossGPT

    One API for all LLMs, either private or public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace, ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.

  • Project mention: Drop-in replacement for the OpenAI API based on open source LLMs | news.ycombinator.com | 2024-01-17
  • hidet

    An open-source, efficient deep learning framework/compiler, written in Python.

  • Project mention: karpathy/llm.c | news.ycombinator.com | 2024-04-08

    Check out Hidet [1]. Not as well funded, but it delivers Python-based ML acceleration with GPU support (unlike Mojo).

    [1] https://github.com/hidet-org/hidet
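
    A minimal sketch of Hidet as a torch.compile backend (model and shapes are illustrative):

        # Hedged sketch: importing hidet registers the 'hidet' dynamo backend.
        import torch
        import hidet

        model = torch.nn.Linear(128, 128).cuda().eval()
        x = torch.randn(8, 128).cuda()

        compiled = torch.compile(model, backend="hidet")  # compile via Hidet
        with torch.no_grad():
            print(torch.allclose(model(x), compiled(x), atol=1e-3))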

  • filetype.py

    Small, dependency-free, fast Python package to infer binary file types by checking the magic-number signature
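
    A minimal sketch (the file path is illustrative):

        # Hedged sketch: the file path is illustrative.
        import filetype

        kind = filetype.guess("unknown_blob")  # inspects only the first bytes
        if kind is None:
            print("Cannot guess file type")
        else:
            print(f"extension: {kind.extension}, MIME: {kind.mime}")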

  • pinferencia

    Python + Inference: a model deployment library in Python. The simplest model inference server ever.
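
    A minimal registration sketch in the project's README style (model name and entrypoint are illustrative); serve it afterwards with uvicorn app:service:

        # Hedged sketch: mirrors the README's register pattern.
        from pinferencia import Server

        class MyModel:
            def predict(self, data):
                return sum(data)

        service = Server()
        service.register(model_name="mymodel", model=MyModel(), entrypoint="predict")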

  • fastT5

    ⚡ Boost the inference speed of T5 models by 5x and reduce the model size by 3x.
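
    A minimal sketch mirroring the project's README flow (the model name is illustrative):

        # Hedged sketch: exports the model to quantized ONNX, then generates.
        from fastT5 import export_and_get_onnx_model
        from transformers import AutoTokenizer

        name = "t5-small"
        model = export_and_get_onnx_model(name)
        tokenizer = AutoTokenizer.from_pretrained(name)

        tokens = tokenizer("translate English to French: hello", return_tensors="pt")
        out = model.generate(input_ids=tokens["input_ids"],
                             attention_mask=tokens["attention_mask"],
                             num_beams=2)
        print(tokenizer.decode(out[0], skip_special_tokens=True))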

  • emlearn

    Machine Learning inference engine for Microcontrollers and Embedded devices
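
    A minimal train-then-convert sketch (file and model names are illustrative):

        # Hedged sketch: trains in scikit-learn, then emits portable C.
        import emlearn
        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier

        X, y = load_iris(return_X_y=True)
        estimator = RandomForestClassifier(n_estimators=10).fit(X, y)

        cmodel = emlearn.convert(estimator, method="inline")
        cmodel.save(file="iris_model.h", name="iris_model")  # #include from firmware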

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python Inference related posts

  • Cost Per 1M tokens Of Self Hosting Llama-3

    4 projects | news.ycombinator.com | 14 Jun 2024
  • Ask HN: If you are a Machine Learning engineer, what do you do at work?

    2 projects | news.ycombinator.com | 7 Jun 2024
  • Best LLM Inference Engines and Servers to Deploy LLMs in Production

    6 projects | dev.to | 5 Jun 2024
  • Rete Algorithm

    10 projects | news.ycombinator.com | 27 May 2024
  • Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow

    7 projects | dev.to | 29 Apr 2024
  • CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data

    1 project | news.ycombinator.com | 25 Apr 2024
  • Multimodal Embeddings for JavaScript, Swift, and Python

    1 project | news.ycombinator.com | 25 Apr 2024

Index

What are some of the best open-source Inference projects in Python? This list will help you:

#    Project                          Stars
1    ColossalAI                       38,198
2    DeepSpeed                        33,399
3    vllm                             21,104
4    faster-whisper                    9,798
5    text-generation-inference         8,245
6    server                            7,621
7    adversarial-robustness-toolbox    4,570
8    torch2trt                         4,444
9    open_model_zoo                    3,994
10   AutoGPTQ                          3,989
11   deepsparse                        2,918
12   optimum                           2,276
13   DeepSpeed-MII                     1,727
14   transformer-deploy                1,633
15   budgetml                          1,332
16   BERT-NER                          1,182
17   uform                               947
18   GenossGPT                           744
19   hidet                               626
20   filetype.py                         620
21   pinferencia                         558
22   fastT5                              540
23   emlearn                             453
