Python Inference

Open-source Python projects categorized as Inference

Top 23 Python Inference Projects

  • ColossalAI

    Making large AI models cheaper, faster, and more accessible

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

    DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe(!). I'm surprised I don't see this project used more.
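
    The offload behaviour described in that comment is driven by DeepSpeed's config dict. A minimal sketch, assuming ZeRO stage 3 with optimizer state offloaded to CPU and parameters to NVMe — the batch size and NVMe path are hypothetical placeholders:

```python
# Hedged sketch of a DeepSpeed config enabling ZeRO stage-3 offload.
# Values and the NVMe path are illustrative, not recommendations.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # keep optimizer state in pinned host RAM
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # spill parameters all the way to NVMe storage
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}
# Typically passed as: deepspeed.initialize(model=model, config=ds_config, ...)
```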

  • vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

  • Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

    I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.

    What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.

    On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.

    Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.

    [1] Or the time, or the motivation :) But this stuff is expensive.
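
    The "FPU stuff" in the comment above boils down to the fact that floating-point addition is not associative: summing the same logits in a different order (which varying batch sizes and kernel schedules cause) can change low-order bits. A self-contained illustration in float32:

```python
import numpy as np

# float32 addition is not associative: reordering the same computation
# changes the result, which is the root of run-to-run logit differences.
x = np.float32(1e8)
y = np.float32(1.0)

left = (x + y) - x    # y is rounded away when added to 1e8 in float32
right = (x - x) + y   # reassociated: exactly 1.0
print(left, right)    # 0.0 1.0
```

    Upcasting to float64, as suggested in the linked vllm issue, shrinks these discrepancies but does not eliminate them.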

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Project mention: Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow | dev.to | 2024-04-29

    Faster-whisper (https://github.com/SYSTRAN/faster-whisper)
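
    Faster-whisper's transcription yields segments with start/end times in seconds; turning them into subtitle cues, as the article above does, mostly needs a timestamp formatter. A minimal sketch, assuming SRT output (`srt_timestamp` and `srt_cue` are illustrative helpers, not part of faster-whisper):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One SRT block: cue index, time range, text, trailing blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_timestamp(3661.5))  # 01:01:01,500
```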

  • text-generation-inference

    Large Language Model Text Generation Inference

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

  • Project mention: FLaNK Weekly 08 Jan 2024 | dev.to | 2024-01-08
  • adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

  • torch2trt

    An easy-to-use PyTorch-to-TensorRT converter

  • open_model_zoo

    Pre-trained Deep Learning models and demos (high quality and extremely fast)

  • Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06
  • AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

  • Project mention: Setting up LLAMA2 70B Chat locally | /r/developersIndia | 2023-08-18
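
    For intuition on what weight quantization does, here is a simplified round-to-nearest 4-bit quantizer. This is a sketch of the general idea only, not AutoGPTQ's implementation: GPTQ proper additionally corrects quantization error column by column using second-order (Hessian) information.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Round-to-nearest quantization: map weights onto 2**bits evenly
    spaced levels between min and max. A pedagogical sketch, not GPTQ."""
    qmax = 2**bits - 1
    scale = (w.max() - w.min()) / qmax
    zero = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return (q.astype(np.float32) - zero) * scale

w = np.random.default_rng(0).standard_normal(256).astype(np.float32)
q, scale, zero = quantize_rtn(w)
# each weight lands on the nearest of 16 levels, so the worst-case
# reconstruction error is bounded by scale / 2
max_err = np.abs(dequantize(q, scale, zero) - w).max()
```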
  • deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

  • Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

    Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse

  • optimum

    🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

  • Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02

    Shout out to Huggingface's Optimum – which made it easier to quantize models.

  • DeepSpeed-MII

    MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

  • transformer-deploy

    Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

  • budgetml

    Deploy an ML inference service on a budget in fewer than 10 lines of code.

  • BERT-NER

    Pytorch-Named-Entity-Recognition-with-BERT

  • uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

  • Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25

    question: any good on-device size image embedding models?

    tried https://github.com/unum-cloud/uform which i do like, especially they also support languages other than English. Any recommendations on other alternatives?

  • GenossGPT

    One API for all LLMs, private or public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4All, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.

  • Project mention: Drop-in replacement for the OpenAI API based on open source LLMs | news.ycombinator.com | 2024-01-17
  • hidet

    An open-source, efficient deep learning framework/compiler, written in Python.

  • Project mention: karpathy/llm.c | news.ycombinator.com | 2024-04-08

    Check out Hidet [1]. Not as well funded, but delivers Python based ML acceleration with GPU support (unlike Mojo).

    [1] https://github.com/hidet-org/hidet

  • filetype.py

    Small, dependency-free, fast Python package to infer binary file types by checking magic-number signatures
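
    Magic-number inference is a prefix match against known byte signatures. A minimal sketch of the technique — filetype.py itself covers far more formats behind its `filetype.guess` API, and `guess_mime` here is an illustrative stand-in:

```python
# Minimal magic-number sniffing: compare the first bytes of a file
# against well-known format signatures.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
}

def guess_mime(data: bytes):
    """Return the MIME type whose signature prefixes the data, else None."""
    for sig, mime in MAGIC.items():
        if data.startswith(sig):
            return mime
    return None
```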

  • pinferencia

    Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

  • fastT5

    ⚡ Boost inference speed of T5 models by 5x and reduce model size by 3x.

  • emlearn

    Machine Learning inference engine for Microcontrollers and Embedded devices
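
    emlearn works by converting a trained model (trees, small nets, Naive Bayes) into portable C for microcontrollers. The inference loop it generates for a decision tree is equivalent to this Python sketch — the flattened node layout here is illustrative, not emlearn's actual format:

```python
def predict_tree(nodes, x):
    """Walk a flattened decision tree. Each node is
    (feature, threshold, left, right); a negative child c encodes
    leaf class -c - 1. Layout is illustrative only."""
    i = 0
    while True:
        feat, thr, left, right = nodes[i]
        child = left if x[feat] < thr else right
        if child < 0:
            return -child - 1   # reached a leaf
        i = child

# Toy tree: class 0 when x[0] < 0.5, else class 1
nodes = [(0, 0.5, -1, -2)]
print(predict_tree(nodes, [0.2]), predict_tree(nodes, [0.9]))  # 0 1
```

    A flat array of nodes with integer indices is exactly the kind of representation that compiles to allocation-free C, which is why it suits embedded targets.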

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Inference related posts

  • Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow

    7 projects | dev.to | 29 Apr 2024
  • CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data

    1 project | news.ycombinator.com | 25 Apr 2024
  • Multimodal Embeddings for JavaScript, Swift, and Python

    1 project | news.ycombinator.com | 25 Apr 2024
  • FLaNK AI-April 22, 2024

    28 projects | dev.to | 22 Apr 2024
  • Hugging Face reverts the license back to Apache 2.0

    1 project | news.ycombinator.com | 8 Apr 2024
  • Apple Explores Home Robotics as Potential 'Next Big Thing'

    3 projects | news.ycombinator.com | 4 Apr 2024
  • Using Groq to Build a Real-Time Language Translation App

    3 projects | dev.to | 5 Apr 2024

Index

What are some of the best open-source Inference projects in Python? This list will help you:

  #  Project                          Stars
  1  ColossalAI                      37,951
  2  DeepSpeed                       32,834
  3  vllm                            18,931
  4  faster-whisper                   9,014
  5  text-generation-inference        7,938
  6  server                           7,384
  7  adversarial-robustness-toolbox   4,483
  8  torch2trt                        4,403
  9  open_model_zoo                   3,957
 10  AutoGPTQ                         3,806
 11  deepsparse                       2,881
 12  optimum                          2,174
 13  DeepSpeed-MII                    1,662
 14  transformer-deploy               1,622
 15  budgetml                         1,333
 16  BERT-NER                         1,182
 17  uform                              894
 18  GenossGPT                          738
 19  hidet                              615
 20  filetype.py                        610
 21  pinferencia                        558
 22  fastT5                             540
 23  emlearn                            424
