SaaSHub helps you find the best software and product alternatives.
Top 15 Python model-serving Projects
- **BentoML**: The most flexible way to serve AI/ML models in production. Build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multi-modal apps, RAG as a service, and more.
- **lightllm**: A Python-based LLM (large language model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- **mlrun**: An open-source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
- **functime**: Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasting on panel data.
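"Embarrassingly parallel" here means each entity's series in the panel can be processed independently, with no coordination between workers. A pure-Python sketch of the idea (this is not functime's Polars-based API; `extract_features` and its feature set are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(series):
    """Compute a few summary features for one entity's time series."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    return {"mean": mean, "variance": variance, "last": series[-1]}

def panel_features(panel):
    """Panel data maps entity id -> list of observations. Since each
    series is independent, extraction is embarrassingly parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {eid: pool.submit(extract_features, s) for eid, s in panel.items()}
        return {eid: f.result() for eid, f in futures.items()}
```

functime gets its speed from running this kind of per-series work inside Polars' parallel engine rather than Python threads.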
- **mosec**: A high-performance ML model-serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute hardware.
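Dynamic batching means the server groups requests that arrive close together into one model call, trading a small latency budget for much higher throughput. A minimal sketch of the idea in pure Python (this is not mosec's actual API; the function name and defaults are illustrative):

```python
import queue
import threading
import time

def dynamic_batcher(request_queue, handle_batch, max_batch_size=8, max_wait_s=0.01):
    """Collect requests until the batch is full or the wait budget expires,
    then run the whole batch through the model in one call."""
    while True:
        batch = [request_queue.get()]  # block until the first request arrives
        deadline = time.monotonic() + max_wait_s
        while len(batch) < max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        handle_batch(batch)
```

The latency budget (`max_wait_s`) is the key tuning knob: larger budgets form fuller batches and raise throughput, at the cost of tail latency for the first request in each batch.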
- **pinferencia**: Python + inference. A model-deployment library in Python; the simplest model inference server ever.
- **chitra**: A multi-functional library for full-stack deep learning that simplifies model building, API development, and model deployment.
- **inferencedb**: 🚀 Stream inferences of real-time ML models in production to any data lake (experimental).
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU issues, and that this effect was amplified by GPT-4's MoE architecture. But if GPT-3.5 is also MoE, then that assumption is just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
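The FPU argument above rests on the fact that floating-point addition is not associative: reductions performed in a different order (different batch size, different kernel schedule, different GPU count) can yield bit-different logits, and when two top tokens are nearly tied, that difference can flip the greedy choice even at temp=0. A two-line demonstration:

```python
# Mathematically identical sums differ in the last bit because
# rounding happens at every intermediate step.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
assert left != right  # same terms, different order, different result
```

Upcasting intermediate results to a wider float (as suggested in the vllm issue) shrinks these discrepancies but does not eliminate the order dependence.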
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
Project mention: 20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust | dev.to | 2023-08-06

Mosec: a high-performance serving framework for ML models, offering dynamic batching and CPU/GPU pipelines to fully exploit your compute machine. A simple and faster alternative to NVIDIA Triton.
Python model-serving related posts
- Experimental Mixtral MoE on vLLM!
- functime: NEW Data - star count: 616
- functime: NEW Data - star count: 601
- 20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust
- [D] Handling Concurrent Request for ML Model API
Index
What are some of the best open-source model-serving projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | vllm | 18,931 |
| 2 | BentoML | 6,586 |
| 3 | kserve | 3,083 |
| 4 | lightllm | 1,835 |
| 5 | mlrun | 1,308 |
| 6 | functime | 914 |
| 7 | truss | 837 |
| 8 | mosec | 709 |
| 9 | pinferencia | 558 |
| 10 | OneDiffusion | 323 |
| 11 | chitra | 224 |
| 12 | inferencedb | 77 |
| 13 | vllm-rocm | 76 |
| 14 | sdk-python | 24 |
| 15 | AquilaHub | 2 |