Top 23 model-serving Open-Source Projects
- BentoML: The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
- lightllm: LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- mlrun: MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
- functime: Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
- mosec: A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine.
- pinferencia: Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
- chitra: A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
- inferencedb: 🚀 Stream inferences of real-time ML models in production to any data lake (Experimental)
- sdk-javascript: The official JavaScript SDK for the Modzy Machine Learning Operations (MLOps) Platform.
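All of the serving frameworks above ultimately wrap the same core pattern: a model function exposed behind a network endpoint. As a point of reference, here is a minimal stdlib-only sketch of that pattern with a toy stand-in model; the JSON request/response shape is invented for illustration and does not match any particular framework's API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy stand-in "model": y = 2*x + 1. A real framework would load an ML model here.
def predict(x):
    return 2 * x + 1

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run the "model" on it.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["x"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

# Port 0 asks the OS for any free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
```

What frameworks like BentoML or mosec add on top of this skeleton is the production machinery: batching, worker pools, GPU pipelines, packaging, and deployment.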
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT: there are currently floating-point issues in vLLM that significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 uses an especially low-precision float, introducing nondeterminism to save on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
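The "FPU stuff" part of this argument is cheap to demonstrate even without a model: floating-point addition is not associative, so any change in reduction order (different kernels, batch layouts, or hardware) can flip low-order bits of a sum, and that only changes the sampled token when two logits are close enough for those bits to matter. A minimal illustration:

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
assert left != right

# Summation order matters too: the small term is absorbed by the large one
# before cancellation in one order, but survives in the other.
xs = [1.0, 1e16, -1e16]
assert sum(xs) == 0.0            # 1.0 + 1e16 rounds to 1e16, then cancels
assert sum(reversed(xs)) == 1.0  # 1e16 cancels first, 1.0 survives
```

The same effect, scaled up to a reduction over thousands of logits, is why two runs of "the same" forward pass can disagree in the last bits and occasionally pick different argmax tokens at temp=0.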
To me, context caching is only a subset of what is possible with full control over the model. I consider this a more complete list: https://github.com/microsoft/aici?tab=readme-ov-file#flexibi...

Context caching only gets you "forking generation into multiple branches" (i.e. sharing work between multiple generations).
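The "sharing work between multiple generations" idea can be sketched with a toy cache keyed by the shared prompt prefix. This is illustrative only: real inference servers cache attention KV states per token, not strings, and the class and method names below are invented.

```python
class PrefixCache:
    """Toy prefix cache: remembers 'work' done for a shared prompt prefix."""

    def __init__(self):
        self.cache = {}
        self.computations = 0  # counts how often the expensive step runs

    def encode(self, prefix):
        """Stand-in for the expensive prefill pass over the shared prompt."""
        if prefix not in self.cache:
            self.computations += 1
            self.cache[prefix] = f"state({prefix})"
        return self.cache[prefix]

    def generate(self, prefix, branch):
        """Fork generation: every branch reuses the cached prefix state."""
        state = self.encode(prefix)
        return f"{state} -> {branch}"
```

With N branches over one prompt, the prefill runs once instead of N times, which is exactly the saving context caching buys; the rest of the AICI list (constrained decoding, backtracking, etc.) requires deeper hooks into the model.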
Project mention: 20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust | dev.to | 2023-08-06

Mosec - a high-performance serving framework for ML models, offering dynamic batching and CPU/GPU pipelines to fully exploit your compute machine. A simpler and faster alternative to NVIDIA Triton.
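Dynamic batching, the feature called out for mosec above, means briefly buffering incoming requests so the model sees one batched call instead of many single-item calls, which is what keeps a GPU busy. A toy pure-Python sketch of the scheduling idea (not mosec's actual API; its real interface is a `Worker` class with configurable batch sizes):

```python
import queue

def batch_predict(xs):
    """Stand-in for a vectorized model call: doubles each input."""
    return [2 * x for x in xs]

def serve(requests, max_batch_size=4, timeout=0.01):
    """Drain a request queue, grouping waiting items into one model call."""
    results = {}
    while not requests.empty():
        batch = []
        # Greedily collect up to max_batch_size waiting requests; stop
        # early if the queue stays empty past the timeout.
        while len(batch) < max_batch_size:
            try:
                batch.append(requests.get(timeout=timeout))
            except queue.Empty:
                break
        if batch:
            ids, xs = zip(*batch)
            for i, y in zip(ids, batch_predict(list(xs))):
                results[i] = y
    return results
```

The timeout is the usual latency/throughput knob: a longer wait yields fuller batches at the cost of per-request latency.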
model-serving related posts

- Experimental Mixtral MoE on vLLM!
- Who's hiring developer advocates? (December 2023)
- functime: NEW Data - star count: 616
- functime: NEW Data - star count: 601
- 20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust
Index
What are some of the best open-source model-serving projects? This list will help you:
# | Project | Stars
---|---|---
1 | vllm | 19,344 |
2 | BentoML | 6,603 |
3 | kserve | 3,111 |
4 | lightllm | 1,856 |
5 | aici | 1,756 |
6 | mlrun | 1,316 |
7 | hopsworks | 1,086 |
8 | functime | 923 |
9 | truss | 838 |
10 | Yatai | 766 |
11 | mosec | 712 |
12 | pinferencia | 558 |
13 | OneDiffusion | 323 |
14 | chitra | 224 |
15 | serving-pytorch-models | 100 |
16 | inferencedb | 77 |
17 | vllm-rocm | 76 |
18 | Drogon-torch-serve | 26 |
19 | sdk-python | 24 |
20 | sdk-javascript | 16 |
21 | deprecated-core | 13 |
22 | TFServing-Demos | 11 |
23 | MLDrop | 3 |