Top 9 llm-serving Open-Source Projects
-
Ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
-
OpenLLM
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
-
skypilot
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
-
mosec
A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
-
Awesome-LLM-Productization
Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
Project mention: Ray: Unified framework for scaling AI and Python applications | news.ycombinator.com | 2024-05-03
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30
I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to floating-point effects, and that this effect was amplified by GPT-4's MoE. But if GPT-3.5 is also MoE, then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT: there are currently floating-point issues in vLLM that significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 uses an especially low-precision float format, introducing nondeterminism to save on compute costs.
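The floating-point effect described in this comment comes down to non-associativity: reducing the same numbers in a different order (different kernels, batch sizes, or hardware) can change the last bits of a sum. A toy illustration in plain Python (not a GPT experiment; the logit values are made up to show a near-tie):

```python
# Floating-point addition is not associative, so the same terms summed
# in a different order can produce slightly different results.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

def argmax(xs):
    """Index of the first maximal element (greedy decoding at temp=0)."""
    return max(range(len(xs)), key=xs.__getitem__)

# With two nearly tied logits, that last-bit difference flips the greedy
# token choice -- one plausible source of temp=0 nondeterminism.
print(argmax([0.6, a]))  # 1: a is a hair above 0.6
print(argmax([0.6, b]))  # 0: exact tie, so the first index wins
```

In a real model the reduction order varies with batching and kernel selection, which is why the effect shows up nondeterministically between requests; lower-precision formats make near-ties more common.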
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
OpenLLM by BentoML | GitHub | tutorial
Project mention: Alternative clouds are booming as companies seek cheaper access to GPUs | news.ycombinator.com | 2024-05-06
SkyPilot is worth a mention here:
https://github.com/skypilot-org/skypilot
An open-source CLI to deploy GPU VMs on all major cloud providers, with an option to use spot pricing: one cheap VM acts as a controller that keeps the most inexpensive deployment available, with failover and load balancing.
It's like beating the cloud providers at their own game; I wouldn't be surprised if they banned it.
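The workflow described in this comment maps onto SkyPilot's task YAML. A hypothetical sketch (the accelerator type, model name, and commands below are illustrative, not from the source):

```yaml
# task.yaml -- illustrative SkyPilot task spec (values are examples)
resources:
  accelerators: A100:1   # any supported GPU type
  use_spot: true         # bid on cheaper spot/preemptible VMs

setup: |
  pip install vllm

run: |
  python -m vllm.entrypoints.api_server --model mistralai/Mistral-7B-v0.1
```

A spec like this would be launched with `sky launch task.yaml`; the cheap controller VM mentioned above corresponds to SkyPilot's managed spot jobs, which recover the workload on another cloud or region if the spot instance is preempted.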
To me, context caching is only a subset of what is possible with full control over the model. I consider this a more complete list: https://github.com/microsoft/aici?tab=readme-ov-file#flexibi...
Context caching only gets you "forking generation into multiple branches" (i.e., sharing work between multiple generations).
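The "forking generation into multiple branches" idea above can be sketched in a few lines: compute the expensive prompt state once, then let several continuations reuse it. This is a toy stand-in (the dict plays the role of a real KV cache; `encode_prefix` stands in for a model forward pass), not any library's actual API:

```python
# Toy sketch of context (prefix) caching: the expensive prefix
# computation runs once, and multiple generation branches fork from it.
prefix_calls = 0
kv_cache = {}

def encode_prefix(prefix: tuple) -> str:
    global prefix_calls
    prefix_calls += 1              # expensive model forward pass in reality
    return "state(" + " ".join(prefix) + ")"

def generate(prefix: tuple, continuation: str) -> str:
    if prefix not in kv_cache:
        kv_cache[prefix] = encode_prefix(prefix)
    return kv_cache[prefix] + " -> " + continuation

prompt = ("summarize", "this", "long", "document")
branch_a = generate(prompt, "as bullet points")
branch_b = generate(prompt, "in one sentence")  # reuses the cached prefix
print(prefix_calls)  # 1 -- the prefix was encoded once for both branches
```

Full control over the model (as in the aici list linked above) enables more than this: constrained decoding, mid-generation edits, and so on, whereas an API-level context cache only gives you the shared-prefix savings.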
Project mention: 20x Faster as the Beginning: Introducing pgvecto.rs extension written in Rust | dev.to | 2023-08-06
Mosec is a high-performance serving framework for ML models that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine. A simple and faster alternative to NVIDIA Triton.
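Dynamic batching, the feature highlighted above, means grouping requests that arrive within a short window into one model call to amortize per-call overhead. A minimal pure-Python sketch of the idea (this is an illustration of the technique, not Mosec's actual API; the doubling step stands in for batched inference):

```python
import queue
import threading
import time

def batch_worker(q, results, max_batch=8, wait_s=0.05):
    """Collect requests arriving within `wait_s` and serve them together."""
    while True:
        first = q.get()
        if first is None:              # shutdown sentinel
            return
        batch = [first]
        deadline = time.monotonic() + wait_s
        while len(batch) < max_batch:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                item = q.get(timeout=remaining)
            except queue.Empty:
                break                  # window closed, serve what we have
            if item is None:
                q.put(None)            # keep the sentinel for the outer loop
                break
            batch.append(item)
        results.append([x * 2 for x in batch])  # one batched "model call"

q = queue.Queue()
results = []
worker = threading.Thread(target=batch_worker, args=(q, results))
worker.start()
for i in range(5):
    q.put(i)
q.put(None)
worker.join()
print(results)  # all five requests served, typically in a single batch
```

Real serving frameworks add the pieces this sketch omits: per-request response routing, backpressure, and separate CPU/GPU pipeline stages so pre/post-processing overlaps with inference.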
Project mention: Deploy and Fine-tune large language models on k8s - Trying this out this weekend | /r/AIOps | 2023-09-21
Project mention: Show HN: Made Open Source NPM-like package ecosystem for reusable Prompts | news.ycombinator.com | 2023-12-15
There are a number of open-source and commercial projects productizing LLMs, but challenges remain in areas such as latency, cost reduction, fine-tuning, data preparation, and monitoring, to name a few.
This repo tracks projects and packages that can help you speed up adoption, with boilerplate code, end-to-end backends, and real-world use cases as its goal.
Please feel free to open issues; more content is coming soon.
https://github.com/oscinis-com/Awesome-LLM-Productization/
llm-serving related posts
-
The Lost Arts of CLJS Frontend
-
OpenLLM: An open platform for operating large language models (LLMs) in production.
-
GPT Weekly - 26th June Edition - 🎙️ Meta's Voicebox is Paused, 🖼️ SDXL 0.9, 📜 AI Compliance & EU Act and more
-
OpenLLM: OSS to easily serve Open Source LLMs
Index
What are some of the best open-source llm-serving projects? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | Ray | 31,493 |
| 2 | vllm | 19,672 |
| 3 | OpenLLM | 8,963 |
| 4 | skypilot | 5,762 |
| 5 | aici | 1,771 |
| 6 | mosec | 712 |
| 7 | runbooks | 158 |
| 8 | sugarcane-ai | 46 |
| 9 | Awesome-LLM-Productization | 20 |