Serving Alternatives

Similar projects and alternatives to serving

llama.cpp

Mentions: 769 | Stars: 55,846 | Activity: 10.0 | Language: C++

LLM inference in C/C++
julia

Mentions: 350 | Stars: 44,510 | Activity: 10.0 | Language: Julia

The Julia Programming Language
tensorflow

Mentions: 221 | Stars: 182,456 | Activity: 10.0 | Language: C++

An Open Source Machine Learning Framework for Everyone
whisper.cpp

Mentions: 187 | Stars: 30,942 | Activity: 9.8 | Language: C

Port of OpenAI's Whisper model in C/C++
mlc-llm

Mentions: 89 | Stars: 16,774 | Activity: 9.9 | Language: Python

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
exllama

Mentions: 64 | Stars: 2,582 | Activity: 9.0 | Language: Python

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
maturin

Mentions: 37 | Stars: 3,232 | Activity: 9.4 | Language: Rust

Build and publish crates with pyo3, cffi and uniffi bindings as well as Rust binaries as Python packages
server

Mentions: 24 | Stars: 7,314 | Activity: 9.5 | Language: Python

The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
lit-llama

Mentions: 23 | Stars: 5,789 | Activity: 8.4 | Language: Python

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
darknet

Mentions: 22 | Stars: 25,264 | Activity: 0.0 | Language: C

Convolutional Neural Networks
pinferencia

Mentions: 21 | Stars: 556 | Activity: 0.0 | Language: Python

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
serve

Mentions: 11 | Stars: 3,949 | Activity: 9.6 | Language: Java

Serve, optimize and scale PyTorch models in production (by pytorch)
glow

Mentions: 6 | Stars: 3,145 | Activity: 8.1 | Language: C++

Compiler for Neural Network hardware accelerators (by pytorch)
MNN

Mentions: 3 | Stars: 8,293 | Activity: 8.1 | Language: C++

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
XLA.jl

Mentions: 1 | Stars: 224 | Activity: 10.0 | Language: Julia

Julia on TPUs (discontinued)
flashlight

Mentions: 16 | Stars: 5,145 | Activity: 7.7 | Language: C++

A C++ standalone library for machine learning (by flashlight)
runtime

Mentions: 2 | Stars: 746 | Activity: 9.6 | Language: C++

A performant and modular runtime for TensorFlow (by tensorflow)
oneflow

Mentions: 32 | Stars: 5,721 | Activity: 8.4 | Language: C++

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
flake

Mentions: 5 | Stars: 593 | Activity: 4.4 | Language: Nix

A Nix flake for many AI projects
llama_cpp.rb

Mentions: 2 | Stars: 129 | Activity: 9.6 | Language: C++

llama_cpp provides Ruby bindings for llama.cpp

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a better serving alternative or higher similarity.

serving reviews and mentions

Posts with mentions or reviews of serving. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-12.

Llama.cpp: Full CUDA GPU Acceleration
14 projects | news.ycombinator.com | 12 Jun 2023

Yet another TEDIOUS BATTLE: Python vs. C++/C stack.
This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast with GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithm correctness for improved performance, which isn't favored by "ML research".
NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. It used basically the same trick as llama.cpp.
[1] https://github.com/tensorflow/serving
[D] How do OpenAI and other companies manage to have real-time inference on model with billions of parameters over an API?
1 project | /r/learnmachinelearning | 21 Mar 2023

I mean, probably; it's written in C++: https://github.com/tensorflow/serving
Should I wait for the M2 Macbook Pro?
1 project | /r/macbookpro | 10 Oct 2022

We’re looking into that solution at the moment. The issue I’m referring to is related to https://github.com/tensorflow/serving/issues/1948, and we’ll know soon whether the plug-in approach works for our use case, although we haven’t started implementing it yet.
TF Serving has been unavailable for 9 days so far due to outdated GPG key
1 project | /r/MachineLearning | 28 Jul 2022
TF Serving has been unavailable for 8 days
1 project | news.ycombinator.com | 27 Jul 2022
Would you use maturin for ML model serving?
2 projects | /r/rust | 8 Jul 2022

Which ML framework do you use? TensorFlow has https://github.com/tensorflow/serving. You could also use the Rust bindings to load a SavedModel and expose it through one of the Rust HTTP servers. It doesn't matter whether you trained your model in Python, as long as you export it as a SavedModel.
Is LaMDA Sentient? – An Interview [pdf]
1 project | news.ycombinator.com | 13 Jun 2022

Most likely it's a model server running something like https://github.com/tensorflow/serving and if there isn't a lot of load, the resource could kill some of its tasks. I wouldn't imagine it's sitting around pondering deep thoughts.
Ask HN: How to deploy a TensorFlow model for access through an HTTP endpoint?
1 project | news.ycombinator.com | 25 May 2022

https://github.com/tensorflow/serving
https://thenewstack.io/tutorial-deploying-tensorflow-models-...
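For readers wondering what "access through an HTTP endpoint" looks like once a model is loaded into TensorFlow Serving, here is a minimal sketch of a client for its REST predict API (`POST /v1/models/<name>:predict`, row-format `{"instances": [...]}` body, `{"predictions": [...]}` response). It assumes the server exposes REST on port 8501, as the official Docker image does; the model name `my_model`, the host, and the input values are hypothetical placeholders. The helper names `build_predict_request` and `parse_predictions` are illustrative, not part of any library. The code only builds the request and parses a response body; it does not start a server.

```python
import json
import urllib.request
from typing import Any, Optional


def build_predict_request(
    model_name: str,
    instances: list,
    host: str = "localhost",
    port: int = 8501,  # assumption: REST port used by the official Docker image
    version: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the REST predict API."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Optionally pin a specific model version instead of the latest one.
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    # Row ("instances") format: one JSON value per input example.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predictions(response_body: bytes) -> Any:
    """Extract the "predictions" field from a row-format predict response."""
    payload = json.loads(response_body)
    if "error" in payload:
        # Failed requests carry an "error" message instead of predictions.
        raise RuntimeError(payload["error"])
    return payload["predictions"]


if __name__ == "__main__":
    req = build_predict_request("my_model", [[1.0, 2.0, 5.0]])
    print(req.get_method(), req.full_url)
    # prints: POST http://localhost:8501/v1/models/my_model:predict
    # Against a running server you would send it like this:
    # with urllib.request.urlopen(req) as resp:
    #     print(parse_predictions(resp.read()))
```

Keeping request construction separate from sending makes the URL and payload easy to inspect, and any HTTP client (or another language entirely) can send the same JSON body, which is why the serving side does not need to care what language the client or the training code was written in.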
Popular Machine Learning Deployment Tools
4 projects | dev.to | 16 Apr 2022

If data science uses a lot of computational power, then why is python the most used programming language?
6 projects | /r/learnmachinelearning | 13 Apr 2022

You serve models via https://www.tensorflow.org/tfx/guide/serving, which is written entirely in C++ (https://github.com/tensorflow/serving/tree/master/tensorflow_serving/model_servers); there is no Python on the serving path or in the shipped product.

Stats

Basic serving repo stats

Stars: 6,071
Activity: 9.8
Last commit: about 17 hours ago

tensorflow/serving is an open source project licensed under Apache License 2.0 which is an OSI approved license.

The primary programming language of serving is C++.
