S-LoRA Alternatives
Similar projects and alternatives to S-LoRA
- text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- litellm: Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs).
- ollama-webui: Discontinued. A ChatGPT-style web UI for LLMs (formerly Ollama WebUI). Moved to: https://github.com/open-webui/open-webui
- big-AGI: Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
S-LoRA reviews and mentions
- Representation Engineering: Mistral-7B on Acid
  You can also batch requests using different LoRAs. See "S-LoRA: Serving Thousands of Concurrent LoRA Adapters": https://arxiv.org/abs/2311.03285
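  The batching trick referenced in that comment is the core of S-LoRA: the frozen base weight is shared by every request in the batch, while each request gathers its own small adapter matrices. Below is a minimal PyTorch sketch of that idea; the shapes, names, and gather-based dispatch are illustrative assumptions, not the paper's actual paged-memory kernels.

  ```python
  import torch

  def batched_lora_linear(x, W, A, B, adapter_ids, scaling=1.0):
      """Shared base matmul plus a per-request LoRA delta.

      x           : (batch, d_in)  one row of activations per request
      W           : (d_in, d_out)  frozen base weight, shared by all requests
      A           : (n_adapters, d_in, r)  per-adapter down-projections
      B           : (n_adapters, r, d_out) per-adapter up-projections
      adapter_ids : (batch,)  which adapter each request uses
      """
      base = x @ W                         # one shared matmul for the whole batch
      A_sel = A[adapter_ids]               # (batch, d_in, r)
      B_sel = B[adapter_ids]               # (batch, r, d_out)
      # Per-request low-rank delta: x @ A_i @ B_i
      delta = torch.bmm(torch.bmm(x.unsqueeze(1), A_sel), B_sel).squeeze(1)
      return base + scaling * delta

  # Four requests in one batch, each routed to its own adapter.
  d_in, d_out, r, n_adapters = 64, 64, 8, 3
  x = torch.randn(4, d_in)
  W = torch.randn(d_in, d_out)
  A = torch.randn(n_adapters, d_in, r) * 0.01
  B = torch.randn(n_adapters, r, d_out) * 0.01
  y = batched_lora_linear(x, W, A, B, torch.tensor([0, 1, 2, 0]))
  print(y.shape)  # torch.Size([4, 64])
  ```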
- LM Studio – Discover, download, and run local LLMs
  Depending on what you mean by "production", you'll probably want to look at "real" serving implementations such as HF TGI, vLLM, lmdeploy, and Triton Inference Server (tensorrt-llm). There are also more bespoke implementations for things like serving large numbers of LoRA adapters [0].
  These are heavily optimized for efficient memory usage, performance, and responsiveness when serving large numbers of concurrent requests/users, in addition to features like model versioning, hot load/reload, and Prometheus metrics.
  One major difference is that at this level, many of the more aggressive memory-optimization techniques, along with CPU support, aren't even considered. Generally speaking you get GPTQ and possibly AWQ quantization, plus their optimizations, and CUDA only. The target users are often running A100/H100s and just trying to need fewer of them; support for lower-VRAM cards, older CUDA compute architectures, and so on comes second to that (for the most part).
  [0] https://github.com/S-LoRA/S-LoRA
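  From the client side, servers that expose many LoRA adapters behind an OpenAI-compatible API (vLLM's multi-LoRA mode works this way) typically let the request's `model` field select the adapter. A hedged sketch, assuming a local server on port 8000 and two hypothetical adapter names:

  ```python
  from openai import OpenAI

  # Point the standard OpenAI client at a local OpenAI-compatible server.
  client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

  # Requests naming different adapters can be in flight at the same time;
  # the server batches them against the single shared base model.
  for adapter in ("sql-adapter", "chat-adapter"):  # hypothetical names
      completion = client.completions.create(
          model=adapter,  # the adapter's registered name selects the LoRA
          prompt="Translate to SQL: list all users",
          max_tokens=64,
      )
      print(adapter, ":", completion.choices[0].text)
  ```

  This is the design point the comment above is getting at: an additional adapter costs a small amount of extra memory on top of one shared base model, rather than another full deployment.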
Stats
S-LoRA/S-LoRA is an open source project licensed under the Apache License 2.0, an OSI-approved license.
The primary programming language of S-LoRA is Python.