S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters (by S-LoRA)

S-LoRA Alternatives

Similar projects and alternatives to S-LoRA

NOTE: The number of mentions indicates how often each project appears in common posts, plus user-suggested alternatives. A higher count therefore suggests a more relevant or more similar S-LoRA alternative.

S-LoRA reviews and mentions

Posts with mentions or reviews of S-LoRA. We have used some of these posts to build our list of alternatives and similar projects. The most recent was on 2024-02-17.
  • Representation Engineering: Mistral-7B on Acid
    1 project | news.ycombinator.com | 17 Feb 2024
    You can also batch requests using different LoRAs. See "S-LoRA: Serving Thousands of Concurrent LoRA Adapters": https://arxiv.org/abs/2311.03285 (a minimal sketch of this batching idea follows the list below).
  • S-LoRA: Serving Concurrent LoRA Adapters
    1 project | news.ycombinator.com | 14 Dec 2023
  • LM Studio – Discover, download, and run local LLMs
    17 projects | news.ycombinator.com | 22 Nov 2023
    Depending on what you mean by "production", you'll probably want to look at "real" serving implementations like HF TGI, vLLM, lmdeploy, Triton Inference Server (tensorrt-llm), etc. There are also more bespoke implementations for things like serving large numbers of LoRA adapters[0].

    These are heavily optimized for more efficient memory usage, performance, and responsiveness when serving large numbers of concurrent requests/users, in addition to operational features like model versioning, hot load/reload, and Prometheus metrics.

    One major difference is that at this level, many of the more aggressive memory-optimization techniques, and CPU support, aren't even considered. Generally speaking, you get GPTQ and possibly AWQ quantization, plus their optimizations, CUDA only. The target users and use cases often involve A100/H100 GPUs and simply trying to need fewer of them; support for lower-VRAM cards, older CUDA compute architectures, and so on comes secondary to that (for the most part). (A hedged sketch of multi-adapter serving with one of these engines follows the list below.)

    [0] - https://github.com/S-LoRA/S-LoRA

  • GitHub - S-LoRA/S-LoRA: S-LoRA: Serving Thousands of Concurrent LoRA Adapters
    1 project | /r/LocalLLaMA | 14 Nov 2023
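
The batched multi-LoRA trick mentioned in the comments above can be illustrated in a few lines. Below is a minimal PyTorch sketch (all names and shapes are illustrative, not S-LoRA's actual code): each request in a batch picks its own adapter, the dense base projection is computed once for everyone, and only the cheap low-rank update differs per request. S-LoRA's real implementation goes further, paging adapter weights in a unified GPU memory pool and using custom CUDA kernels for the gathered low-rank products.

```python
# Minimal sketch of heterogeneous-adapter batching (illustrative, not S-LoRA's code).
import torch

d, r, n_adapters, batch = 64, 8, 3, 4

W = torch.randn(d, d)                     # shared base weight
A = torch.randn(n_adapters, d, r)         # per-adapter low-rank factors
B = torch.randn(n_adapters, r, d)
scaling = 0.5                             # LoRA alpha / r

x = torch.randn(batch, d)                 # one token per request, for simplicity
adapter_ids = torch.tensor([0, 2, 1, 0])  # which adapter each request uses

base = x @ W                              # dense part, shared by the whole batch
# Gather each request's adapter and apply its low-rank update:
delta = torch.einsum("bd,bdr,bre->be", x, A[adapter_ids], B[adapter_ids])
y = base + scaling * delta                # (batch, d) outputs, one adapter each
```

The point is that the expensive `x @ W` is shared across the whole batch, so serving many adapters costs little more than serving the base model alone.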
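As the LM Studio comment above notes, the mainstream serving engines have since absorbed this idea; vLLM, for example, ships multi-LoRA support. The sketch below uses vLLM's offline API with placeholder model and adapter paths; argument names may differ across vLLM versions, so treat it as the shape of the API rather than a drop-in script.

```python
# Hedged sketch: one base model, several LoRA adapters, via vLLM's offline API.
# Model name and adapter paths are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # shared base model, loaded once
    enable_lora=True,
    max_loras=8,         # adapters kept resident at once
    max_lora_rank=16,
)
params = SamplingParams(temperature=0.0, max_tokens=64)

# Each request can name a different adapter; in the online server, concurrent
# requests targeting different adapters are batched into the same forward pass.
sql_out = llm.generate(
    ["Translate to SQL: list all users"],
    params,
    lora_request=LoRARequest("sql-adapter", 1, "/adapters/sql"),    # placeholder
)
chat_out = llm.generate(
    ["Say hello politely"],
    params,
    lora_request=LoRARequest("chat-adapter", 2, "/adapters/chat"),  # placeholder
)
```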

Stats

Basic S-LoRA repo stats
Mentions: 4
Stars: 1,487
Activity: 7.1
Last commit: 4 months ago

S-LoRA/S-LoRA is an open source project licensed under the Apache License 2.0, an OSI-approved license.

The primary programming language of S-LoRA is Python.

