LM Studio – Discover, download, and run local LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • I looked at Ollama before, but couldn't quite figure something out from the docs [1]

    It looks like a lot of the tooling is heavily engineered for a set of currently popular LLM-style models. It also looks like llama.cpp supports LoRA models, so I'd assume there is a way to engineer a pipeline from LoRA to llama.cpp deployments, which probably covers quite a broad set of possibilities.

    Beyond llama.cpp, can someone point me to what the broader community uses for general PyTorch model deployments?

    I haven't ever self-hosted models, and am really keen to try. Ideally, I am looking for something that stays close to the PyTorch core and therefore gives me the flexibility to take any nn.Module to production.

    [1]: https://github.com/jmorganca/ollama/blob/main/docs/import.md
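
    For reference, the import docs linked above describe wrapping a local GGUF file (e.g. one produced by llama.cpp's conversion tooling from a merged LoRA) in a minimal Modelfile. A sketch, with hypothetical file and model names:

    ```
    # Modelfile -- registers a local GGUF file with Ollama (paths are hypothetical)
    FROM ./my-merged-lora.gguf
    PARAMETER temperature 0.7
    ```

    Running `ollama create my-model -f Modelfile` and then `ollama run my-model` should make the model available locally like any other Ollama model.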

  • llama.cpp

    LLM inference in C/C++

    Actually, consider that the commenter may have helped un-obfuscate this world a little by saying that it is in fact easy. To be honest, the hardest part about the local LLM scene is the absurd amount of jargon involved - everything looks more complex than it is. It really is easy with llama.cpp; someone even wrote a tutorial here: https://github.com/ggerganov/llama.cpp/discussions/2948

    But yes, TheBloke tends to have conversions up very quickly as well and has made a name for himself for doing this (+more)
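
    On the jargon point: quantization names like Q4_K_M or Q8_0 mostly just encode approximate bits per weight, which makes rough file-size math straightforward. A toy estimator (the bits-per-weight figures below are approximate community numbers, not exact GGUF accounting):

    ```python
    # Rough GGUF file-size estimate from parameter count and quant type.
    # Bits-per-weight values are approximations, not exact format figures.
    APPROX_BITS_PER_WEIGHT = {
        "Q4_K_M": 4.85,
        "Q5_K_M": 5.7,
        "Q8_0": 8.5,
        "F16": 16.0,
    }

    def estimate_gguf_size_gb(n_params: float, quant: str) -> float:
        """Estimated model file size in decimal gigabytes."""
        bits = APPROX_BITS_PER_WEIGHT[quant]
        return n_params * bits / 8 / 1e9

    # A 7B model at Q4_K_M lands around 4 GB, which is why it fits
    # comfortably in RAM on a 16 GB laptop.
    print(round(estimate_gguf_size_gb(7e9, "Q4_K_M"), 1))
    ```

    Knowing this mapping takes most of the mystery out of choosing between the many quantized files a model page typically offers.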

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • chatbot-ollama

    Chatbot Ollama is an open source chat UI for Ollama.

  • I'm currently doing this on an M2 Max with ollama and a Next.js UI [0] running in local Docker. Any device on the network can use the UI... and I guess if you want a LAN API you just need to run another container with an OpenAI-compatible server that can query ollama, e.g. [1]

    [0]: https://github.com/ivanfioravanti/chatbot-ollama
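
    For the LAN API piece, note that Ollama itself already exposes a small HTTP API on its default port 11434. A minimal sketch of a non-streaming chat call against it (the model name and host are assumptions about your local setup):

    ```python
    import json
    import urllib.request

    OLLAMA_HOST = "http://localhost:11434"  # Ollama's default port

    def build_chat_payload(prompt: str, model: str = "llama3") -> dict:
        """Request body for Ollama's native /api/chat endpoint."""
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # one JSON reply instead of a token stream
        }

    def ollama_chat(prompt: str, model: str = "llama3") -> dict:
        """POST the request to a local Ollama instance (requires ollama to be running)."""
        req = urllib.request.Request(
            f"{OLLAMA_HOST}/api/chat",
            data=json.dumps(build_chat_payload(prompt, model)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    ```

    Pointing `OLLAMA_HOST` at the machine's LAN address instead of localhost is enough to call it from other devices, provided Ollama is configured to listen on that interface.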

  • litellm

    Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
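
    litellm's core convention is that the provider is encoded as a prefix on the model string ("ollama/llama3", "anthropic/claude-3-opus", ...), so one call site can fan out to many backends. A toy illustration of that routing convention (not litellm's actual implementation):

    ```python
    def split_provider(model: str, default: str = "openai") -> tuple[str, str]:
        """Split a litellm-style model string into (provider, model name).

        Strings without a prefix fall back to the default provider, mirroring
        how plain model names like "gpt-4" are treated as OpenAI models.
        """
        provider, sep, name = model.partition("/")
        if not sep:
            return default, model
        return provider, name

    print(split_provider("ollama/llama3"))  # ('ollama', 'llama3')
    print(split_provider("gpt-4"))          # ('openai', 'gpt-4')
    ```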

  • MindMac

    Issue tracker for MindMac, an elegant client for macOS

  • LM Studio is great for running local LLMs, and it also supports an OpenAI-compatible API. If you need a more advanced UI/UX, you can use LM Studio with MindMac (https://mindmac.app); check this video for details: https://www.youtube.com/watch?v=3KcVp5QQ1Ak
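
    The OpenAI-compatible server is what makes pairings like this easy: any client that speaks the OpenAI chat-completions format can target LM Studio by swapping the base URL (LM Studio's local server defaults to port 1234). A sketch of the request shape; the model name is just whatever you have loaded locally:

    ```python
    LMSTUDIO_BASE = "http://localhost:1234/v1"  # LM Studio's default server address

    def build_openai_chat_request(prompt: str,
                                  model: str = "local-model") -> tuple[str, dict]:
        """URL and body for an OpenAI-format chat completion against LM Studio."""
        body = {
            "model": model,  # LM Studio serves whichever model is loaded
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
        }
        return f"{LMSTUDIO_BASE}/chat/completions", body
    ```

    The official OpenAI Python client works the same way: construct it with `base_url` pointed at the LM Studio address and a dummy API key, and existing OpenAI code runs against the local model unchanged.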

  • ollama-webui

    Discontinued ChatGPT-Style WebUI for LLMs (Formerly Ollama WebUI) [Moved to: https://github.com/open-webui/open-webui]

  • S-LoRA

    S-LoRA: Serving Thousands of Concurrent LoRA Adapters

  • Depending on what you mean by "production" you'll probably want to look at "real" serving implementations like HF TGI, vLLM, lmdeploy, Triton Inference Server (tensorrt-llm), etc. There are also more bespoke implementations for things like serving large numbers of LoRA adapters[0].

    These are heavily optimized for efficient memory usage, throughput, and responsiveness when serving large numbers of concurrent requests/users, in addition to operational features like model versioning, hot load/reload, Prometheus metrics, and so on.

    One major difference is that at this level many of the more aggressive memory-optimization techniques, and CPU support, aren't even considered. Generally speaking you get GPTQ and possibly AWQ quantization + their optimizations + CUDA only. Their target users are typically running A100s/H100s and just trying to need fewer of them. Support for lower-VRAM cards, older CUDA compute architectures, etc. comes second to that (for the most part).

    [0] - https://github.com/S-LoRA/S-LoRA
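
    To make the memory point concrete: a large share of what these servers optimize is the per-request KV cache, which grows linearly with context length and batch size. A back-of-the-envelope calculator (the shape numbers in the example are Llama-2-7B's published architecture; treat the result as an estimate):

    ```python
    def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                       seq_len: int, batch: int = 1,
                       bytes_per_elem: int = 2) -> int:
        """KV cache size: two tensors (K and V) per layer, fp16 by default."""
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Llama-2-7B: 32 layers, 32 KV heads, head dimension 128.
    # A single 4096-token sequence needs 2 GiB of cache on top of the
    # weights, and it scales linearly with the number of concurrent users.
    print(kv_cache_bytes(32, 32, 128, 4096) / 2**30)  # 2.0 (GiB)
    ```

    This is why techniques like vLLM's paged KV-cache management matter so much more at serving scale than weight quantization alone.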

  • big-AGI

    Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

  • How does it compare with something like FastChat? https://github.com/lm-sys/FastChat

    The feature sets seem to have a decent amount of overlap. One limitation of FastChat, as far as I can tell, is that you are limited to the models FastChat supports (though I think it would be a minor change to support arbitrary models?)

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • SillyTavern

    LLM Frontend for Power Users.

  • hoof

    "Just hoof it!" - A spotlight like interface to Ollama

  • With a couple of other folks I'm currently working on an Ollama GUI: https://github.com/ai-qol-things/rusty-ollama

  • private-gpt

    Interact with your documents using the power of GPT, 100% privately, no data leaks
