LM Studio – Discover, download, and run local LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • I looked at Ollama before, but couldn't quite figure something out from the docs [1]

    It looks like a lot of the tooling is heavily engineered for a set of currently popular LLM-style models. It also looks like llama.cpp supports LoRA models, so I'd assume there is a way to engineer a pipeline from LoRA to llama.cpp deployments, which probably covers quite a broad set of possibilities.

    Beyond llama.cpp, can someone point me to what the broader community uses for general PyTorch model deployments?

    I haven't ever self-hosted models, and am really keen to try. Ideally, I am looking for something that stays close to the PyTorch core and therefore gives me the flexibility to take any nn.Module to production.

    [1]: https://github.com/jmorganca/ollama/blob/main/docs/import.md
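
    For reference, the import docs linked above describe wrapping a local GGUF file (e.g. one produced by llama.cpp's conversion tooling from a merged LoRA) in a minimal Modelfile. A sketch, with hypothetical file and model names:

    ```
    # Modelfile -- registers a local GGUF file with Ollama (paths are hypothetical)
    FROM ./my-merged-lora.gguf
    PARAMETER temperature 0.7
    ```

    Running `ollama create my-model -f Modelfile` and then `ollama run my-model` should make the model available locally like any other Ollama model.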

  • llama.cpp

    LLM inference in C/C++

    Actually, consider that the commenter may have helped un-obfuscate this world a little by saying that it is in fact easy. To be honest, the hardest part about the local LLM scene is the absurd amount of jargon involved - everything looks more complex than it is. It really is easy with llama.cpp; someone even wrote a tutorial here: https://github.com/ggerganov/llama.cpp/discussions/2948

    But yes, TheBloke tends to have conversions up very quickly as well and has made a name for himself for doing this (+more)
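
    On the jargon point: quantization names like Q4_K_M or Q8_0 mostly just encode approximate bits per weight, which makes rough file-size math straightforward. A toy estimator (the bits-per-weight figures below are approximate community numbers, not exact GGUF accounting):

    ```python
    # Rough GGUF file-size estimate from parameter count and quant type.
    # Bits-per-weight values are approximations, not exact format figures.
    APPROX_BITS_PER_WEIGHT = {
        "Q4_K_M": 4.85,
        "Q5_K_M": 5.7,
        "Q8_0": 8.5,
        "F16": 16.0,
    }

    def estimate_gguf_size_gb(n_params: float, quant: str) -> float:
        """Estimated model file size in decimal gigabytes."""
        bits = APPROX_BITS_PER_WEIGHT[quant]
        return n_params * bits / 8 / 1e9

    # A 7B model at Q4_K_M lands around 4 GB, which is why it fits
    # comfortably in RAM on a 16 GB laptop.
    print(round(estimate_gguf_size_gb(7e9, "Q4_K_M"), 1))
    ```

    Knowing this mapping takes most of the mystery out of choosing between the many quantized files a model page typically offers.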

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • chatbot-ollama

    Chatbot Ollama is an open source chat UI for Ollama.

  • I'm currently doing this on an M2 Max with ollama and a Next.js UI [0] running in local Docker. Any device on the network can use the UI... and I guess if you want a LAN API you just need to run another container with an OpenAI-compatible server that can query ollama, e.g. [1]

    [0]: https://github.com/ivanfioravanti/chatbot-ollama
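
    For the LAN API piece, note that Ollama itself already exposes a small HTTP API on its default port 11434. A minimal sketch of a non-streaming chat call against it (the model name and host are assumptions about your local setup):

    ```python
    import json
    import urllib.request

    OLLAMA_HOST = "http://localhost:11434"  # Ollama's default port

    def build_chat_payload(prompt: str, model: str = "llama3") -> dict:
        """Request body for Ollama's native /api/chat endpoint."""
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # one JSON reply instead of a token stream
        }

    def ollama_chat(prompt: str, model: str = "llama3") -> dict:
        """POST the request to a local Ollama instance (requires ollama to be running)."""
        req = urllib.request.Request(
            f"{OLLAMA_HOST}/api/chat",
            data=json.dumps(build_chat_payload(prompt, model)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    ```

    Pointing `OLLAMA_HOST` at the machine's LAN address instead of localhost is enough to call it from other devices, provided Ollama is configured to listen on that interface.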

  • litellm

    Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
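
    litellm's core convention is that the provider is encoded as a prefix on the model string ("ollama/llama3", "anthropic/claude-3-opus", ...), so one call site can fan out to many backends. A toy illustration of that routing convention (not litellm's actual implementation):

    ```python
    def split_provider(model: str, default: str = "openai") -> tuple[str, str]:
        """Split a litellm-style model string into (provider, model name).

        Strings without a prefix fall back to the default provider, mirroring
        how plain model names like "gpt-4" are treated as OpenAI models.
        """
        provider, sep, name = model.partition("/")
        if not sep:
            return default, model
        return provider, name

    print(split_provider("ollama/llama3"))  # ('ollama', 'llama3')
    print(split_provider("gpt-4"))          # ('openai', 'gpt-4')
    ```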

  • MindMac

    Issue tracker for MindMac, an elegant client for macOS

  • LM Studio is great for running local LLMs, and it also supports an OpenAI-compatible API. If you need a more advanced UI/UX, you can use LM Studio with MindMac (https://mindmac.app); check this video for details: https://www.youtube.com/watch?v=3KcVp5QQ1Ak
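
    The OpenAI-compatible server is what makes pairings like this easy: any client that speaks the OpenAI chat-completions format can target LM Studio by swapping the base URL (LM Studio's local server defaults to port 1234). A sketch of the request shape; the model name is just whatever you have loaded locally:

    ```python
    LMSTUDIO_BASE = "http://localhost:1234/v1"  # LM Studio's default server address

    def build_openai_chat_request(prompt: str,
                                  model: str = "local-model") -> tuple[str, dict]:
        """URL and body for an OpenAI-format chat completion against LM Studio."""
        body = {
            "model": model,  # LM Studio serves whichever model is loaded
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
        }
        return f"{LMSTUDIO_BASE}/chat/completions", body
    ```

    The official OpenAI Python client works the same way: construct it with `base_url` pointed at the LM Studio address and a dummy API key, and existing OpenAI code runs against the local model unchanged.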

  • ollama-webui

    Discontinued ChatGPT-Style WebUI for LLMs (Formerly Ollama WebUI) [Moved to: https://github.com/open-webui/open-webui]

  • S-LoRA

    S-LoRA: Serving Thousands of Concurrent LoRA Adapters

  • Depending on what you mean by "production" you'll probably want to look at "real" serving implementations like HF TGI, vLLM, lmdeploy, Triton Inference Server (tensorrt-llm), etc. There are also more bespoke implementations for things like serving large numbers of LoRA adapters[0].

    These are heavily optimized for efficient memory usage, throughput, and responsiveness when serving large numbers of concurrent requests/users, in addition to operational features like model versioning, hot load/reload, Prometheus metrics, and so on.

    One major difference is that at this level many of the more aggressive memory-optimization techniques, and CPU support, aren't even considered. Generally speaking you get GPTQ and possibly AWQ quantization + their optimizations + CUDA only. Their target users are typically running A100s/H100s and just trying to need fewer of them. Support for lower-VRAM cards, older CUDA compute architectures, etc. comes second to that (for the most part).

    [0] - https://github.com/S-LoRA/S-LoRA
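
    To make the memory point concrete: a large share of what these servers optimize is the per-request KV cache, which grows linearly with context length and batch size. A back-of-the-envelope calculator (the shape numbers in the example are Llama-2-7B's published architecture; treat the result as an estimate):

    ```python
    def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                       seq_len: int, batch: int = 1,
                       bytes_per_elem: int = 2) -> int:
        """KV cache size: two tensors (K and V) per layer, fp16 by default."""
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Llama-2-7B: 32 layers, 32 KV heads, head dimension 128.
    # A single 4096-token sequence needs 2 GiB of cache on top of the
    # weights, and it scales linearly with the number of concurrent users.
    print(kv_cache_bytes(32, 32, 128, 4096) / 2**30)  # 2.0 (GiB)
    ```

    This is why techniques like vLLM's paged KV-cache management matter so much more at serving scale than weight quantization alone.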

  • big-AGI

    Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

  • How does it compare with something like FastChat? https://github.com/lm-sys/FastChat

    The feature sets seem to have a decent amount of overlap. One limitation of FastChat, as far as I can tell, is that you are limited to the models FastChat supports (though I think it would be a minor change to support arbitrary models?)

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • SillyTavern

    LLM Frontend for Power Users.

  • hoof

    "Just hoof it!" - A spotlight like interface to Ollama

  • With a couple of other folks I'm currently working on an Ollama GUI: https://github.com/ai-qol-things/rusty-ollama

  • private-gpt

    Interact with your documents using the power of GPT, 100% privately, no data leaks
