What do you use to run your models?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • https://ollama.ai. It's a menu-bar Mac app that runs the server, plus a CLI that lets you pull and run a variety of popular models from its library. No need to compile anything or install a bunch of dependencies. Support for Apple Silicon GPUs is enabled by default. I'd be surprised if anything else gets you up and running as quickly.
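
    Once the server is running, you can also drive it programmatically. Below is a minimal sketch of calling ollama's local REST API from Python (the default port 11434 and the "mistral" model name are assumptions; use whatever model you have already pulled):

        import json
        import urllib.request

        # Ask the local ollama server (default: http://localhost:11434) for a completion.
        # Assumes `ollama pull mistral` has already been run.
        payload = json.dumps({
            "model": "mistral",
            "prompt": "Why is the sky blue?",
            "stream": False,  # return a single JSON object instead of a stream
        }).encode("utf-8")

        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["response"])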

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • I like koboldcpp for its simplicity, but currently prefer the speed of exllamav2 (e.g. Goliath 120B at over 10 tokens per second), included with oobabooga's text-generation-webui, which I can remote-control easily from my browser.

  • exllamav2

    A fast inference library for running LLMs locally on modern consumer-class GPUs

  • Sorry, I'm only somewhat familiar with this term (I've seen it as a model loader in Oobabooga), but I'm still not following the connection here. Are you saying I should be using this project in lieu of llama.cpp? Or are you saying that there is, perhaps, an exllamav2 "extension" or similar within llama.cpp that I can use?

  • llama-api

    An OpenAI-like LLaMA inference API

  • https://github.com/c0sogi/llama-api, right? And it offers better performance on GPU-optimized models?
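
    Since the project describes itself as an OpenAI-like API, any OpenAI client should be able to talk to it. A hedged sketch using the official openai Python package (the base URL, port, and model name here are assumptions; check the project's README for the real defaults):

        from openai import OpenAI

        # Point the OpenAI client at the local llama-api server instead of api.openai.com.
        # Base URL and model name are assumptions; adjust to your server's settings.
        client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

        completion = client.chat.completions.create(
            model="llama-2-7b-chat",  # whichever model the server has loaded
            messages=[{"role": "user", "content": "Hello from a local model!"}],
        )
        print(completion.choices[0].message.content)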

  • ghostpad

    A free AI text generation interface based on KoboldAI

  • Fresh from the oven: someone just posted this, https://github.com/ghostpad/ghostpad, and it seems great (from https://www.reddit.com/r/LocalLLaMA/comments/18crcms/ghostpad_now_supports_llamacpp/?sort=new)!

  • refact

    WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

  • On VS Code I sometimes use continue.dev and refact.ai just for fun, and they are great!

  • ReAIterator

    Reiterate text file through AI

  • Mainly the desire to control the exact prompt: instead of the UI silently cutting it, I can comment out blocks of text from being fed to the model and rewrite them as shorter blocks (UIs don't support commenting out blocks). On long stories it's quite frustrating to have only a rough idea of what the model sees, especially in UIs with world info, where it can inject itself at will. So my tool panics if it sees too many tokens and calls vim over and over (hence the name: it re-AI-iterates until the number of tokens is reduced to the desired count). Also, vim is a better editor than a browser, especially with undotree. I also didn't like that ooba doesn't offer several generations at the same time while kobold does, but those run in parallel like several batches of the same prompt, which causes OoM. Not sure if this behavior still persists in kobold.cpp.
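
    The loop described above is simple enough to sketch. A hypothetical, minimal Python version (the token counter, comment syntax, editor choice, and limit are all assumptions, not the actual ReAIterator code):

        import os
        import subprocess

        PROMPT_FILE = "story.txt"   # hypothetical file name
        TOKEN_LIMIT = 4096          # assumed context budget

        def count_tokens(text: str) -> int:
            # Crude stand-in for a real tokenizer: roughly 4 characters per token.
            return len(text) // 4

        def strip_comments(text: str) -> str:
            # Treat lines starting with '#' as commented out, i.e. not fed to the model.
            return "\n".join(line for line in text.splitlines() if not line.startswith("#"))

        while True:
            with open(PROMPT_FILE, encoding="utf-8") as f:
                prompt = strip_comments(f.read())
            n = count_tokens(prompt)
            if n <= TOKEN_LIMIT:
                break
            # Too many tokens: "panic" and reopen the editor so the user can trim the text.
            print(f"{n} tokens > {TOKEN_LIMIT}; edit the file down and save.")
            subprocess.run([os.environ.get("EDITOR", "vim"), PROMPT_FILE], check=True)

        # At this point `prompt` fits the budget and can be sent to the model.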

  • LocalAI

    The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. It can generate text, audio, video, and images, and has voice-cloning capabilities.

  • If you're running this as a server, I would recommend LocalAI https://github.com/mudler/LocalAI
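
    Because LocalAI is a drop-in replacement for the OpenAI API, existing clients usually only need their base URL changed. A minimal sketch with plain requests against LocalAI's default port 8080 (the model name is an assumption; use whatever you have configured):

        import requests

        # LocalAI serves the OpenAI-compatible API on port 8080 by default.
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={
                "model": "gpt-3.5-turbo",  # mapped to a local model in LocalAI's config (assumed)
                "messages": [{"role": "user", "content": "Say hello"}],
            },
            timeout=120,
        )
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])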

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • I like koboldcpp for its simplicity, but currently prefer the speed of exllamav2 (e.g. Goliath 120B at over 10 tokens per second), included with oobabooga's text-generation-webui, which I can remote-control easily from my browser.

  • SillyTavern

    LLM Frontend for Power Users.

  • Finally, no matter what backend I use, I need it to be compatible with my power-user frontend, SillyTavern. That way I always use the same UI, with the characters I created and the extensions I want, e.g. web search, XTTS text-to-speech, and Whisper speech recognition for real-time voice chat - and all of that local!

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
