What do you use to run your models?

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • https://ollama.ai. It's a menu-bar Mac app that runs the server, plus a CLI that lets you pull and run a variety of popular models from its library. No need to compile anything or install a bunch of dependencies. Support for Apple Silicon GPUs is enabled by default. I'd be surprised if anything else gets you up and running as quickly.
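
    Once the server is running, you can also drive it programmatically. Below is a minimal sketch of calling ollama's local REST API from Python (the default port 11434 and the "mistral" model name are assumptions; use whatever model you have already pulled):

        import json
        import urllib.request

        # Ask the local ollama server (default: http://localhost:11434) for a completion.
        # Assumes `ollama pull mistral` has already been run.
        payload = json.dumps({
            "model": "mistral",
            "prompt": "Why is the sky blue?",
            "stream": False,  # return a single JSON object instead of a stream
        }).encode("utf-8")

        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["response"])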

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

  • I like koboldcpp for its simplicity, but currently prefer the speed of exllamav2 (e.g. Goliath 120B at over 10 tokens per second), included with oobabooga's text-generation-webui, which I can remote-control easily from my browser.

  • exllamav2

    A fast inference library for running LLMs locally on modern consumer-class GPUs

  • Sorry, I'm only somewhat familiar with this term (I've seen it as a model loader in Oobabooga), but I'm still not following the connection here. Are you saying I should be using this project in lieu of llama.cpp? Or are you saying that there is, perhaps, an exllamav2 "extension" or similar within llama.cpp that I can use?

  • llama-api

    An OpenAI-like LLaMA inference API

  • https://github.com/c0sogi/llama-api, right? And it offers better performance on GPU-optimized models?
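
    Since the project describes itself as an OpenAI-like API, any OpenAI client should be able to talk to it. A hedged sketch using the official openai Python package (the base URL, port, and model name here are assumptions; check the project's README for the real defaults):

        from openai import OpenAI

        # Point the OpenAI client at the local llama-api server instead of api.openai.com.
        # Base URL and model name are assumptions; adjust to your server's settings.
        client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

        completion = client.chat.completions.create(
            model="llama-2-7b-chat",  # whichever model the server has loaded
            messages=[{"role": "user", "content": "Hello from a local model!"}],
        )
        print(completion.choices[0].message.content)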

  • ghostpad

    A free AI text generation interface based on KoboldAI

  • Fresh from the oven: someone just posted this, https://github.com/ghostpad/ghostpad, and it seems great (from https://www.reddit.com/r/LocalLLaMA/comments/18crcms/ghostpad_now_supports_llamacpp/?sort=new)!

  • refact

    WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

  • On VS Code I sometimes use continue.dev and refact.ai just for fun, and they are great!

  • ReAIterator

    Reiterate text file through AI

  • Mainly the desire to control the exact prompt: instead of the UI silently cutting it, I can comment out blocks of text from being fed to the model and rewrite them as shorter blocks (UIs don't support commenting out blocks). On long stories it's quite frustrating to have only a rough idea of what the model sees, especially in UIs with world info, where it can inject itself at will. So my tool panics if it sees too many tokens and calls vim over and over (hence the name: it re-AI-iterates until the number of tokens is reduced to the desired count). Also, vim is a better editor than a browser, especially with undotree. I also didn't like that ooba doesn't offer several generations at the same time while kobold does, but those run in parallel like several batches of the same prompt, which causes OoM. Not sure if this behavior still persists in kobold.cpp.
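
    The loop described above is simple enough to sketch. A hypothetical, minimal Python version (the token counter, comment syntax, editor choice, and limit are all assumptions, not the actual ReAIterator code):

        import os
        import subprocess

        PROMPT_FILE = "story.txt"   # hypothetical file name
        TOKEN_LIMIT = 4096          # assumed context budget

        def count_tokens(text: str) -> int:
            # Crude stand-in for a real tokenizer: roughly 4 characters per token.
            return len(text) // 4

        def strip_comments(text: str) -> str:
            # Treat lines starting with '#' as commented out, i.e. not fed to the model.
            return "\n".join(line for line in text.splitlines() if not line.startswith("#"))

        while True:
            with open(PROMPT_FILE, encoding="utf-8") as f:
                prompt = strip_comments(f.read())
            n = count_tokens(prompt)
            if n <= TOKEN_LIMIT:
                break
            # Too many tokens: "panic" and reopen the editor so the user can trim the text.
            print(f"{n} tokens > {TOKEN_LIMIT}; edit the file down and save.")
            subprocess.run([os.environ.get("EDITOR", "vim"), PROMPT_FILE], check=True)

        # At this point `prompt` fits the budget and can be sent to the model.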

  • LocalAI

    The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. It can generate text, audio, video, and images, and has voice-cloning capabilities.

  • If you're running this as a server, I would recommend LocalAI https://github.com/mudler/LocalAI
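
    Because LocalAI is a drop-in replacement for the OpenAI API, existing clients usually only need their base URL changed. A minimal sketch with plain requests against LocalAI's default port 8080 (the model name is an assumption; use whatever you have configured):

        import requests

        # LocalAI serves the OpenAI-compatible API on port 8080 by default.
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={
                "model": "gpt-3.5-turbo",  # mapped to a local model in LocalAI's config (assumed)
                "messages": [{"role": "user", "content": "Say hello"}],
            },
            timeout=120,
        )
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])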

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • I like koboldcpp for its simplicity, but currently prefer the speed of exllamav2 (e.g. Goliath 120B at over 10 tokens per second), included with oobabooga's text-generation-webui, which I can remote-control easily from my browser.

  • SillyTavern

    LLM Frontend for Power Users.

  • Finally, no matter what backend I use, I need it to be compatible with my power-user frontend, SillyTavern. That way I always use the same UI, with the characters I created and the extensions I want, e.g. web search, XTTS text-to-speech, and Whisper speech recognition for real-time voice chat - and all of that local!

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
