KoboldCpp - Combining all the various ggml.cpp CPU LLM inference projects with a WebUI and API (formerly llamacpp-for-kobold)

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
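
KoboldCpp exposes a KoboldAI-compatible HTTP API alongside its WebUI. A minimal sketch of generating text against a locally running instance (assuming the default port 5001 and the `/api/v1/generate` endpoint; check the console output of your own instance for the actual address):

```python
import json
import urllib.request

def build_payload(prompt, max_length=80, temperature=0.7):
    """Build a request body for the KoboldAI-compatible /api/v1/generate endpoint."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}

def generate(prompt, base_url="http://localhost:5001"):
    """POST a prompt to a running koboldcpp instance and return the generated text."""
    req = urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response carries generated text under "results".
        return json.load(resp)["results"][0]["text"]

# Example (requires a running koboldcpp instance):
# print(generate("Once upon a time"))
```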

  • It's possible you have a very old CPU. Can you try the noavx2 build? https://github.com/LostRuins/koboldcpp/releases/download/v1.1/koboldcpp_noavx2.exe
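
Whether your CPU supports AVX2 is easy to check before picking a build. A small sketch (Linux-specific: it parses /proc/cpuinfo, which lists "avx2" among the CPU flags when the instruction set is supported; on Windows a tool like CPU-Z shows the same information):

```python
def has_flag(cpuinfo_text, flag):
    """Return True if a CPU flag (e.g. 'avx2') appears in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.split(":", 1)[1].split()
    return False

def cpu_supports_avx2(path="/proc/cpuinfo"):
    """Check the running machine; falls back to False when cpuinfo is unavailable."""
    try:
        with open(path) as f:
            return has_flag(f.read(), "avx2")
    except OSError:
        return False  # not Linux, or /proc not mounted
```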

  • TavernAI

    Discontinued TavernAI for nerds [Moved to: https://github.com/Cohee1207/SillyTavern] (by SillyLossy)

  • Have you tried to talk to both at the same time? With TavernAI group chats are actually possible. The current version isn't compatible with koboldcpp, but the dev version has a fix, and I'm just getting started playing around with it.

  • alpaca.cpp

    Discontinued Locally run an Instruction-Tuned Chat-Style LLM

  • All versions of ggml Alpaca models (the legacy format from alpaca.cpp, as well as the newer ggml Alpacas on Hugging Face)

  • pygmalion.cpp

    Discontinued C/C++ implementation of PygmalionAI/pygmalion-6b

  • GPT-J/JT models (legacy f16 formats as well as 4-bit quantized ones, such as Pygmalion; see pyg.cpp)

  • gpt4all

    gpt4all: run open-source LLMs anywhere

  • And GPT4All, with no conversion required

  • llama.cpp

    LLM inference in C/C++

  • Hey, that's a very cool project (again!). Having only 8 GB of VRAM, I wanted to look into the cpp family of LLaMA/Alpaca tools, but was put off by their generation delay scaling with prompt length.

  • TavernAI

    Atmospheric adventure chat for AI language models (KoboldAI, NovelAI, Pygmalion, OpenAI chatgpt, gpt-4)

  • Are you using the original TavernAI or the Silly TavernAI mod? The latter seems to crash when trying to access the koboldcpp endpoint.

  • SillyTavern

    Discontinued LLM Frontend for Power Users. [Moved to: https://github.com/SillyTavern/SillyTavern] (by Cohee1207)

  • KoboldAI

  • Unfortunately koboldcpp only runs on CPU. Perhaps you could try using this fork of koboldai with llama support? https://github.com/0cc4m/KoboldAI

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • I'm most interested in that last one. I've heard the RWKV models are very fast, don't need much RAM, and can handle huge context lengths, so maybe their 14B can work for me. I wasn't sure how ready for use they were, but looking into it further, projects like rwkv.cpp, ChatRWKV, and a whole lot of other community projects are mentioned on their GitHub.
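
The speed and memory claims come from RWKV being an RNN at inference time: each new token updates a fixed-size state rather than attending over the whole context. A toy sketch of that idea, using a simple exponentially decayed running average (illustrative only, not the actual RWKV WKV formulas):

```python
def step(state, x, decay=0.9):
    """Fold one token's value x into a fixed-size recurrent state.

    Cost per token is O(1) in context length -- unlike attention,
    which re-reads all previous tokens on every step.
    """
    num, den = state
    return (decay * num + x, decay * den + 1.0)

def run(xs):
    """Process a sequence token by token; memory use stays constant."""
    state = (0.0, 0.0)
    for x in xs:
        state = step(state, x)
    num, den = state
    return num / den  # decay-weighted average of the sequence
```

Because the per-token work never grows with the number of preceding tokens, "infinite" context length is limited only by how much information the fixed-size state can retain.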

  • rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

  • ChatRWKV

    ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
