KoboldCpp - Combining all the various ggml.cpp CPU LLM inference projects with a WebUI and API (formerly llamacpp-for-kobold)

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • koboldcpp

    A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
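
KoboldCpp exposes a KoboldAI-compatible HTTP API alongside its WebUI. A minimal sketch of generating text against a locally running instance (assuming the default port 5001 and the `/api/v1/generate` endpoint; check the console output of your own instance for the actual address):

```python
import json
import urllib.request

def build_payload(prompt, max_length=80, temperature=0.7):
    """Build a request body for the KoboldAI-compatible /api/v1/generate endpoint."""
    return {"prompt": prompt, "max_length": max_length, "temperature": temperature}

def generate(prompt, base_url="http://localhost:5001"):
    """POST a prompt to a running koboldcpp instance and return the generated text."""
    req = urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response carries generated text under "results".
        return json.load(resp)["results"][0]["text"]

# Example (requires a running koboldcpp instance):
# print(generate("Once upon a time"))
```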

  • It's possible you have a very old CPU. Can you try the noavx2 build? https://github.com/LostRuins/koboldcpp/releases/download/v1.1/koboldcpp_noavx2.exe
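
Whether your CPU supports AVX2 is easy to check before picking a build. A small sketch (Linux-specific: it parses /proc/cpuinfo, which lists "avx2" among the CPU flags when the instruction set is supported; on Windows a tool like CPU-Z shows the same information):

```python
def has_flag(cpuinfo_text, flag):
    """Return True if a CPU flag (e.g. 'avx2') appears in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return flag in line.split(":", 1)[1].split()
    return False

def cpu_supports_avx2(path="/proc/cpuinfo"):
    """Check the running machine; falls back to False when cpuinfo is unavailable."""
    try:
        with open(path) as f:
            return has_flag(f.read(), "avx2")
    except OSError:
        return False  # not Linux, or /proc not mounted
```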

  • TavernAI

    Discontinued TavernAI for nerds [Moved to: https://github.com/Cohee1207/SillyTavern] (by SillyLossy)

  • Have you tried to talk to both at the same time? With TavernAI group chats are actually possible. The current version isn't compatible with koboldcpp, but the dev version has a fix, and I'm just getting started playing around with it.

  • alpaca.cpp

    Discontinued Locally run an Instruction-Tuned Chat-Style LLM

  • All versions of ggml Alpaca models (the legacy format from alpaca.cpp, as well as the newer ggml Alpacas on Hugging Face)

  • pygmalion.cpp

    Discontinued C/C++ implementation of PygmalionAI/pygmalion-6b

  • GPT-J/JT models (legacy f16 formats as well as 4-bit quantized ones, such as Pygmalion; see pyg.cpp)

  • gpt4all

    gpt4all: run open-source LLMs anywhere

  • And GPT4All, with no conversion required

  • llama.cpp

    LLM inference in C/C++

  • Hey, that's a very cool project (again!). Having only 8 GB of VRAM, I wanted to look into the cpp family of LLaMA/Alpaca tools, but was put off by their generation delay scaling with prompt length.

  • TavernAI

    Atmospheric adventure chat for AI language models (KoboldAI, NovelAI, Pygmalion, OpenAI chatgpt, gpt-4)

  • Are you using the original TavernAI or the Silly TavernAI mod? The latter seems to crash when trying to access the koboldcpp endpoint.

  • SillyTavern

    Discontinued LLM Frontend for Power Users. [Moved to: https://github.com/SillyTavern/SillyTavern] (by Cohee1207)

  • KoboldAI

  • Unfortunately koboldcpp only runs on CPU. Perhaps you could try using this fork of koboldai with llama support? https://github.com/0cc4m/KoboldAI

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • I'm most interested in that last one. I've heard the RWKV models are very fast, don't need much RAM, and can handle huge context lengths, so maybe their 14B can work for me. I wasn't sure how ready for use they were, but looking into it further, projects like rwkv.cpp, ChatRWKV, and a whole lot of other community projects are mentioned on their GitHub.
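
The speed and memory claims come from RWKV being an RNN at inference time: each new token updates a fixed-size state rather than attending over the whole context. A toy sketch of that idea, using a simple exponentially decayed running average (illustrative only, not the actual RWKV WKV formulas):

```python
def step(state, x, decay=0.9):
    """Fold one token's value x into a fixed-size recurrent state.

    Cost per token is O(1) in context length -- unlike attention,
    which re-reads all previous tokens on every step.
    """
    num, den = state
    return (decay * num + x, decay * den + 1.0)

def run(xs):
    """Process a sequence token by token; memory use stays constant."""
    state = (0.0, 0.0)
    for x in xs:
        state = step(state, x)
    num, den = state
    return num / den  # decay-weighted average of the sequence
```

Because the per-token work never grows with the number of preceding tokens, "infinite" context length is limited only by how much information the fixed-size state can retain.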

  • rwkv.cpp

    INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

  • ChatRWKV

    ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
