Mentioned projects
- TavernAI: Discontinued "TavernAI for nerds". [Moved to: https://github.com/Cohee1207/SillyTavern] (by SillyLossy)
- TavernAI: Atmospheric adventure chat for AI language models (KoboldAI, NovelAI, Pygmalion, OpenAI ChatGPT, GPT-4)
- SillyTavern: Discontinued LLM Frontend for Power Users. [Moved to: https://github.com/SillyTavern/SillyTavern] (by Cohee1207)
- RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (it is parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
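As rough intuition for how RWKV can run as an RNN at inference time, here is a minimal sketch of the WKV time-mixing recurrence in its sequential form. This is a simplification (one scalar channel, no numerical stabilization; the real RWKV code works on channel vectors and subtracts a running maximum before exponentiating):

```python
import numpy as np

def wkv_sequential(w, u, k, v):
    """Naive sequential WKV mixing for one channel.

    w: per-channel decay (> 0), u: bonus applied to the current token,
    k, v: key/value sequences of shape (T,). Sketch only; not numerically
    stable for large k.
    """
    T = len(k)
    out = np.zeros(T)
    num, den = 0.0, 0.0  # running exponentially-decayed sums over the past
    for t in range(T):
        # the current token gets an extra `u` bonus before joining the decay
        out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
        # decay the accumulated past and fold in the current token
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out

print(wkv_sequential(0.5, 0.1, np.random.randn(8), np.random.randn(8)))
```

Because the state is just those two running sums, generation cost per token stays constant regardless of how long the context gets, which is where the "infinite ctx_len" claim comes from.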
It's possible you have a very old CPU. Can you try the noavx2 build? https://github.com/LostRuins/koboldcpp/releases/download/v1.1/koboldcpp_noavx2.exe
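If you aren't sure whether your CPU supports AVX2, one quick way to check is with the third-party py-cpuinfo package (that you have it installed via pip install py-cpuinfo is an assumption about your setup):

```python
import cpuinfo  # pip install py-cpuinfo

# get_cpu_info() returns a dict; 'flags' lists lowercase CPU feature flags.
flags = cpuinfo.get_cpu_info().get("flags", [])
print("AVX2 supported:", "avx2" in flags)
print("AVX supported: ", "avx" in flags)  # older CPUs may still report plain avx
```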
Have you tried talking to both at the same time? With TavernAI, group chats are actually possible. The current release isn't compatible with koboldcpp, but the dev version has a fix, and I'm just getting started playing around with it.
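For anyone wiring a frontend to koboldcpp by hand: koboldcpp emulates the KoboldAI generate API, so a minimal sketch of a generation request looks like the following (the default port 5001 and the sampling values are assumptions; adjust to your setup):

```python
import requests

# koboldcpp serves a KoboldAI-compatible API on port 5001 by default.
API_URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "You are standing in a dimly lit tavern.\nYou:",
    "max_length": 80,     # tokens to generate
    "temperature": 0.7,   # illustrative sampling settings
}

resp = requests.post(API_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])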
koboldcpp supports (a sketch for telling the ggml variants apart follows this list):
- All versions of ggml ALPACA models (the legacy format from alpaca.cpp, as well as all the newer ggml Alpacas on Hugging Face)
- GPT-J/JT models (legacy f16 formats as well as 4-bit quantized ones, including Pygmalion; see pyg.cpp)
- GPT4All, with no conversion required
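Those ggml variants differ in the magic number at the start of the file, so one quick way to see which one you have is to read the first four bytes. The magic values below are taken from the llama.cpp-family loaders of that era; treat them as assumptions if your files are newer:

```python
import struct

# Known ggml-family magic numbers (little-endian uint32).
MAGICS = {
    0x67676D6C: "ggml (legacy, unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able)",
}

def detect_format(path: str) -> str:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown (0x{magic:08x})")

print(detect_format("ggml-alpaca-7b-q4.bin"))  # hypothetical filename
```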
Hey, that's a very cool project (again!). With only 8 GB of VRAM, I wanted to look into the cpp family of LLaMA/Alpaca tools, but was put off by the way their generation delay scales with prompt length.
Are you using the original TavernAI or the SillyTavern mod? The latter seems to crash when trying to access the koboldcpp endpoint.
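If you hit that crash, it can help to first rule out a plain connectivity problem by querying the KoboldAI-compatible endpoint that koboldcpp exposes (the port and endpoint path assume a default local koboldcpp instance):

```python
import requests

# GET /api/v1/model returns the loaded model's name on KoboldAI-compatible servers.
try:
    r = requests.get("http://localhost:5001/api/v1/model", timeout=10)
    r.raise_for_status()
    print("Endpoint reachable, model:", r.json()["result"])
except requests.RequestException as e:
    print("koboldcpp endpoint not reachable:", e)
```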
Unfortunately, koboldcpp only runs on the CPU. Perhaps you could try this fork of KoboldAI with LLaMA support? https://github.com/0cc4m/KoboldAI
I'm most interested in that last one. I've heard the RWKV models are very fast, don't need much RAM, and can handle huge context lengths, so maybe their 14B can work for me. I wasn't sure how ready for use they were, but looking into it further, rwkv.cpp, ChatRWKV, and a whole lot of other community projects are mentioned on their GitHub.
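For anyone wanting to poke at RWKV from Python, a minimal sketch using the community rwkv pip package (from the ChatRWKV project) might look like this; the model path, strategy string, and tokenizer file are assumptions you'd replace with your actual download:

```python
from rwkv.model import RWKV   # pip install rwkv (ChatRWKV's runtime)
from rwkv.utils import PIPELINE

# Hypothetical checkpoint path; the Raven 14B weights are a separate download.
model = RWKV(model="RWKV-4-Raven-14B-v12", strategy="cpu fp32")
pipeline = PIPELINE(model, "20B_tokenizer.json")  # tokenizer file from ChatRWKV

print(pipeline.generate("The tavern door creaks open and", token_count=64))
```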
Related posts
- People who've used RWKV, what's your wishlist for it?
- New model: RWKV-4-Raven-7B-v12-Eng49%-Chn49%-Jpn1%-Other1%-20230530-ctx8192.pth
- I created a simple implementation of the RWKV language model (RWKV competes with the dominant Transformers-based approach which is the "T" in GPT)
- Prompt Engineering Guide
- Resources to deepen LLMs understanding for software engineers