rwkv.cpp vs llama.cpp

| | rwkv.cpp | llama.cpp |
|---|---|---|
| Mentions | 12 | 782 |
| Stars | 1,113 | 58,425 |
| Growth | 2.8% | - |
| Activity | 6.8 | 10.0 |
| Latest commit | about 1 month ago | 4 days ago |
| Language | C++ | C++ |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rwkv.cpp
- Eagle 7B: Soaring past Transformers
There's https://github.com/saharNooby/rwkv.cpp, which is related-ish[0] to ggml/llama.cpp
[0]: https://github.com/ggerganov/llama.cpp/issues/846
- People who've used RWKV, what's your wishlist for it?
- The Eleuther AI Mafia
Quantisation, thankfully, is applicable to RWKV as much as to transformers. Most notably in our RWKV.cpp community project: https://github.com/saharNooby/rwkv.cpp
Tooling/ecosystem is something that I am actively working on, as there is still a gap to the transformers level of tooling. But I'm glad that there is a noticeable difference!
And yes! Experiments are important to ensure improvements in the architecture, even if "Linear Transformers" replaces "Transformers". Alternatives should always be explored, so we can learn from such trade-offs to the benefit of the ecosystem.
(This was lightly covered in the podcast, where I share my opinion that we should have more research into text-based diffusion networks.)
- Tiny models for contextually coherent conversations?
- New model: RWKV-4-Raven-7B-v12-Eng49%-Chn49%-Jpn1%-Other1%-20230530-ctx8192.pth
Q8_0 models: only for https://github.com/saharNooby/rwkv.cpp (fast CPU).
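For reference, ggml-style Q8_0 stores weights in blocks of 32 int8 values with one scale per block, which works out to roughly 8.5 bits per weight (the real format stores the scale as fp16). A minimal sketch of that scheme, assuming that layout; function names here are mine, not the library's:

```python
import numpy as np

QK8_0 = 32  # block size used by ggml's Q8_0 format

def quantize_q8_0(x: np.ndarray):
    """Quantize a 1-D float array into (scales, int8 values), one scale per 32-value block."""
    x = x.reshape(-1, QK8_0)
    # Per-block scale: map the largest magnitude in the block to the int8 range [-127, 127].
    d = np.abs(x).max(axis=1, keepdims=True) / 127.0
    d[d == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(x / d).astype(np.int8)
    return d, q

def dequantize_q8_0(d, q):
    return (d * q.astype(np.float32)).reshape(-1)

weights = np.random.randn(1024).astype(np.float32)
d, q = quantize_q8_0(weights)
restored = dequantize_q8_0(d, q)
print("max abs error:", np.abs(weights - restored).max())
```

Per 32 weights, 128 bytes of fp32 shrink to 32 int8 values plus one fp16 scale (34 bytes), which is why Q8_0 checkpoints are about a quarter the size of fp32 ones at near-identical quality.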
- [R] RWKV: Reinventing RNNs for the Transformer Era
- 4096 Context length (and beyond)
There's https://github.com/saharNooby/rwkv.cpp which seems to work, and might be compatible with text-generation-webui.
- The Coming of Local LLMs
Also worth checking out https://github.com/saharNooby/rwkv.cpp which is based on Georgi's library and offers support for the RWKV family of models which are Apache-2.0 licensed.
- KoboldCpp - Combining all the various ggml.cpp CPU LLM inference projects with a WebUI and API (formerly llamacpp-for-kobold)
I'm most interested in that last one. I think I heard the RWKV models are very fast, don't need much RAM, and can have huge context windows, so maybe their 14B can work for me. I wasn't sure how ready for use they were, but looking more into it, rwkv.cpp, ChatRWKV, and a whole lot of other community projects are mentioned on their GitHub.
- rwkv.cpp: FP16 & INT4 inference on CPU for RWKV language model (r/MachineLearning)
llama.cpp
- IBM Granite: A Family of Open Foundation Models for Code Intelligence
If you can compile stuff, then looking at llama.cpp (what Ollama uses) is also interesting: https://github.com/ggerganov/llama.cpp
The server is here: https://github.com/ggerganov/llama.cpp/tree/master/examples/...
And you can search for any GGUF model on Hugging Face.
- Ask HN: Affordable hardware for running local large language models?
Yes, Metal seems to allow a maximum of 1/2 of the RAM for one process, and 3/4 of the RAM allocated to the GPU overall. There’s a kernel hack to fix it, but that comes with the usual system integrity caveats. https://github.com/ggerganov/llama.cpp/discussions/2182
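The fractions quoted above translate into concrete ceilings for a given machine. A purely illustrative helper (the function and its field names are mine, and the fractions are the approximate ones reported in that discussion, not exact Metal constants):

```python
def metal_budgets(ram_gb: float) -> dict:
    """Rough Metal memory ceilings as described in the llama.cpp discussion:
    one process may allocate up to about half of system RAM, and the GPU
    as a whole is capped at roughly three quarters of it."""
    return {
        "per_process_gb": ram_gb / 2,
        "gpu_total_gb": ram_gb * 3 / 4,
    }

for ram in (16, 32, 64, 96):
    b = metal_budgets(ram)
    print(f"{ram} GB RAM -> per-process {b['per_process_gb']:.0f} GB, "
          f"GPU total {b['gpu_total_gb']:.0f} GB")
```

So a 64 GB Mac would cap a single llama.cpp process at roughly 32 GB of Metal memory by default, which is what makes the kernel hack relevant for the largest models.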
- Xmake: A modern C/C++ build tool
- Better and Faster Large Language Models via Multi-Token Prediction
For anyone interested in exploring this, llama.cpp has an example implementation here:
https://github.com/ggerganov/llama.cpp/tree/master/examples/...
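The core idea is that the extra predicted tokens act as a cheap draft, which a standard next-token pass then verifies: keep the longest prefix the verifier agrees with, substitute its correction at the first disagreement. A toy sketch of that accept/verify loop (the toy verifier and all names are mine, not llama.cpp's API):

```python
def speculative_step(draft_tokens, verify_next_token, context):
    """Accept the longest prefix of `draft_tokens` that the verifier model
    would itself have produced; on the first mismatch, take the verifier's
    token instead and stop."""
    accepted = []
    for tok in draft_tokens:
        expected = verify_next_token(context + accepted)
        if tok == expected:
            accepted.append(tok)       # draft agrees with the verifier: keep it
        else:
            accepted.append(expected)  # first disagreement: correct and stop
            break
    return accepted

# Toy "verifier": always continues the repeating pattern A B C A B C ...
PATTERN = ["A", "B", "C"]
def toy_verifier(seq):
    return PATTERN[len(seq) % 3]

# Draft heads guessed ["B", "C", "X"]; the verifier accepts B, C and corrects X to A.
out = speculative_step(["B", "C", "X"], toy_verifier, ["A"])
print(out)  # ['B', 'C', 'A']
```

In a real implementation the verification happens in a single batched forward pass over all drafted positions; the sequential loop here is only for clarity.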
- Llama.cpp Bfloat16 Support
- Fine-tune your first large language model (LLM) with LoRA, llama.cpp, and KitOps in 5 easy steps
Getting started with LLMs can be intimidating. In this tutorial we will show you how to fine-tune a large language model using LoRA, facilitated by tools like llama.cpp and KitOps.
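For intuition on what LoRA actually trains: the pretrained weight W is frozen and only a low-rank update is learned, so the effective weight is W + (alpha/r)·B·A. A minimal numpy sketch; the shapes and names here are illustrative, not any particular library's:

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 4, 8  # r << d keeps the trainable delta tiny

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
                                            # so training starts exactly at W

def lora_forward(x):
    # Equivalent to (W + (alpha / r) * B @ A) @ x,
    # but without materializing the merged matrix.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # with B = 0, LoRA is a no-op

full = W.size
lora = A.size + B.size
print(f"trainable params: {lora} vs {full} ({100 * lora / full:.1f}%)")
```

The parameter saving is the whole point: here only 512 of 4,096 weights are trainable, and after training the delta can be merged back into W so inference pays no extra cost.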
- GGML Flash Attention support merged into llama.cpp
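Flash Attention's core trick is computing softmax attention tile by tile, keeping a running row-max and normalizer so the full score matrix is never materialized. A numpy sketch of that online-softmax idea (an illustration of the algorithm, not ggml's implementation):

```python
import numpy as np

def naive_attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ V

def flash_attention(Q, K, V, block=4):
    """Process K/V in blocks, keeping a running max `m` and normalizer `l`
    per query row; previously accumulated output is rescaled whenever the
    running max increases."""
    n, d = Q.shape
    O = np.zeros((n, V.shape[1]))
    m = np.full(n, -np.inf)   # running row-max of the scores
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        s = Q @ Kj.T / np.sqrt(d)          # scores for this tile only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)          # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        O = O * scale[:, None] + p @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
assert np.allclose(flash_attention(Q, K, V), naive_attention(Q, K, V))
print("tiled attention matches naive attention")
```

Because only one tile of scores lives in memory at a time, the attention memory cost drops from O(n²) to O(n·block), which is what makes long contexts affordable.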
- Phi-3 Weights Released
Well, https://github.com/ggerganov/llama.cpp/issues/6849
- Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
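The n-gram decoding idea: when the recent token suffix has already occurred in the text, the tokens that followed it earlier make a cheap draft that the model can then verify in parallel, with no loss of output quality. A minimal prompt-lookup sketch (a simplification of the paper's adaptive scheme; the function and its parameters are mine):

```python
def ngram_draft(tokens, n=2, max_draft=4):
    """If the last n tokens appeared earlier in `tokens`, draft the tokens
    that followed the most recent earlier occurrence."""
    suffix = tokens[-n:]
    # Search earlier positions for the same n-gram, most recent match first.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == suffix:
            return tokens[i + n:i + n + max_draft]
    return []  # no earlier occurrence: nothing to draft

# In repetitive text the suffix "the cat" was seen before,
# so its earlier continuation is drafted.
text = "the cat sat on the mat and the cat".split()
print(ngram_draft(text, n=2))  # ['sat', 'on', 'the', 'mat']
```

Drafts that the model rejects cost only the (parallel) verification pass, so the scheme is lossless: output is identical to ordinary decoding, just faster on repetitive text.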
- Llama.cpp Working on Support for Llama3
What are some alternatives?
RWKV-LM - RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
ChatRWKV - ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
gpt4all - Run open-source LLMs anywhere
mpt-30B-inference - Run inference on MPT-30B using CPU
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
verbaflow - Neural Language Model for Go
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM
ggml - Tensor library for machine learning
cformers - SoTA Transformers with C-backend for fast inference on your CPU.
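For context on the RWKV-LM entry above: RWKV-4 replaces attention with a linear-time recurrence ("wkv"), which is why it can run with constant memory per token. A naive per-channel sketch following the published RWKV-4 formulation; real implementations also track a running maximum for numerical stability, omitted here for clarity:

```python
import numpy as np

def wkv(w, u, k, v):
    """RWKV-4 'wkv' recurrence (naive form, per channel):
        wkv_t = (a_{t-1} + e^{u + k_t} * v_t) / (b_{t-1} + e^{u + k_t})
        a_t   = e^{-w} * a_{t-1} + e^{k_t} * v_t
        b_t   = e^{-w} * b_{t-1} + e^{k_t}
    w is the per-channel decay, u a bonus for the current token."""
    T, C = k.shape
    a = np.zeros(C)           # running weighted sum of values
    b = np.zeros(C)           # running sum of weights
    out = np.zeros((T, C))
    for t in range(T):
        e_uk = np.exp(u + k[t])
        out[t] = (a + e_uk * v[t]) / (b + e_uk)
        e_k = np.exp(k[t])
        a = np.exp(-w) * a + e_k * v[t]
        b = np.exp(-w) * b + e_k
    return out

rng = np.random.default_rng(0)
T, C = 5, 8
k, v = rng.standard_normal((T, C)), rng.standard_normal((T, C))
out = wkv(w=np.ones(C), u=np.zeros(C), k=k, v=v)
print(out.shape)  # (5, 8)
```

The state carried between tokens is just the pair (a, b) per channel, which is what gives RWKV its RNN-style O(1) memory at inference time, in contrast to a transformer's growing KV cache.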