rwkv.cpp
more-ane-transformers
Metric | rwkv.cpp | more-ane-transformers |
---|---|---|
Mentions | 12 | 4 |
Stars | 1,113 | 42 |
Growth | 2.8% | - |
Activity | 6.8 | 7.0 |
Latest commit | about 1 month ago | 6 months ago |
Language | C++ | Python |
License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
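The exact formula behind the activity number is not given here. As a rough illustration only, the sketch below assumes an exponential decay with a 90-day half-life for commit weights and a simple "share of tracked projects beaten" mapping onto a 0 to 10 scale. The function names `weighted_commit_score` and `activity_rating`, and the half-life value, are hypothetical and not taken from the site.

```python
from datetime import date, timedelta


def weighted_commit_score(commit_dates, today, half_life_days=90.0):
    """Sum per-commit weights that halve every `half_life_days`.

    The decay shape and the 90-day half-life are assumptions for illustration.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_rating(raw_score, all_raw_scores):
    """Map a raw score to 0-10 by the share of tracked projects it beats."""
    below = sum(1 for s in all_raw_scores if s < raw_score)
    return round(10.0 * below / len(all_raw_scores), 1)


today = date(2024, 1, 1)
# A commit made today counts fully; one made 90 days ago counts half.
recent_and_old = [today, today - timedelta(days=90)]
example_score = weighted_commit_score(recent_and_old, today)

# Ten hypothetical tracked projects with raw scores 1..10.
tracked = [float(n) for n in range(1, 11)]
example_rating = activity_rating(10.0, tracked)
```

Under these assumptions, the project with the highest raw score among ten tracked projects beats 90% of them and gets a rating of 9.0, which matches the "top 10%" reading described above.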
rwkv.cpp
- Eagle 7B: Soaring past Transformers
There's https://github.com/saharNooby/rwkv.cpp, which is related-ish[0] to ggml/llama.cpp
[0]: https://github.com/ggerganov/llama.cpp/issues/846
- People who've used RWKV, what's your wishlist for it?
- The Eleuther AI Mafia
Quantisation thankfully is applicable to RWKV as much as transformers. Most notably in our RWKV.cpp community project: https://github.com/saharNooby/rwkv.cpp
Tooling/Ecosystem is something that I am actively working on, as there is still a gap to transformer-level tooling. But I'm glad that there is a noticeable difference!
And yes! Experiments are important to ensure improvements in the architecture. Even if "Linear Transformers" replace "Transformers", alternatives should always be explored, so that we can learn from such trade-offs to the benefit of the ecosystem.
(This was lightly covered in the podcast, where I share that, IMO, we should have more research into text-based diffusion networks.)
- Tiny models for contextually coherent conversations?
- New model: RWKV-4-Raven-7B-v12-Eng49%-Chn49%-Jpn1%-Other1%-20230530-ctx8192.pth
Q8_0 models: only for https://github.com/saharNooby/rwkv.cpp (fast CPU).
- [R] RWKV: Reinventing RNNs for the Transformer Era
- 4096 Context length (and beyond)
There's https://github.com/saharNooby/rwkv.cpp which seems to work, and might be compatible with text-generation-webui.
- The Coming of Local LLMs
Also worth checking out https://github.com/saharNooby/rwkv.cpp, which is based on Georgi's library and offers support for the RWKV family of models, which are Apache-2.0 licensed.
- KoboldCpp - Combining all the various ggml.cpp CPU LLM inference projects with a WebUI and API (formerly llamacpp-for-kobold)
I'm most interested in that last one. I think I heard the RWKV models are very fast, don't need much RAM, and can handle huge context lengths, so maybe their 14B can work for me. I wasn't sure how ready for use they were, but after looking into it more, rwkv.cpp, ChatRWKV, and a whole lot of other community projects are mentioned on their GitHub.
- rwkv.cpp: FP16 & INT4 inference on CPU for RWKV language model (r/MachineLearning)
more-ane-transformers
- M2 Ultra can run 128 streams of Llama 2 7B in parallel
- Is it possible to use ANE(Apple Neural Engine) to run those models?
- The Coming of Local LLMs
Apple should get working on a version of the Neural Engine that is useful for these models, and remove the 3GB size limit [1] to take full advantage of the 'unified' memory architecture. Game changer.
Waste of die space currently
[1] https://github.com/smpanaro/more-ane-transformers/blob/main/...
- Anthropic’s $5B, 4-year plan to take on OpenAI
What are some alternatives?
llama.cpp - LLM inference in C/C++
pyllms - Minimal Python library to connect to LLMs (OpenAI, Anthropic, AI21, Cohere, Aleph Alpha, HuggingfaceHub, Google PaLM2), with a built-in model performance benchmark.
RWKV-LM - RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
neural-engine - Everything we actually know about the Apple Neural Engine (ANE)
ChatRWKV - ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
whisper.coreml - Robust Speech Recognition via Large-Scale Weak Supervision
mpt-30B-inference - Run inference on MPT-30B using CPU
tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad]
verbaflow - Neural Language Model for Go
duckduckgo-locales - Translation files for https://duckduckgo.com
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM
experiments-coreml-ane-distilbert - Experimenting with https://github.com/apple/ml-ane-transformers