-
RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be trained directly like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it combines the best of RNN and transformer: great performance, linear time, constant space (no KV cache), fast training, infinite ctx_len, and free sentence embedding.
Simply run train.py in https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN :)
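To make the "linear time, constant space" point concrete, here is a minimal single-channel sketch of a WKV recurrence in the style of the earlier RWKV-4 formulation (RWKV-7 "Goose" uses a more elaborate state update); the scalar decay w, bonus u, and the lack of numerical-stability tricks are simplifications for illustration, not the real kernel:

```python
import numpy as np

def wkv_recurrent(k, v, w, u):
    # One-channel sketch of an RWKV-4-style WKV recurrence.
    # Each step is O(1) work and O(1) state, so a sequence costs O(T) time
    # and there is no KV cache to store. Sketch only: no stability tricks.
    num, den = 0.0, 0.0              # running decayed sums over past tokens
    out = np.zeros(len(k))
    for t in range(len(k)):
        cur = np.exp(u + k[t])       # current token gets a "bonus" weight u
        out[t] = (num + cur * v[t]) / (den + cur)
        decay = np.exp(-w)           # exponentially decay the contribution of the past
        num = decay * (num + np.exp(k[t]) * v[t])
        den = decay * (den + np.exp(k[t]))
    return out

# Example: out = wkv_recurrent(np.random.randn(8), np.random.randn(8), w=0.5, u=0.3)
```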
-
AI-Writer
AI writes novels: generates xuanhuan fantasy, romance web fiction, and more. A Chinese pretrained generative model, using my RWKV model, similar to GPT-2. AI writing. RWKV for Chinese novel generation.
I need more FLOPS lol. On the other hand, quite a few users have fine-tuned the Chinese novel model (https://github.com/BlinkDL/AI-Writer).
-
RWKV-v2-RNN-Pile
RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
Yes. You can begin with the 169M-parameter model (in the Releases of https://github.com/BlinkDL/RWKV-v2-RNN-Pile), which has not fully converged yet but is fine for testing.
-
SmallInitEmb (https://github.com/BlinkDL/SmallInitEmb)
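SmallInitEmb is the trick of initializing the embedding matrix with very small values and adding a LayerNorm right after the embedding lookup, which improves transformer convergence. A minimal PyTorch sketch of the idea (the 1e-4 init scale and module name are illustrative assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

class SmallInitEmb(nn.Module):
    # Sketch of the SmallInitEmb idea: tiny embedding init + LayerNorm after lookup.
    # The 1e-4 scale is an assumed example value.
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        nn.init.uniform_(self.emb.weight, a=-1e-4, b=1e-4)  # tiny init instead of the default
        self.ln = nn.LayerNorm(d_model)                      # normalize right after the embedding

    def forward(self, idx):
        # idx: (batch, seq_len) token ids -> (batch, seq_len, d_model) normalized embeddings
        return self.ln(self.emb(idx))
```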
-
It uses my custom CUDA kernel ( https://github.com/BlinkDL/RWKV-CUDA ) to speed up training, so a GPU is required for now. On the other hand, you don't need CUDA for inference, which is very fast even on CPUs.
-
token-shift-gpt
Implementation of Token Shift GPT - an autoregressive model that relies solely on shifting the sequence space for mixing.
Indeed :) token-shift-gpt takes this idea to the extreme (https://github.com/lucidrains/token-shift-gpt).
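For context, "token shift" means blending each position's features with the previous position's features before the usual mixing layers. A rough PyTorch sketch of the mechanism (the half-and-half channel split is an assumption for illustration; real implementations typically learn per-channel mixing ratios):

```python
import torch
import torch.nn.functional as F

def token_shift(x: torch.Tensor) -> torch.Tensor:
    # Toy token-shift mixing for x of shape (batch, seq_len, d_model):
    # keep the first half of each token's channels, take the second half
    # from the previous token (zeros for the first position).
    x_prev = F.pad(x, (0, 0, 1, -1))   # shift along seq_len: pad a zero row in front, drop the last
    half = x.shape[-1] // 2
    return torch.cat([x[..., :half], x_prev[..., half:]], dim=-1)

# Example: y = token_shift(torch.randn(2, 16, 64))
```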