[R] RWKV-v2-RNN : A parallelizable RNN with transformer-level LM performance, and without using attention

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  1. RWKV-LM

    RWKV (pronounced RwaKuv) is an RNN with great LLM performance that can also be trained directly like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

    Simply run train.py in https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN :)
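
    As a rough picture of why "constant space (no kv-cache)" holds, here is a minimal, numerically naive NumPy sketch of the per-channel WKV recurrence from the RWKV write-ups (the names w/u/k/v follow the paper; real implementations also carry a running max-exponent term for numerical stability):

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Naive per-channel WKV recurrence (no numerical stabilization).

    w: positive decay, u: "bonus" applied to the current token,
    k, v: key/value sequences of length T. The state (a, b) is O(1)
    per channel, which is why inference needs no KV cache.
    """
    a, b = 0.0, 0.0                 # running weighted sum and normalizer
    y = np.zeros(len(k))
    for t in range(len(k)):
        y[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return y

rng = np.random.default_rng(0)
print(wkv_recurrent(w=0.9, u=0.1, k=rng.normal(size=5), v=rng.normal(size=5)))
```

    The same weighted sum can also be evaluated in parallel over t during training, which is where the "trained like a GPT transformer" part comes from.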

  2. AI-Writer

    AI writes novels: it generates xuanhuan (Chinese fantasy) and romance web fiction, among other genres. A Chinese pretrained generative model for AI writing, built on my RWKV model, similar to GPT-2. RWKV for Chinese novel generation.

    I need more FLOPS lol. On the other hand, quite a few users have fine-tuned the Chinese novel model (https://github.com/BlinkDL/AI-Writer).

  3. RWKV-v2-RNN-Pile

    RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

    Yes. You can begin with the 169M-parameter model (in the Releases of https://github.com/BlinkDL/RWKV-v2-RNN-Pile), which is not fully converged yet but is fine for testing.
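
    The repo's own scripts are the authoritative way to load that checkpoint; purely as an illustration, smoke-testing a small LM reduces to a decoding loop like the sketch below (the model/tokenizer interface here is hypothetical, not the repo's actual API):

```python
import torch

@torch.no_grad()
def greedy_generate(model, tokenizer, prompt, max_new_tokens=50):
    # Hypothetical interface: model(ids) -> logits of shape (1, T, vocab).
    # Swap in the actual loading/run code from the repo's scripts.
    ids = torch.tensor([tokenizer.encode(prompt)])
    for _ in range(max_new_tokens):
        next_id = model(ids)[0, -1].argmax()        # greedy pick
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0].tolist())
```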

  4. SmallInitEmb

    LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

    SmallInitEmb (https://github.com/BlinkDL/SmallInitEmb)
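
    The trick is small enough to show inline. A minimal PyTorch sketch (the class name is illustrative, and treat the 1e-4 scale as a tunable knob):

```python
import torch
import torch.nn as nn

class SmallInitEmbedding(nn.Module):
    """LayerNorm(SmallInit(Embedding)): tiny embedding init plus a
    LayerNorm on the embedding output, reported to speed convergence."""
    def __init__(self, vocab_size, d_model, init_scale=1e-4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        # Tiny uniform init instead of the default ~N(0, 1) scale.
        nn.init.uniform_(self.emb.weight, a=-init_scale, b=init_scale)
        self.ln = nn.LayerNorm(d_model)

    def forward(self, idx):
        return self.ln(self.emb(idx))

tok = torch.randint(0, 1000, (2, 8))
print(SmallInitEmbedding(1000, 64)(tok).shape)   # torch.Size([2, 8, 64])
```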

  5. RWKV-CUDA

    The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )

    It uses my custom CUDA kernel ( https://github.com/BlinkDL/RWKV-CUDA ) to speed up training, so it is GPU-only for now. On the other hand, you don't need CUDA for inference, which is very fast even on CPUs.
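
    Conceptually, the kernel accelerates the kind of per-channel causal weighted sum at the heart of v2's time-mixing. A pure-PyTorch reference of that operation (shapes and names are illustrative, not the repo's exact interface; a slow form like this is also the CPU fallback):

```python
import torch

def timex_reference(w, k):
    """Reference for the per-channel causal weighted sum the custom
    kernel parallelizes: out[b,c,t] = sum_{s<=t} w[c, t-s] * k[b,c,s]."""
    B, C, T = k.shape
    out = torch.zeros_like(k)
    for t in range(T):
        # After flip, index s holds the weight for lag t - s.
        out[:, :, t] = (w[:, :t + 1].flip(-1) * k[:, :, :t + 1]).sum(-1)
    return out

w = torch.rand(4, 16)        # per-channel weights over lags 0..T-1
k = torch.randn(2, 4, 16)    # (batch, channels, time)
print(timex_reference(w, k).shape)   # torch.Size([2, 4, 16])
```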

  6. token-shift-gpt

    Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing

    Indeed :) lucidrains took this to the extreme with https://github.com/lucidrains/token-shift-gpt
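
    Token-shifting itself is nearly a one-liner. A minimal PyTorch sketch (the scalar mix is illustrative; RWKV and token-shift-gpt learn per-channel mixing weights):

```python
import torch
import torch.nn.functional as F

def token_shift(x, mix=0.5):
    """Mix each position with the previous one by shifting the sequence
    one step along the time dimension; position 0 sees zeros."""
    x_prev = F.pad(x, (0, 0, 1, -1))   # shift right along time, keep length
    return mix * x + (1 - mix) * x_prev

x = torch.randn(2, 8, 64)              # (batch, time, channels)
print(token_shift(x).shape)            # torch.Size([2, 8, 64])
```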



Related posts

  • Anchorpoint – Version Control for Artists

    1 project | news.ycombinator.com | 18 Mar 2025
  • Cloud Exit Assessment

    1 project | news.ycombinator.com | 18 Mar 2025
  • Mataroa: A blogging platform, for minimalists. Just write

    1 project | news.ycombinator.com | 18 Mar 2025
  • Common Use Cases for CAMEL-AI

    1 project | dev.to | 18 Mar 2025
  • CAMEL-AI vs. Other AI Frameworks: What Sets It Apart?

    1 project | dev.to | 18 Mar 2025
