[R] RWKV-v2-RNN: A parallelizable RNN with transformer-level LM performance, and without using attention

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • RWKV-LM

    RWKV is an RNN with transformer-level LM performance. It can be trained directly like a GPT (parallelizable), combining the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.

  • Simply run train.py in https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN :)
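The dual RNN/GPT formulation described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the repo's actual code: it uses a single scalar decay `w` and omits the current-token bonus and numerical-stability tricks, whereas the real model uses per-channel learned decays. The point is that the same weighted average of past values can be computed either sequentially (RNN mode, constant state) or for the whole sequence at once (GPT mode, parallelizable training):

```python
import numpy as np

def rwkv_recurrent(k, v, w):
    """RNN-mode evaluation: one step at a time, O(1) state per step."""
    T, C = k.shape
    num = np.zeros(C)
    den = np.zeros(C)
    out = np.empty((T, C))
    decay = np.exp(-w)  # decay factor in (0, 1)
    for t in range(T):
        # accumulate exp(k)-weighted values, decaying older contributions
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
        out[t] = num / den
    return out

def rwkv_parallel(k, v, w):
    """GPT-mode evaluation: whole sequence at once via a causal decay matrix."""
    T, C = k.shape
    t = np.arange(T)
    dist = t[:, None] - t[None, :]                 # (T, T) distances t - i
    W = np.where(dist >= 0, np.exp(-w * dist), 0)  # causal exponential decay
    num = W @ (np.exp(k) * v)
    den = W @ np.exp(k)
    return num / den

# Both modes produce identical outputs.
rng = np.random.default_rng(0)
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 4))
print(np.allclose(rwkv_recurrent(k, v, 0.5), rwkv_parallel(k, v, 0.5)))
```

Training uses the parallel form (like a GPT), while inference can use the recurrent form, which is why it stays fast even on CPUs.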

  • AI-Writer

    AI that writes novels, generating Chinese web fiction such as fantasy and romance. A Chinese pretrained generative model using my RWKV model, similar to GPT-2. RWKV for Chinese novel generation.

  • I need more FLOPS lol. On the other hand, quite a few users have fine-tuned the Chinese novel model (https://github.com/BlinkDL/AI-Writer).

  • RWKV-v2-RNN-Pile

    RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

  • Yes. You can begin with the 169M params model (in Releases of https://github.com/BlinkDL/RWKV-v2-RNN-Pile), which has not fully converged yet but is fine for testing.

  • SmallInitEmb

    LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

  • SmallInitEmb (https://github.com/BlinkDL/SmallInitEmb)

  • RWKV-CUDA

    The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

  • It's using my custom CUDA kernel (https://github.com/BlinkDL/RWKV-CUDA) to speed up training, so training is GPU-only for now. On the other hand, you don't need CUDA for inference, and it is very fast even on CPUs.

  • token-shift-gpt

    Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing

  • indeed :) took this to the extreme with https://github.com/lucidrains/token-shift-gpt
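Token shift, the mechanism mentioned above (used in RWKV and taken to the extreme in token-shift-gpt), mixes part of each position's channels with the previous position's, giving cheap temporal mixing without attention. A minimal sketch, with an assumed fixed half-and-half channel split; real implementations typically learn per-channel mixing coefficients instead:

```python
import numpy as np

def token_shift(x, frac=0.5):
    """Replace the first `frac` of channels at each position with the
    previous position's channels (position 0 gets zeros)."""
    T, C = x.shape
    shifted = np.vstack([np.zeros((1, C)), x[:-1]])  # shift sequence right by one
    n = int(C * frac)
    out = x.copy()
    out[:, :n] = shifted[:, :n]  # mixed-in past channels
    return out                   # remaining channels keep the current token
```

Because the shift is a fixed, attention-free operation, it costs almost nothing yet lets each layer see information from earlier positions.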

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

