[R] RWKV-v2-RNN : A parallelizable RNN with transformer-level LM performance, and without using attention

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  1. RWKV-LM

    RWKV (pronounced RwaKuv) is an RNN with great LLM performance that can also be trained directly like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

    Simply run train.py in https://github.com/BlinkDL/RWKV-LM/tree/main/RWKV-v2-RNN :)
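
    As a rough picture of why "constant space (no kv-cache)" holds, here is a minimal, numerically naive NumPy sketch of the per-channel WKV recurrence from the RWKV write-ups (the names w/u/k/v follow the paper; real implementations also carry a running max-exponent term for numerical stability):

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Naive per-channel WKV recurrence (no numerical stabilization).

    w: positive decay, u: "bonus" applied to the current token,
    k, v: key/value sequences of length T. The state (a, b) is O(1)
    per channel, which is why inference needs no KV cache.
    """
    a, b = 0.0, 0.0                 # running weighted sum and normalizer
    y = np.zeros(len(k))
    for t in range(len(k)):
        y[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return y

rng = np.random.default_rng(0)
print(wkv_recurrent(w=0.9, u=0.1, k=rng.normal(size=5), v=rng.normal(size=5)))
```

    The same weighted sum can also be evaluated in parallel over t during training, which is where the "trained like a GPT transformer" part comes from.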

  2. AI-Writer

    AI writes novels: it generates xuanhuan (Chinese fantasy) and romance web fiction, among other genres. A Chinese pretrained generative model for AI writing, built on my RWKV model, similar to GPT-2. RWKV for Chinese novel generation.

    I need more FLOPS lol. On the other hand, quite a few users have fine-tuned the Chinese novel model (https://github.com/BlinkDL/AI-Writer).

  3. RWKV-v2-RNN-Pile

    RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

    Yes. You can begin with the 169M-parameter model (in the Releases of https://github.com/BlinkDL/RWKV-v2-RNN-Pile), which is not fully converged yet but is fine for testing.
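
    The repo's own scripts are the authoritative way to load that checkpoint; purely as an illustration, smoke-testing a small LM reduces to a decoding loop like the sketch below (the model/tokenizer interface here is hypothetical, not the repo's actual API):

```python
import torch

@torch.no_grad()
def greedy_generate(model, tokenizer, prompt, max_new_tokens=50):
    # Hypothetical interface: model(ids) -> logits of shape (1, T, vocab).
    # Swap in the actual loading/run code from the repo's scripts.
    ids = torch.tensor([tokenizer.encode(prompt)])
    for _ in range(max_new_tokens):
        next_id = model(ids)[0, -1].argmax()        # greedy pick
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0].tolist())
```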

  4. SmallInitEmb

    LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

    SmallInitEmb (https://github.com/BlinkDL/SmallInitEmb)
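
    The trick is small enough to show inline. A minimal PyTorch sketch (the class name is illustrative, and treat the 1e-4 scale as a tunable knob):

```python
import torch
import torch.nn as nn

class SmallInitEmbedding(nn.Module):
    """LayerNorm(SmallInit(Embedding)): tiny embedding init plus a
    LayerNorm on the embedding output, reported to speed convergence."""
    def __init__(self, vocab_size, d_model, init_scale=1e-4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        # Tiny uniform init instead of the default ~N(0, 1) scale.
        nn.init.uniform_(self.emb.weight, a=-init_scale, b=init_scale)
        self.ln = nn.LayerNorm(d_model)

    def forward(self, idx):
        return self.ln(self.emb(idx))

tok = torch.randint(0, 1000, (2, 8))
print(SmallInitEmbedding(1000, 64)(tok).shape)   # torch.Size([2, 8, 64])
```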

  5. RWKV-CUDA

    The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )

    It uses my custom CUDA kernel ( https://github.com/BlinkDL/RWKV-CUDA ) to speed up training, so it is GPU-only for now. On the other hand, you don't need CUDA for inference, which is very fast even on CPUs.
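
    Conceptually, the kernel accelerates the kind of per-channel causal weighted sum at the heart of v2's time-mixing. A pure-PyTorch reference of that operation (shapes and names are illustrative, not the repo's exact interface; a slow form like this is also the CPU fallback):

```python
import torch

def timex_reference(w, k):
    """Reference for the per-channel causal weighted sum the custom
    kernel parallelizes: out[b,c,t] = sum_{s<=t} w[c, t-s] * k[b,c,s]."""
    B, C, T = k.shape
    out = torch.zeros_like(k)
    for t in range(T):
        # After flip, index s holds the weight for lag t - s.
        out[:, :, t] = (w[:, :t + 1].flip(-1) * k[:, :, :t + 1]).sum(-1)
    return out

w = torch.rand(4, 16)        # per-channel weights over lags 0..T-1
k = torch.randn(2, 4, 16)    # (batch, channels, time)
print(timex_reference(w, k).shape)   # torch.Size([2, 4, 16])
```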

  6. token-shift-gpt

    Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing

    Indeed :) lucidrains took this to the extreme with https://github.com/lucidrains/token-shift-gpt
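
    Token-shifting itself is nearly a one-liner. A minimal PyTorch sketch (the scalar mix is illustrative; RWKV and token-shift-gpt learn per-channel mixing weights):

```python
import torch
import torch.nn.functional as F

def token_shift(x, mix=0.5):
    """Mix each position with the previous one by shifting the sequence
    one step along the time dimension; position 0 sees zeros."""
    x_prev = F.pad(x, (0, 0, 1, -1))   # shift right along time, keep length
    return mix * x + (1 - mix) * x_prev

x = torch.randn(2, 8, 64)              # (batch, time, channels)
print(token_shift(x).shape)            # torch.Size([2, 8, 64])
```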



Related posts

  • Anchorpoint – Version Control for Artists

    1 project | news.ycombinator.com | 18 Mar 2025
  • Cloud Exit Assessment

    1 project | news.ycombinator.com | 18 Mar 2025
  • Mataroa: A blogging platform, for minimalists. Just write

    1 project | news.ycombinator.com | 18 Mar 2025
  • Common Use Cases for CAMEL-AI

    1 project | dev.to | 18 Mar 2025
  • CAMEL-AI vs. Other AI Frameworks: What Sets It Apart?

    1 project | dev.to | 18 Mar 2025
