RWKV infctx trainer, for training arbitrary context sizes, to 10k and beyond!
Why do you think that https://github.com/BlinkDL/RWKV-CUDA is a good alternative to RWKV-infctx-trainer?