INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Why do you think that https://github.com/turboderp/exllama is a good alternative to rwkv.cpp?