Run inference on MPT-30B using CPU
Why do you think https://github.com/turboderp/exllama is a good alternative to mpt-30B-inference?