16 GB fits a 30B model at Q3_K_S quantization. Maybe try making it work with an IQ script and have it run overnight! https://github.com/kroll-software/babyagi4all
Kobold uses llama.cpp under the hood, if I remember correctly. That means you need to set the compiler flags for the hardware accelerator you want to use. Unfortunately there are a bunch of options for that on ARM platforms. I found a good overview here: https://github.com/ggerganov/whisper.cpp/issues/7. Whisper.cpp is for running speech-to-text models, but it is made by the same author as llama.cpp, and all the compiler flags I found are identical, so it might be worth a shot.
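As a rough sketch of what that looks like when building from source: the exact flags depend on your board and toolchain, and the NEON flags below are the ones discussed in that issue for 32-bit ARM, so treat them as a starting point rather than a known-good recipe.

```shell
# On a 64-bit ARM (aarch64) board like the RK3588, NEON is part of the
# baseline instruction set, so a plain native build usually picks it up:
make clean && make

# On 32-bit ARM you typically have to enable NEON explicitly via CFLAGS
# (flags taken from the whisper.cpp discussion linked above; adjust for
# your CPU and compiler):
make CFLAGS="-mfpu=neon-fp-armv8 -mno-unaligned-access -funsafe-math-optimizations"
```

If the build silently falls back to scalar code, checking the compile log for the NEON flags is a quick way to confirm the acceleration actually got enabled.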
Vulkan is a low-level API that theoretically has very good performance. You can use mlc-llm to run LLMs on Vulkan-enabled GPUs. Unfortunately, the documentation and driver support from Rockchip are spotty at best.
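For what it's worth, a hypothetical invocation might look like the following; the package names, model URL, and flags are illustrative and change between mlc-llm releases, so check the MLC LLM docs for your platform before copying this.

```shell
# Illustrative only: install the mlc-llm nightly wheels, then chat using
# the Vulkan device. The model reference is a placeholder from the MLC
# model collection on Hugging Face.
python -m pip install --pre mlc-llm-nightly mlc-ai-nightly
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC --device vulkan
```

Whether this actually runs on an RK3588 comes down to the Mali Vulkan driver, which is exactly the spotty part mentioned above.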
The RK3588 also has an NPU for accelerating neural networks. The bad news is that its API is not supported by any of the inference engines (AFAIK), but the NPU can directly run models that have been converted to the RKNN format. It is a long shot, but you can find details here.
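To give a feel for what that conversion involves, here is a minimal sketch of the RKNN-Toolkit2 flow for turning an ONNX model into an .rknn file. The call sequence follows Rockchip's documented config/load/build/export pattern, but the file paths and options are placeholders, and the toolkit itself only ships for specific platforms, so treat this as an outline rather than a tested recipe.

```python
# Hypothetical sketch: convert an ONNX model to RKNN for the RK3588 NPU
# using Rockchip's rknn-toolkit2 (paths and options are placeholders).
try:
    from rknn.api import RKNN
except ImportError:
    RKNN = None  # rknn-toolkit2 is only distributed for certain platforms


def convert_onnx_to_rknn(onnx_path: str, rknn_path: str) -> bool:
    """Convert an ONNX model so it can run directly on the RK3588 NPU."""
    if RKNN is None:
        raise RuntimeError("rknn-toolkit2 is not installed")
    rknn = RKNN()
    rknn.config(target_platform="rk3588")       # target the RK3588's NPU
    if rknn.load_onnx(model=onnx_path) != 0:    # import the source model
        return False
    if rknn.build(do_quantization=False) != 0:  # compile for the NPU
        return False
    ok = rknn.export_rknn(rknn_path) == 0       # write the .rknn file
    rknn.release()
    return ok
```

Note this is the generic vision-model path; getting an LLM through it is the long-shot part, since transformer support on the NPU is its own adventure.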