Like ARM? https://github.com/ARM-software/armnn
Optimization for this workload has arguably been in progress for decades. Modern AVX instructions can be found in laptops that are a decade old now, and most big inferencing projects are built around SIMD or GPU shaders. Unless your computer ships with onboard Nvidia hardware, there's usually not much difference in inferencing performance.
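To make the SIMD point concrete, here's a minimal sketch (not from any project mentioned above) of the core operation of transformer inference, a matrix-vector product. A naive Python loop is slow; NumPy's `@` dispatches to BLAS kernels that use the same AVX/NEON-style vector instructions that projects like llama.cpp implement by hand with intrinsics:

```python
import numpy as np

def matvec_naive(W, x):
    """Scalar, one-multiply-at-a-time mat-vec: what SIMD kernels avoid."""
    out = np.zeros(W.shape[0], dtype=np.float32)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            out[i] += W[i, j] * x[j]
    return out

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64)).astype(np.float32)
x = rng.standard_normal(64).astype(np.float32)

# Same math, but the vectorized path processes many elements per instruction.
naive = matvec_naive(W, x)
simd = W @ x
```

The two paths compute the same result; the difference is throughput, which is why CPU inference on any AVX-capable machine is workable at all.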
You might be pleased to hear that nothing really stops you from doing this today. If you ran Serge[0] on a Mac with Tailscale, you could hack together a decently-accelerated Llama chatbot.
[0] https://github.com/serge-chat/serge
> Let's say retrieve instructions on how to efficiently overthrow a government?
Your license to use Llama can be revoked if Meta investigates and deems your action to be against the code of conduct[1].
1. https://github.com/facebookresearch/llama/blob/main/CODE_OF_...
According to a few papers and https://github.com/ggerganov/llama.cpp/pull/1684, a quantized 7B model at roughly 3 GB performs about the same as the baseline 7B model (roughly 14 GB).