On a side note: working over longer contexts also reminds me of MemGPT (https://github.com/cpacker/MemGPT).
llama.cpp probably won't be getting Jamba support anytime soon: https://github.com/ggerganov/llama.cpp/issues/6372#issuecomm...
There is an MLX Mamba implementation, but nothing for Jamba either: https://github.com/alxndrTL/mamba.py/tree/main/mlx
You could run it in PyTorch on CPU, and with only 12B parameters active per forward pass it might even run relatively fast (8 tok/s?), but a q4 quant would also easily fit on 2x3090s and should run at >60 tok/s.
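For a rough sanity check on the "fits on 2x3090s" claim, here is a back-of-the-envelope sketch. The 52B total parameter count and the ~4.5 bits per weight for a q4-style quant are assumptions on my part (neither is stated above), and it counts weights only, so KV cache, Mamba state and activations still have to come out of the headroom.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 52e9    # assumption: total parameter count, not given in the thread
BITS_PER_WEIGHT = 4.5  # assumption: 4-bit weights plus per-block scale overhead
VRAM_GB = 2 * 24       # two RTX 3090s at 24 GB each

weights_gb = quantized_weight_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
headroom_gb = VRAM_GB - weights_gb

print(f"weights: ~{weights_gb:.1f} GB, headroom: ~{headroom_gb:.1f} GB of {VRAM_GB} GB")
```

Under those assumptions the weights take up a bit under 30 GB, which leaves a reasonable margin on 48 GB. The tok/s numbers are a separate question and this doesn't try to estimate them.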