Hey everyone! Many of you might have come across the Mamba paper a few days ago, which introduced an LLM built on a state space model architecture. Mamba is exciting because its compute scales linearly with sequence length, rather than quadratically like transformer attention, so it's far more efficient on long inputs: https://github.com/state-spaces/mamba
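If you want to get a rough feel for that scaling yourself, here's a quick timing sketch using the `mamba_ssm` package from the repo above. It assumes a CUDA GPU and the official 2.8B checkpoint on the Hugging Face hub; exact numbers will vary with hardware:

```python
import time
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b",
                                         device="cuda", dtype=torch.float16)
model.eval()

# Time a forward pass at doubling sequence lengths. With quadratic
# attention, time roughly quadruples per doubling; Mamba's should
# roughly double instead.
for seq_len in (512, 1024, 2048, 4096):
    ids = torch.randint(0, 50277, (1, seq_len), device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        model(ids)
    torch.cuda.synchronize()
    print(f"seq_len={seq_len}: {time.time() - start:.3f}s")
```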
I got really excited about the paper, so I decided to fine-tune the model on a chat dataset. It turns out this actually worked quite well! The model holds up for casual chatting, which honestly surprised me given that it has only 2.8B parameters and the base model was trained only on the Pile. It's exciting that we might have a serious candidate for an architecture that could dethrone transformers.
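For a sense of what the fine-tuning loop looks like, here's a minimal sketch of causal-LM training on chat transcripts (the repo linked below has the full version). The `<|user|>`/`<|assistant|>` template and the tiny inline dataset are just placeholders, and a real run needs batching, a proper dataloader, and memory tricks beyond naive AdamW over 2.8B parameters:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# The base model has no tokenizer of its own; it reuses GPT-NeoX's,
# since it was trained on the Pile.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b",
                                         device="cuda", dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder chat data: each conversation is flattened into one
# training string with a simple role-tagged template.
conversations = [
    [("user", "What is a state space model?"),
     ("assistant", "A sequence model that evolves a hidden state over time.")],
]

model.train()
for convo in conversations:
    text = "".join(f"<|{role}|>\n{msg}\n" for role, msg in convo)
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=1024).input_ids.to("cuda")
    logits = model(ids).logits
    # Standard causal-LM objective: predict token t+1 from tokens <= t.
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           ids[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```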
You can find both my fine-tuning and inference code here: https://github.com/havenhq/mamba-chat
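To chat with the fine-tuned model, something along these lines works. The Hugging Face model id and the prompt template here are assumptions for illustration; check the repo above for the exact ones. Note that `generate()` is the sampling helper shipped with `mamba_ssm`, not the Hugging Face one:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Assumed model id; see the repo for the published weights.
model = MambaLMHeadModel.from_pretrained("havenhq/mamba-chat",
                                         device="cuda", dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "<|user|>\nExplain state space models in one sentence.\n<|assistant|>\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

out = model.generate(input_ids=input_ids, max_length=200,
                     temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0]))
```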