Hey everyone! Many of you might have come across the Mamba paper a few days ago, which introduced an LLM built on a state space model architecture. Mamba is exciting because its compute scales linearly with sequence length, rather than quadratically like transformer attention, so it's far more efficient on long inputs: https://github.com/state-spaces/mamba
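If you want to get a rough feel for that scaling yourself, here's a quick timing sketch using the `mamba_ssm` package from the repo above. It assumes a CUDA GPU and the official 2.8B checkpoint on the Hugging Face hub; exact numbers will vary with hardware:

```python
import time
import torch
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b",
                                         device="cuda", dtype=torch.float16)
model.eval()

# Time a forward pass at doubling sequence lengths. With quadratic
# attention, time roughly quadruples per doubling; Mamba's should
# roughly double instead.
for seq_len in (512, 1024, 2048, 4096):
    ids = torch.randint(0, 50277, (1, seq_len), device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        model(ids)
    torch.cuda.synchronize()
    print(f"seq_len={seq_len}: {time.time() - start:.3f}s")
```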
I got really excited about the paper, so I decided to fine-tune the model on a chat dataset. It turns out this actually worked quite well! The model holds up for casual chatting, which honestly surprised me given that it has only 2.8B parameters and the base model was trained only on the Pile. It's exciting that we might have a serious candidate for an architecture that could dethrone transformers.
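For a sense of what the fine-tuning loop looks like, here's a minimal sketch of causal-LM training on chat transcripts (the repo linked below has the full version). The `<|user|>`/`<|assistant|>` template and the tiny inline dataset are just placeholders, and a real run needs batching, a proper dataloader, and memory tricks beyond naive AdamW over 2.8B parameters:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# The base model has no tokenizer of its own; it reuses GPT-NeoX's,
# since it was trained on the Pile.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b",
                                         device="cuda", dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder chat data: each conversation is flattened into one
# training string with a simple role-tagged template.
conversations = [
    [("user", "What is a state space model?"),
     ("assistant", "A sequence model that evolves a hidden state over time.")],
]

model.train()
for convo in conversations:
    text = "".join(f"<|{role}|>\n{msg}\n" for role, msg in convo)
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=1024).input_ids.to("cuda")
    logits = model(ids).logits
    # Standard causal-LM objective: predict token t+1 from tokens <= t.
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           ids[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```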
You can find both my fine-tuning and inference code here: https://github.com/havenhq/mamba-chat
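To chat with the fine-tuned model, something along these lines works. The Hugging Face model id and the prompt template here are assumptions for illustration; check the repo above for the exact ones. Note that `generate()` is the sampling helper shipped with `mamba_ssm`, not the Hugging Face one:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Assumed model id; see the repo for the published weights.
model = MambaLMHeadModel.from_pretrained("havenhq/mamba-chat",
                                         device="cuda", dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "<|user|>\nExplain state space models in one sentence.\n<|assistant|>\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

out = model.generate(input_ids=input_ids, max_length=200,
                     temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0]))
```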