Show HN: Mamba-Chat – A Chat LLM Based on State Space Models

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • mamba-chat

    Mamba-Chat: A chat LLM based on the state-space model architecture 🐍

  • mamba

  • Hey everyone! Many of you might have come across the Mamba paper a few days ago, which introduced a language model architecture based on state space models. The Mamba architecture is quite useful because its compute scales linearly (and therefore subquadratically) with input length, which makes it much more efficient than transformer models on long sequences: https://github.com/state-spaces/mamba

    I got really excited about the paper, so I decided to fine-tune the model on a chat dataset. It turns out that this actually worked quite well! The model is quite suitable for casual chatting, which honestly surprised me given that it only has 2.8B parameters and the base model was only trained on the Pile. It's quite exciting that we might have a serious candidate for an architecture that could dethrone transformers.

    You can find both my fine-tuning and inference code here: https://github.com/havenhq/mamba-chat

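To make the post's claim about linear scaling concrete, here is a toy Python sketch of the kind of recurrence a selective state space layer runs. It is an illustration under stated assumptions, not Mamba's actual implementation: it uses a single scalar input channel, a diagonal state, and makes only the step size `delta` input-dependent (in Mamba, B and C are input-dependent as well). It also uses a simplified `delta * B` discretization of the input term, and the parameter names `w_delta` and `b_delta` are hypothetical. The linked repository implements the real scan with a hardware-aware GPU kernel.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + exp(z)); keeps step sizes positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: Sequence[float], A: Sequence[float], B: Sequence[float],
                   C: Sequence[float], w_delta: float, b_delta: float) -> List[float]:
    """Toy selective SSM over a scalar input sequence (one channel).

    Per step t (hypothetical, simplified parameterization):
      delta_t = softplus(w_delta * x_t + b_delta)   # input-dependent step size
      h_t[i]  = exp(delta_t * A[i]) * h_{t-1}[i] + delta_t * B[i] * x_t
      y_t     = sum_i C[i] * h_t[i]
    The state has a fixed size N, so one pass costs O(L * N):
    linear in the sequence length L, with no L x L attention matrix.
    """
    n = len(A)
    if len(B) != n or len(C) != n:
        raise ValueError("A, B and C must share the same state size")
    if any(a >= 0 for a in A):
        raise ValueError("A entries must be negative so the state decays")
    h = [0.0] * n  # hidden state carried from one time step to the next
    ys: List[float] = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)
        # Decay the old state and mix in the current input, per state dimension.
        h = [math.exp(delta * A[i]) * h[i] + delta * B[i] * x_t for i in range(n)]
        # Read the output out of the fixed-size state.
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys
```

For example, `selective_scan([1.0, 0.0, 0.0], A=[-0.5, -1.0], B=[1.0, 0.5], C=[1.0, 1.0], w_delta=1.0, b_delta=0.0)` returns one output per input step. Because each step only updates the fixed-size state `h`, doubling the sequence length doubles the work, whereas self-attention compares every token with every other token and so grows quadratically.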