From Deep to Long Learning

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • heinsen_routing

    Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.

  • This looks really interesting! I'm going to take a closer look.

    It reminds me of a dynamic routing algorithm (related to self-attention) that can handle sequences with 1M+ tokens: https://github.com/glassroom/heinsen_routing . Right now, you could take 1,000 sequences of hidden states computed by a pretrained transformer, each sequence with, say, 1024 tokens, concatenate them into a single ultra-long sequence with 1,024,000 hidden states, slap 1,024,000 position encodings on top, and feed the whole thing to that routing algorithm to predict the next token.
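The recipe described in that comment can be sketched in a few lines. This is a hedged toy version with scaled-down shapes (10 sequences of 16 tokens instead of 1,000 sequences of 1024); the routing step itself is left out, since the point here is only the concatenate-and-re-encode preparation:

```python
import numpy as np

# Sketch of the recipe above: concatenate many precomputed hidden-state
# sequences into one ultra-long sequence, then add fresh position
# encodings before handing it to a routing algorithm. Shapes are scaled
# down so this runs fast; the comment proposes ~1,024,000 tokens.

def sinusoidal_encodings(n_pos, d_model):
    """Standard sinusoidal position encodings (Vaswani et al., 2017)."""
    pos = np.arange(n_pos)[:, None]          # (n_pos, 1)
    i = np.arange(d_model)[None, :]          # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((n_pos, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

n_seqs, seq_len, d_model = 10, 16, 8
# Stand-in for hidden states produced by a pretrained transformer.
hidden = [np.random.randn(seq_len, d_model) for _ in range(n_seqs)]

ultra_long = np.concatenate(hidden, axis=0)  # (160, 8)
ultra_long = ultra_long + sinusoidal_encodings(len(ultra_long), d_model)

# The routing algorithm (e.g. heinsen_routing) would consume `ultra_long`
# here to predict the next token; that call is omitted in this sketch.
print(ultra_long.shape)  # (160, 8)
```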

  • block-recurrent-transformer-pytorch

    Implementation of Block Recurrent Transformer - Pytorch

  • That line of research is still going: https://github.com/lucidrains/block-recurrent-transformer-py... I think it is worth continuing research on both fronts.
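The core idea behind a block-recurrent transformer can be sketched simply: split a long sequence into fixed-size blocks and carry a small recurrent state between them. The toy mixing step below is a hypothetical stand-in for the learned within-block attention and state-update maps, not the library's actual API:

```python
import numpy as np

# Minimal sketch of block recurrence: process the sequence block by
# block, carrying an O(1)-size state between blocks instead of
# attending over the full history.

def block_recurrent_pass(x, block_size, d_state):
    n, d = x.shape
    state = np.zeros(d_state)
    outputs = []
    for start in range(0, n, block_size):
        block = x[start:start + block_size]
        # Toy mixing step standing in for within-block attention
        # conditioned on the carried state; real models learn this.
        block_out = block + state[:d]
        # Toy state update summarizing the block into the state.
        state = np.tanh(state + block_out.mean(axis=0)[:d_state])
        outputs.append(block_out)
    return np.concatenate(outputs, axis=0), state

x = np.random.randn(64, 8)
y, final_state = block_recurrent_pass(x, block_size=16, d_state=8)
print(y.shape, final_state.shape)  # (64, 8) (8,)
```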

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • https://github.com/BlinkDL/RWKV-LM this claims to work well with long sequences.
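The "RNN with transformer-level performance" claim rests on a recurrence that can also be evaluated in parallel over all positions. The sketch below illustrates that property with a simplified decayed weighted average (not RWKV's exact WKV formula): the recurrent form keeps O(1) state per channel, and computing the same quantity position-by-position "GPT-style" gives identical results:

```python
import numpy as np

# Simplified illustration of why an RWKV-style model can train in
# parallel yet run inference like an RNN: an exponentially decayed,
# exp(k)-weighted average over past values has both a recurrent form
# (O(1) state) and a direct per-position form. Not RWKV's exact math.

def wkv_recurrent(k, v, decay=0.9):
    num, den = 0.0, 1e-9
    out = []
    for kt, vt in zip(k, v):
        num = decay * num + np.exp(kt) * vt   # decayed weighted sum
        den = decay * den + np.exp(kt)        # decayed normalizer
        out.append(num / den)
    return np.array(out)

def wkv_parallel(k, v, decay=0.9):
    # Same quantity computed independently at each position.
    out = np.empty(len(k))
    for i in range(len(k)):
        w = decay ** (i - np.arange(i + 1)) * np.exp(k[:i + 1])
        out[i] = (w * v[:i + 1]).sum() / (w.sum() + 1e-9)
    return out

k = np.random.randn(32)
v = np.random.randn(32)
assert np.allclose(wkv_recurrent(k, v), wkv_parallel(k, v), atol=1e-5)
```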

  • iris

    Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%. (by eloialonso)

  • Yeah, after all, these LLMs predict one sequence of tokens from another sequence of tokens, and the tokens could be anything. It just "happens" that text carries the most knowledge and is the easiest to input; then there are image, sound, and video, but tokens could also be learned from world experience in RL:

    Transformers are Sample-Efficient World Models:

    https://github.com/eloialonso/iris#transformers-are-sample-e...
