heinsen_routing
Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.
-
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
This looks really interesting! I'm going to take a closer look.
It reminds me of a dynamic routing algorithm (related to self-attention) that can handle sequences with 1M+ tokens: https://github.com/glassroom/heinsen_routing. Right now, you could take 1,000 sequences of hidden states computed by a pretrained transformer, each with, say, 1,024 tokens, concatenate them into a single ultra-long sequence of 1,024,000 hidden states, slap 1,024,000 position encodings on top, and feed the whole thing to that routing algorithm to predict the next token.
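For concreteness, here's a minimal PyTorch sketch of that recipe. The hidden states are random stand-ins for real transformer outputs, and the final routing call is hypothetical, since the actual module name and signature live in the heinsen_routing repo:

    import torch

    n_seqs, seq_len, d_model = 1000, 1024, 768

    # Stand-in for hidden states computed by a pretrained transformer:
    # 1,000 sequences of 1,024 tokens each.
    hidden = torch.randn(n_seqs, seq_len, d_model)

    # Concatenate into a single ultra-long sequence of 1,024,000 states.
    ultra_long = hidden.reshape(1, n_seqs * seq_len, d_model)

    # Slap 1,024,000 position encodings on top (randomly initialized here).
    pos_enc = torch.randn(1, n_seqs * seq_len, d_model) * 0.02
    ultra_long = ultra_long + pos_enc

    # Feed the whole thing to the routing algorithm to predict the next token.
    # Hypothetical call; see https://github.com/glassroom/heinsen_routing for the real API.
    # next_token_logits = routing(ultra_long)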
That line of research is still going: https://github.com/lucidrains/block-recurrent-transformer-py... I think it is worth continuing research on both fronts.
https://github.com/BlinkDL/RWKV-LM claims to work well with long sequences.
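For intuition, here is a rough sketch of the WKV recurrence at the heart of RWKV, based on my reading of the v4 formulation; the repo's real kernels are CUDA with numerical stabilization, so treat this as illustrative only. The per-channel decay w and current-token bonus u are the parameters that replace attention:

    import torch

    def wkv(k, v, w, u):
        # Recurrent form of RWKV's WKV operator (sketch of the v4 formulation).
        # k, v: [T, C] key/value projections; w: [C] positive per-channel decay;
        # u: [C] bonus applied to the current token. O(T) time, O(C) state,
        # which is why RWKV can infer like an RNN over very long sequences.
        T, C = k.shape
        num = torch.zeros(C)   # running decayed sum of exp(k) * v
        den = torch.zeros(C)   # running decayed sum of exp(k)
        out = torch.empty(T, C)
        for t in range(T):
            cur = torch.exp(u + k[t])              # current token gets the bonus
            out[t] = (num + cur * v[t]) / (den + cur)
            decay = torch.exp(-w)
            num = decay * num + torch.exp(k[t]) * v[t]
            den = decay * den + torch.exp(k[t])
        return out

    T, C = 16, 8
    y = wkv(torch.randn(T, C), torch.randn(T, C), torch.rand(C), torch.zeros(C))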
Yea, after all, these LLMs predict one sequence of tokens from another, and the tokens could be anything. It just "happens" that text carries the most knowledge and is the easiest to input; then there are images, sound, and video, but tokens could also be learned from world experience in RL:
Transformers are Sample-Efficient World Models:
https://github.com/eloialonso/iris#transformers-are-sample-e...
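To make the "tokens could be anything" point concrete, here is a toy sketch loosely in the spirit of IRIS: discretize image observations into tokens with a vector-quantized encoder, after which any GPT-style next-token predictor can model them. All names here are hypothetical stand-ins, not the IRIS API:

    import torch
    import torch.nn as nn

    vocab_size, d_model = 512, 256

    # Hypothetical encoder: maps a 64x64 image observation to an 8x8 grid of
    # discrete tokens via nearest-codebook-entry lookup (vector quantization).
    class ObsTokenizer(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, d_model, kernel_size=8, stride=8)
            self.codebook = nn.Embedding(vocab_size, d_model)

        def forward(self, obs):                            # obs: [B, 3, 64, 64]
            z = self.conv(obs).flatten(2).transpose(1, 2)  # [B, 64, d_model]
            # Squared distance from each patch vector to every codebook entry.
            d = (z.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
            return d.argmin(-1)                            # [B, 64] token ids

    tokenizer = ObsTokenizer()
    obs = torch.rand(2, 3, 64, 64)   # a batch of "world" observations
    tokens = tokenizer(obs)          # same interface as text tokens:
    # `tokens` can now be fed to any GPT-style model and trained with
    # ordinary next-token prediction, exactly like text.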