Unlimiformer: Long-Range Transformers with Unlimited Length Input

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • safari

    Convolutions for Sequence Modeling

  • After a very quick read, that's my understanding too: It's just KNN search. So I agree on points 1-3. When something works well, I don't care much about point 4.

    I've had only mixed success with KNN search. Maybe I haven't done it right? Nothing seems to work quite as well for me as explicit token-token interactions by some form of attention, which as we all know is too costly for long sequences (O(n²)). Lately I've been playing with https://github.com/hazyresearch/safari , which uses a lot less compute and seems promising. Otherwise, for long sequences I've yet to find something better than https://github.com/HazyResearch/flash-attention for n×n interactions and https://github.com/glassroom/heinsen_routing for n×m interactions. If anyone here has other suggestions, I'd love to hear about them.

  • flash-attention

    Fast and memory-efficient exact attention

  • After a very quick read, that's my understanding too: It's just KNN search. So I agree on points 1-3. When something works well, I don't care much about point 4.

    I've had only mixed success with KNN search. Maybe I haven't done it right? Nothing seems to work quite as well for me as explicit token-token interactions by some form of attention, which as we all know is too costly for long sequences (O(n²)). Lately I've been playing with https://github.com/hazyresearch/safari , which uses a lot less compute and seems promising. Otherwise, for long sequences I've yet to find something better than https://github.com/HazyResearch/flash-attention for n×n interactions and https://github.com/glassroom/heinsen_routing for n×m interactions. If anyone here has other suggestions, I'd love to hear about them.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • heinsen_routing

    Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.

  • After a very quick read, that's my understanding too: It's just KNN search. So I agree on points 1-3. When something works well, I don't care much about point 4.

    I've had only mixed success with KNN search. Maybe I haven't done it right? Nothing seems to work quite as well for me as explicit token-token interactions by some form of attention, which as we all know is too costly for long sequences (O(n²)). Lately I've been playing with https://github.com/hazyresearch/safari , which uses a lot less compute and seems promising. Otherwise, for long sequences I've yet to find something better than https://github.com/HazyResearch/flash-attention for n×n interactions and https://github.com/glassroom/heinsen_routing for n×m interactions. If anyone here has other suggestions, I'd love to hear about them.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts