LLMs use a surprisingly simple mechanism to retrieve some stored knowledge

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • landmark-attention

    Landmark Attention: Random-Access Infinite Context Length for Transformers

  • It indeed is. An attention mechanism's key and value matrices grow linearly with context length. With PagedAttention[1], we could imagine an external service providing context. The hard part is the how, of course. We can't load our entire database into every conversation, and I suspect there are also training challenges (perhaps addressed via LandmarkAttention[2] and other mechanisms to efficiently retrieve additional key-value matrices).

    To sustain 20-50 tokens/sec, any externally retrieved keys and values must arrive within 50-20 ms, i.e. within a single decode step. Pausing the autoregressive transformer when it produces a Q vector stalls the whole batch, so we need a way to predict queries _ahead_ of where they'd be useful (a rough sketch of both constraints follows the references below).

    [1] https://arxiv.org/abs/2309.06180

    [2] https://arxiv.org/abs/2305.16300
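
    As a rough illustration of both constraints, here is a minimal back-of-the-envelope sketch in Python: the first function shows how the KV cache grows linearly with context length, the second converts a target decode rate into a per-token latency budget for any external fetch. The model dimensions (32 layers, 32 KV heads, head dim 128, fp16) are assumptions for illustration, not taken from the post or the papers.

        # Back-of-the-envelope sketch of the two constraints discussed above:
        # (1) the KV cache grows linearly with context length, and
        # (2) the per-token latency budget implied by a target decode rate.
        # The model dimensions below are assumptions (roughly 7B-class).

        def kv_cache_bytes(context_len: int,
                           n_layers: int = 32,
                           n_kv_heads: int = 32,
                           head_dim: int = 128,
                           bytes_per_elem: int = 2) -> int:
            """Keys + values stored for every past token, across all layers."""
            per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
            return context_len * per_token  # linear in context length

        def latency_budget_ms(tokens_per_sec: float) -> float:
            """Time available per decode step if retrieval must not stall generation."""
            return 1000.0 / tokens_per_sec

        for ctx in (4_096, 32_768, 262_144):
            print(f"{ctx:>8} tokens -> {kv_cache_bytes(ctx) / 2**30:6.2f} GiB of KV cache")
        for rate in (20, 50):
            print(f"{rate} tok/s -> {latency_budget_ms(rate):.0f} ms per token for any external fetch")

    Hiding that latency could look roughly like the sketch below: while one token is being decoded, asynchronously fetch the keys/values predicted to be useful a few steps ahead, and only use a fetch once it has arrived. `predict_future_query`, `decode_one`, and `kv_store.fetch` are hypothetical stand-ins, not APIs from vLLM or the landmark-attention code.

        # Sketch of overlapping external KV retrieval with autoregressive decoding.
        # All model/store methods are hypothetical placeholders.
        from concurrent.futures import ThreadPoolExecutor

        def decode_with_prefetch(model, kv_store, prompt_ids, max_new_tokens=64, lookahead=4):
            pool = ThreadPoolExecutor(max_workers=1)
            pending = None                  # fetch issued `lookahead` steps ago, if any
            tokens = list(prompt_ids)
            for _ in range(max_new_tokens):
                # Start fetching the context we expect to need ~lookahead steps from now,
                # using a predicted query rather than waiting for the real Q vector.
                predicted_q = model.predict_future_query(tokens, steps_ahead=lookahead)  # hypothetical
                next_fetch = pool.submit(kv_store.fetch, predicted_q)                    # hypothetical
                # Use the earlier fetch only if it has already arrived; otherwise decode
                # without the extra context instead of stalling the whole batch.
                extra_kv = pending.result() if (pending is not None and pending.done()) else None
                tokens.append(model.decode_one(tokens, extra_kv=extra_kv))               # hypothetical
                pending = next_fetch
            pool.shutdown(wait=False)
            return tokens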

Related posts