[D] Transformer sequence generation - is it truly quadratic scaling?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • x-transformers

    A simple but complete full-attention transformer with a set of promising experimental features from various papers

  • However, I've recently come across the concept of Key/Value caching in Transformer decoders (e.g. Figure 3 here): because each output (and hence each input, since the model is autoregressive) depends only on previous outputs (inputs), we don't need to re-compute the Key and Value vectors for all t < t_i at timestep i of the sequence. My intuition, then, is that (unconditioned) inference for a decoder-only model uses an effective sequence length of 1 (the most recently produced token is the only input that requires new computation), making attention a linear-complexity operation; a minimal sketch of this caching scheme follows below. This thinking seems to be validated by this GitHub issue, and this paper (2nd paragraph of the Introduction).
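The sketch below is a hypothetical illustration of the KV-caching idea described above, not code from x-transformers or the linked paper: single-head causal self-attention where each decoding step computes query/key/value only for the newest token and reuses cached keys and values for all earlier positions.

```python
# Minimal KV-cache sketch (hypothetical, single head, no batching or masking
# beyond causality-by-construction). At step t we compute q, k, v for the new
# token only and attend over the t cached keys: O(t) work per step.
import math
import torch

torch.manual_seed(0)
d_model = 16
W_q = torch.randn(d_model, d_model) / math.sqrt(d_model)
W_k = torch.randn(d_model, d_model) / math.sqrt(d_model)
W_v = torch.randn(d_model, d_model) / math.sqrt(d_model)

def attend_with_cache(x_t, cache):
    """x_t: (1, d_model) embedding of the newest token; cache holds K and V."""
    q = x_t @ W_q                                     # query for new token only
    k = x_t @ W_k                                     # key for new token only
    v = x_t @ W_v                                     # value for new token only
    cache["K"] = torch.cat([cache["K"], k], dim=0)    # (t, d_model)
    cache["V"] = torch.cat([cache["V"], v], dim=0)    # (t, d_model)
    scores = (q @ cache["K"].T) / math.sqrt(d_model)  # (1, t): linear in t
    weights = torch.softmax(scores, dim=-1)
    return weights @ cache["V"]                       # (1, d_model)

cache = {"K": torch.empty(0, d_model), "V": torch.empty(0, d_model)}
for t in range(5):                  # stand-in for autoregressively generated tokens
    x_t = torch.randn(1, d_model)
    out = attend_with_cache(x_t, cache)
print(cache["K"].shape)             # torch.Size([5, 16]) -- one cached key per step
```

Note that even with the cache, step t still attends over all t cached keys, so the per-step cost grows linearly with position while the total over N generated tokens remains quadratic; the cache removes the redundant re-computation of earlier keys and values, not the length-dependent attention itself.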



Related posts

  • x-transformers

    1 project | news.ycombinator.com | 31 Mar 2024
  • A single API call using almost the whole 32k context window costs around 2$.

    1 project | /r/OpenAI | 15 Mar 2023
  • GPT-4 architecture: what we can deduce from research literature

    1 project | news.ycombinator.com | 14 Mar 2023
  • You’ll be able to run chatgpt on your own device quite easily very soon

    2 projects | /r/OpenAI | 13 Mar 2023
  • The GPT Architecture, on a Napkin

    4 projects | news.ycombinator.com | 11 Dec 2022