Transformer Attention is off by one

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • flaxformer

  • >why did no one come up with this before

    So it turns out someone did. Specifically, Google did: this exact idea has been in flaxformer since at least November 2021.

    https://github.com/google/flaxformer/blame/ee62754ebe5a5eeb1...

    To save people a click, it says:

    > """Softmax function with an additional virtual logit equal to zero.

      For compatibility with some previously trained models.

  • nanoGPT

    The simplest, fastest repository for training/finetuning medium-sized GPTs.

  • https://github.com/karpathy/nanoGPT/blob/f08abb45bd2285627d1...

    At training time, next-token probabilities are computed at every position, so feeding in a sequence of n tokens effectively gives n training examples, one per position. At inference time, we only compute the next token, since the preceding ones have already been output.

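The flaxformer docstring quoted above describes a softmax with one extra "virtual" logit fixed at zero, i.e. p_i = exp(x_i) / (1 + sum_j exp(x_j)). Below is a minimal pure-Python sketch of that idea. It is not flaxformer's code (flaxformer is written in JAX), and the names `softmax_with_zero_logit` and `softmax` are hypothetical, chosen only for illustration. Because the virtual entry's probability is dropped, the returned weights can sum to less than 1, so an attention head whose scores are all very negative can put almost no weight on any position.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Ordinary softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_zero_logit(logits: Sequence[float]) -> List[float]:
    """Softmax over `logits` plus one virtual logit fixed at 0 (hypothetical name).

    Computes exp(x_i) / (1 + sum_j exp(x_j)). The virtual entry's own
    probability is not returned, so the outputs sum to at most 1.
    """
    # The virtual 0 must be included when choosing the stability shift,
    # otherwise exp(0 - m) could overflow for very negative logits.
    m = max(0.0, max(logits, default=0.0))
    exps = [math.exp(x - m) for x in logits]
    denom = math.exp(-m) + sum(exps)  # exp(0 - m) is the virtual logit's term
    return [e / denom for e in exps]


if __name__ == "__main__":
    scores = [-100.0, -100.0]
    print(softmax(scores))                  # [0.5, 0.5]
    print(softmax_with_zero_logit(scores))  # both entries roughly 3.7e-44
```

The comparison shows the point of the extra logit: with both scores at -100.0 the ordinary softmax still hands out all of the weight, while the zero-logit variant returns values near 0. Equivalently, `softmax_with_zero_logit(x)` equals `softmax(list(x) + [0.0])` with the last entry dropped.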
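The nanoGPT comment about training versus inference can be illustrated with a small sketch. It assumes, as is standard for causal language modeling, that the targets are the inputs shifted by one position. The names `training_examples`, `generate`, and `toy_model` are hypothetical and are not nanoGPT's API; no real model is involved, and `toy_model` merely stands in for one.

```python
from typing import Callable, List, Tuple


def training_examples(tokens: List[int]) -> List[Tuple[List[int], int]]:
    """Turn n+1 tokens into n (context, next-token) pairs.

    Inputs are tokens[:-1] and targets are tokens[1:]. Under causal masking,
    position t sees only inputs[: t + 1] and is scored against targets[t],
    so one pass over n inputs supplies n training examples.
    """
    inputs, targets = tokens[:-1], tokens[1:]
    return [(inputs[: t + 1], targets[t]) for t in range(len(inputs))]


def generate(
    next_token_fn: Callable[[List[int]], int], prompt: List[int], steps: int
) -> List[int]:
    """Autoregressive decoding where only the last position's prediction is used.

    `next_token_fn` stands in for a model that returns the next token after
    the final position; earlier tokens are already known, so predictions for
    them are never needed.
    """
    seq = list(prompt)
    for _ in range(steps):
        seq.append(next_token_fn(seq))
    return seq


def toy_model(seq: List[int]) -> int:
    """Hypothetical stand-in "model": predicts the last token plus one, mod 10."""
    return (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(training_examples([5, 6, 7, 8]))  # [([5], 6), ([5, 6], 7), ([5, 6, 7], 8)]
    print(generate(toy_model, [7], 4))      # [7, 8, 9, 0, 1]
```

During training every position contributes a target, which is why one sequence yields many examples; during generation the loop keeps only the newest prediction and appends it to the sequence.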